Saturday, 7 September 2013

MySQL load data from file

From time to time you will need to load data from flat files into MySQL database. Most likely those files will be in csv or similar format where each line in file represents one row in target table. You can simply write an app to read and parse file and then  insert data into desired table, or you can use MySQL's LOAD DATA INFILE method.

If you choose to do the later, here is full documentation on how it works. However, one a bit "more advanced" example may be of more use and quicker to read, understand and implement, especially if you have never been using it before.  So, here is one example that I have created for myself just to remind me how to do some things.  Hopefully it will help you as well to get up to speed more quickly.


LOAD DATA
  INFILE '/var/lib/mysql/web_machine1.csv'
  INTO TABLE database1.web_machines
  FIELDS
    TERMINATED BY '\;'
    ENCLOSED BY ''
    ESCAPED BY '\\'
  LINES
     TERMINATED BY '\r\n'
  (
    @var_request_time,
    @var_request,
    @var_response_bytes,
    @var_response_s,
    @var_response_us
  )
  SET
    web_machine = 'web01',
    request_time = STR_TO_DATE(@var_request_time,'%d/%m/%Y:%T'),
    request = @var_request,
    response_s = @var_response_s,
    response_us = @var_response_us,
    response_bytes = @var_response_bytes
;

So, just few notes explaining example above. In example above I needed to analyze performance of several web servers and to make some comparisons. Therefore, I've first set up performance logging on each web server machine. After given period of time, i needed to import those log files into database in order to perform all comparisons and performance calculations. Process of data import from csv log files into MySQL database was done using command code similar to the one showed above.

Let's dissect and analyze what each part of this code and see what it means.

Obviously, first I need to declare file from which I need to load data from. I always use absolute file path:

INFILE '/var/lib/mysql/web_machine1.csv'

Secondly, I need to declare in which database and which table I want to store data in:

INTO TABLE database1.web_machines

After that, I need to "tell" MySQL how is my csv file formatted. Therefore, I have to describe how individual data fields are separated, enclosed and escaped as well as how lines are separated:

FIELDS
  TERMINATED BY '\;'
  ENCLOSED BY ''
  ESCAPED BY '\\'
LINES
  TERMINATED BY '\r\n'

Then, as you can see, I prefer to read all data in variables first, rather than piping data straight into table. Since my csv files contained 5 data fields in each row, I declared 5 local variables - one for each data field. I'm always doing this because it gives me better overview and control.

(
  @var_request_time,
  @var_request,
  @var_response_bytes,
  @var_response_s,
  @var_response_us
)

Finally, I need to store data into database.

SET
  web_machine = 'web01',
  request_time = STR_TO_DATE(@var_request_time,'%d/%m/%Y:%T'),
  request = @var_request,
  response_s = @var_response_s,
  response_us = @var_response_us,
  response_bytes = @var_response_bytes

Here you can see why I prefer using local variables. Firstly, table I wanted to insert data in had one extra data field named "web_machine". As this field was not present in csv file, I would not have been able to load it into database correctly by simply piping data. Secondly, using variables enables me to process data before inserting it into database - as you can see in case of field "request_time". And, finally, it is easier to adapt code if order of fields in csv file changes. In this case it would only require me to re-order variables while all code after SET keyword would remain exactly the same.

1 comment:

Unknown said...

Since this procedure does not implicitly support loading of the fixed width data files, in case you need to load those, here is simple but effective workaround.

Simply load whole data row from file in single variable and then, using SUBSTR() function, pick up data and store it in appropriate fields. Here is simple generic example:


LOAD DATA LOCAL
INFILE '/path/to/file/data_file.csv'
INTO TABLE table1
FIELDS
ENCLOSED BY ''
ESCAPED BY '\\'
(
@row
)
SET field1 = TRIM(SUBSTR(@row,1,3)),
field2 = TRIM(SUBSTR(@row,5,5)),
field3 = TRIM(SUBSTR(@row,11,9)),
field4 = TRIM(SUBSTR(@row,21,5))
;