Uploading bulk data to database

posted in: MySql | 0

Summary:

In many of our projects we need to do import/batch upload of data from a csv or xls file. We usually do data import by reading csv file and inserting records line by line. In this way the no of sql queries running on the DB is minimum the no of records exists in a file.

This can be done in a smarter way. Loading the CSV file data into a temporary table and performing batch insert and update statements. In this case we have maximum 10-15 queries run on DB based on the logic that needs to be implemented.

I am taking a sample employee data to explain this process.

data.csv
empno|fname|lname|location
10001|Phillip|Johnson|Minnesota
10002|Madison|Hadler|Hearthstone

This CSV file has empno, fname, lname and location. The empno, fname and lname has to be inserted into users table.
Location column is employee location info. If new location comes in CSV file, It has to be inserted into locations master table.  We need to update user table with corresponding location_id.

Actual tables:
CREATE TABLE `users` (
`user_id` INT( 11 ) NOT NULL ,
`empno` VARCHAR( 50 ) NOT NULL ,
`fn` VARCHAR( 50 ) NOT NULL ,
`ln` VARCHAR( 50 ) NOT NULL ,
`location_id` INT( 11 ) NOT NULL
)

CREATE TABLE `locations` (
`location_id` INT( 11 ) NOT NULL ,
`location_name` VARCHAR( 100 ) NOT NULL ,
`status` TINYINT( 1 ) NOT NULL
)

Create a temporary table with below fields.

CREATE TABLE `tmp_data` (
`empno` VARCHAR( 20 ) NOT NULL ,
`fname` VARCHAR( 50 ) NOT NULL ,
`lname` VARCHAR( 50 ) NOT NULL ,
`location` VARCHAR( 50 ) NOT NULL ,
`userid` INT( 11 ) NOT NULL ,
`locationid` INT( 11 ) NOT NULL
)

locationid will hold the location id from the locations master table. userid column will have the userid from the users table. We first populate these fields from actual tables and perform the batch insert/update.

//load data to temp table from file
LOAD DATA LOCAL INFILE “E:\\data.csv” INTO TABLE tmp_data FIELDS TERMINATED BY “|” IGNORE 1 LINES (empno,fname,lname,location);

//Update location_id column in temp table. If any new location comes the a.locationid remains ‘0’
UPDATE tmp_data a, locations b SET a.locationid = b.location_id WHERE a.location = b.location_name

////Insert new locations into locations table where the location_id is 0.
INSERT  INTO locations( location_name, status ) ( SELECT distinct location, ‘1’ FROM tmp_data WHERE locationid = ‘0’ )

//Again update locationid column in temp table. With this step all rows have locationid filled.
UPDATE tmp_data a, locations b SET a.locationid = b.location_id WHERE a.location = b.location_name;

//Update userid column in temp table. for new users the a.userid remains ‘0’.
UPDATE tmp_data a, users b set a.userid = b.user_id WHERE a.empno = b.empno;

//Insert into users table where the userid is 0 – All New users will get inserted into users table.
INSERT  INTO users(fn, ln, empno, locationid ) ( SELECT empno, fname, lname, location_id FROM tmp_data WHERE userid = ‘0’ );

//for already existing users where userid>0 – Update users table with csv data
UPDATE tmp_data a, users b set b.fn=a.fname, b.ln=a.lname, b.location_id=a.locationid WHERE a.userid=b.user_id and a.userid>0

Previous Post
Next Post

Leave a Reply