Massively improving database INSERTs
JavaScript
PHP
Research
Development
I use MySQL a lot. I run it locally using MAMP for a ton of data projects that never find their way online.
My current development project involves a one-off operation to insert 1,000,000 rows of data to a small table.
The table has 2 indexes - a Primary Key and a specific index on 2 columns.
The project is driven by JavaScript that then AJAX's out calls to PHP which runs the database operation.
The actual project is a pixel scanner which uses HTML5's Canvas API to examine a pixel, determine its RGBA values and then based on some conditions writes a specific value to the database.
The client side of things is remarkably quick and a 1,000 x 1,000 pixel image is scanned in around 30 seconds.
The server side of things is very very slow.
Here's how I was doing it initially.
- Determine the values from the pixel
- Fire off an AJAX call with jQuery
- Create a DB connection
- Perform INSERT in SQL
I shudder to think of the overheads being placed on MAMP for this.
The whole task equated to 1,000,000 INSERT operations and was off-the-scale slow and unreliable. In most cases the browser just hung and the system (an iMac) gave up.
Then I remembered the syntax used by PHPMyAdmin involved chaining the values to an INSERT statement much like this: INSERT INTO table (columnA,columnB) VALUES (val1,val2),(val3,val4)...
So in the JavaScript I set a threshold to 50 rows such that as the code looped through its pixels I bunched up INSERTS to contain 50 bracketed values per INSERT statement.
So this:
INSERT INTO table (columnA,columnB) VALUES (x,y);
INSERT INTO table (columnA,columnB) VALUES (x,y);
INSERT INTO table (columnA,columnB) VALUES (x,y);
INSERT INTO table (columnA,columnB) VALUES (x,y);
INSERT INTO table (columnA,columnB) VALUES (x,y);
INSERT INTO table (columnA,columnB) VALUES (x,y);
INSERT INTO table (columnA,columnB) VALUES (x,y);
INSERT INTO table (columnA,columnB) VALUES (x,y);
became
INSERT INTO table (columnA,columnB) VALUES (x,y),(x,y),(x,y),(x,y),(x,y),(x,y),(x,y),(x,y);
The result was staggering and I am now writing 1,000,000 rows of data in around 2 minutes.
That's still 20,000 database connections being established but the performance boost is fantastic.
I continue to develop more of an understanding for DBMS and optimisation.
Note: in an attempt to reduce the number of database connections to just one I tried to use the $_SESSION collection to store the connection.
I ran into problems with serialisation when attempting to re-assign the $_SESSION value to the $conn parameter which I use in my sql.