Massively improving database INSERTs
I use MySQL a lot. I run it locally using MAMP for a ton of data projects that never find their way online.
My current development project involves a one-off operation to insert 1,000,000 rows of data into a small table.
The table has two indexes: a primary key and a composite index on two columns.
The project is driven by JavaScript, which fires AJAX calls to PHP scripts that run the database operations.
The actual project is a pixel scanner that uses HTML5's Canvas API to examine each pixel, determine its RGBA values and, based on some conditions, write a specific value to the database.
The client side of things is remarkably quick and a 1,000 x 1,000 pixel image is scanned in around 30 seconds.
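The in-browser scan works on the flat RGBA byte array that `getImageData` returns. A minimal sketch of that pixel lookup, assuming a helper I've called `getPixel` (an illustration, not the project's actual code):

```javascript
// ImageData.data is a flat Uint8ClampedArray: 4 bytes (R, G, B, A) per
// pixel, laid out row by row. In the browser you would obtain it with:
//   const data = ctx.getImageData(0, 0, width, height).data;
// getPixel is a hypothetical helper; the real project's code may differ.
function getPixel(data, width, x, y) {
  const i = (y * width + x) * 4; // byte offset of this pixel's red channel
  return { r: data[i], g: data[i + 1], b: data[i + 2], a: data[i + 3] };
}

// Tiny 2x1 "image": one opaque red pixel, one semi-transparent green pixel.
const data = Uint8ClampedArray.from([255, 0, 0, 255, 0, 255, 0, 128]);
const p = getPixel(data, 2, 1, 0);
```

A 1,000 x 1,000 image is simply 1,000,000 of these lookups, which is why the client side stays fast.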
The server side of things is very very slow.
Here's how I was doing it initially.
- Determine the values from the pixel
- Fire off an AJAX call with jQuery
- Create a DB connection
- Perform INSERT in SQL
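The steps above amount to one round trip per pixel. A sketch of that naive client-side flow, shown with `fetch` rather than jQuery; the `insert.php` endpoint and the payload field names are my own assumptions:

```javascript
// Naive flow: one AJAX round trip (and, on the PHP side, one connection
// and one INSERT) per pixel. 'insert.php' and the field names are
// illustrative assumptions, not the project's actual endpoint.
function buildPayload(x, y, value) {
  return JSON.stringify({ x, y, value });
}

// Defined but not called here: fires one request per pixel,
// so a 1,000 x 1,000 image means 1,000,000 requests.
function insertPixel(x, y, value) {
  return fetch('insert.php', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: buildPayload(x, y, value),
  });
}
```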
I shudder to think of the overhead this placed on MAMP.
The whole task equated to 1,000,000 INSERT operations and was off-the-scale slow and unreliable. In most cases the browser just hung and the system (an iMac) gave up.
Then I remembered that the syntax phpMyAdmin uses chains multiple value tuples onto a single INSERT statement, much like this: INSERT INTO table (columnA,columnB) VALUES (val1,val2),(val3,val4)...
So in the JavaScript I set a threshold of 50 rows, so that as the code looped through the pixels it bunched the INSERTs up into statements containing 50 bracketed value tuples each.
So this:
INSERT INTO table (columnA,columnB) VALUES (x,y);
INSERT INTO table (columnA,columnB) VALUES (x,y);
INSERT INTO table (columnA,columnB) VALUES (x,y);
INSERT INTO table (columnA,columnB) VALUES (x,y);
INSERT INTO table (columnA,columnB) VALUES (x,y);
INSERT INTO table (columnA,columnB) VALUES (x,y);
INSERT INTO table (columnA,columnB) VALUES (x,y);
INSERT INTO table (columnA,columnB) VALUES (x,y);
became
INSERT INTO table (columnA,columnB) VALUES (x,y),(x,y),(x,y),(x,y),(x,y),(x,y),(x,y),(x,y);
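The batching itself can be done with a small helper that chunks the accumulated rows and emits one multi-row statement per chunk. A sketch, assuming a function name of my own choosing and the placeholder table from the examples above; real code would also need to escape or parameterise the values:

```javascript
// Chunk an array of [x, y] rows into multi-row INSERT statements,
// 50 value tuples per statement by default. Table and column names are
// placeholders; in real code the values must be escaped or bound as
// parameters, never interpolated raw.
function buildBatchedInserts(rows, chunkSize = 50) {
  const statements = [];
  for (let i = 0; i < rows.length; i += chunkSize) {
    const tuples = rows
      .slice(i, i + chunkSize)
      .map(([a, b]) => `(${a},${b})`)
      .join(',');
    statements.push(`INSERT INTO table (columnA,columnB) VALUES ${tuples};`);
  }
  return statements;
}
```

For 1,000,000 rows at 50 tuples per statement this produces 20,000 statements, one per AJAX call.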
The result was staggering and I am now writing 1,000,000 rows of data in around 2 minutes.
That's still 20,000 database connections being established but the performance boost is fantastic.
I continue to develop a better understanding of DBMSs and optimisation.
Note: in an attempt to reduce the number of database connections to just one, I tried storing the connection in the $_SESSION collection.
I ran into serialisation problems when re-assigning the $_SESSION value back to the $conn variable I use in my SQL: PHP cannot serialise a database connection resource, so a live connection can't be carried between requests in the session.