Gorilla3D Primative Teachings
Faster & Smarter MySQL Random Row Selection
Wednesday, December 19, 2007
In this entry will be dealing the following dataset:
geoip.sql ~ 94k rows | 7.8 MB
Figure 1. Shows the typical random MySQL random row Selection. On my server I ran 1000 iterations coming to about 0.6 seconds average in response time. The script took over 10 mins to execute 1000 random rows. Thats is awful performance and mean that if your site has more then 1 queries per second your database would crawl to a halt.
Figure 1. ~ 0.6 seconds
<?php //-- Mysql Database Config $db_host = 'localhost'; $db_db = 'mysite'; $db_user = 'root'; $db_pass = ''; //-- Mysql Connection //-- Fetch a random row $sql = "SELECT * FROM `csv` ORDER BY RAND() LIMIT 1"; ?>
Figure 2. Shows an alternative method yields 1000 iterations coming to about 0.001 seconds average in response time. Meaning your database can now withstand 1000 queries per second.
Figure 2. ~ 0.001 seconds
<?php //-- Mysql Database Config $db_host = 'localhost'; $db_db = 'mysite'; $db_user = 'root'; $db_pass = ''; //-- Mysql Connection //-- Find the largest ID $sql = "SELECT * FROM `csv` ORDER BY `id` DESC LIMIT 1"; //-- Fetch a random row ?>
Now what if your database has large section of missing ids (They skip in range, i.e 1, 3, 7)? Well look at Figure 3. This is an example of just looping the random selection based on the same code as Figure 2. The performance on this will very as it could be the 1st result that yields a row or the 100th, none of the less it still a far better alternative result then using MySQL's built in RAND() function to fetch random rows.
Figure 3. ~ N/A
<?php // Mysql Database Config $db_host = 'localhost'; $db_db = 'mysite'; $db_user = 'root'; $db_pass = ''; // Mysql Connection //-- Find the largest ID $sql = "SELECT * FROM `csv` ORDER BY `id` DESC LIMIT 1"; $result = false; //-- Query a random row until we get a result while(!$result) { } ?>
Now if your looking for a pure Mysql method use a subquery, this will allow you to use Mysql to still generate random ids. If you look at Figure 4. the subquery is which just totals the number of rows within our database. Then the table's id is matched against php's random function lcg_value() which only generates 0 - 1 numbers, to fix that we just multiply that against our subquery's 'total'. So in the end what did 1000 iterations prove? Well I got 0.0005 seconds average per query, as you can see even faster then in Figure 2.
Firgure 4. ~ 0.0005 seconds by Joe
<?php // Mysql Database Config $db_host = 'localhost'; $db_db = 'mysite'; $db_user = 'root'; $db_pass = ''; // Mysql Connection ?>
So when do you use RAND()?
For a single selection RAND() is just slow but when you need to use WHERE to filter your random selections as well as selection MANY random rows, RAND is your best best. Figure 5. & Figure 6. prove that point RAND() is just what to use when you need to filter your results. Now if you have a table that does not change very often and you are using the same filter over and over, Figure 6. would be your best best, this allows you to reuse the temp table to query rand rows much faster.
Figure 5. - Rand() ~ 0.65 seconds
//-- Mysql Database Config $db_host = 'localhost'; $db_db = 'mysite'; $db_user = 'root'; $db_pass = ''; //-- Mysql Connection //-- FINALLY query for our random rows w00t $sql = "SELECT * FROM csv WHERE cn = 'Germany' ORDER BY RAND() LIMIT 50"; $row; }
Figure 6. - Temp Table ~ 1.1 seconds / ~ 0.01 seconds
//-- Mysql Database Config $db_host = 'localhost'; $db_db = 'mysite'; $db_user = 'root'; $db_pass = ''; //-- Mysql Connection //-- Create a temp table, with our WHERE filters $sql = "CREATE TEMPORARY TABLE IF NOT EXISTS csvTemp ( temp_id INT(10) unsigned NOT NULL auto_increment, PRIMARY KEY (temp_id) ) SELECT * FROM csv WHERE cn = 'Germany'"; //-- Fetch the total row count (we cant join/subquery the same temp table) $sql = "SELECT COUNT(*) AS total FROM csvTemp"; //-- Generate our random temp_ids } //-- FINALLY query for our random rows w00t $row; }
If you know better methods, please by all means leave a comment and share your code. You can use syntax highlighting as above by encasing your code in BBcode style tags. i.e
Volomike posted 93 days ago
My problem is seen with a paid sponsored classified listings site. I need 5 random sponsored listings that match a given keyword, followed by the rest of the non-sponsored listings. I need to make good use of randomness because I don't need some sponsored listings to have precedence too much over others. You can see how slow things are by checking out http://118fm.co.uk/. I have a real conundrum because keyword search queries are taking like 8 seconds to run, even with indexes added in. It used to take like 2 seconds when I first built the database.
Emmy posted 114 days ago
hello babe! remember me? :)
Joseph Montanez posted 296 days ago
Rob,
You are right when it comes to RAND() and you need to select many random rows as well as set up WHERE filters, RAND() is just what you need.
Rob Marsh posted 296 days ago
What about the case where a) you want more than a single row and b) there is a WHERE clause filtering the rows so that they may be be sparse? Nothing I've tried is faster than RAND().
Joseph Montanez posted 296 days ago
Given the query below it results in 0.0005 vs 0.001 in Figure 1. I acturally did not know how much RAND() degrades in performance, I also replaced RAND in Figure 4. to get 0.0006 results, yours is clearly better.