Tom Cannaerts

Optimizing queries with MySQL EXPLAIN

The MySQL explain command is one of the most useful commands when debugging or optimizing queries on MySQL. It will tell you how MySQL is processing that data it needs to run your query. To use the command, simply put EXPLAIN (or EXPLAIN EXTENDED) in front of your query. For each table (or derived table), the command will output a line containing information about how MySQL dealt with the data in that table. It will tell you what indexes are used, how data is joined and what extra ‘conditions’ were in place when processing that table (eg. sorting).

Take a look at the following example statement.

EXPLAIN EXTENDED SELECT * FROM `words` WHERE `wordid`=’1234′;

+—-+————-+——-+——-+—————+———+———+——-+——+———-+——-+
| id | select_type | table | type  | possible_keys | key     | key_len | ref   | rows | filtered | Extra |
+—-+————-+——-+——-+—————+———+———+——-+——+———-+——-+
|  1 | SIMPLE      | words | const | PRIMARY       | PRIMARY | 4       | const |    1 |   100.00 |       |
+—-+————-+——-+——-+—————+———+———+——-+——+———-+——-+

As expected, the result is a single row as we are only selecting data from one table. The different columns will give us important information on how this query is executed. The example above isn’t really all that exciting. In short, it tells us that we are dealing with a single table that is filtered on the Primary Key (PK) and results in 1 row.

The select_type column

The select_type column will tell us what kind of table we are dealing with. In this particular example, the select_type column show us that we are dealing with a SIMPLE select. This means that we are executing a single query or performing a join on a single table. When using subqueries, you will see PRIMARY for the outer query and SUBQUERY for the inner query. If you are using UNION to combine tables, the first select of the union will have select_type PRIMARY, and subsequent queries will have the type UNION. The resulting table will also be included in the output as the UNION RESULT. When UNION of SUBQUERY is preceded with the keyword DEPENDENT, it means that the subquery or union is dependent of the outer query and has to be evaluated again for every row of the outer query. Finally, the DERIVED select_type refers to a derived table, which is a subquery used in FROM clause.

The table colum

The table column refers to the table being queried. If you are using a table alias, the alias will be shown rather than the actual table name. There are also a number of special values, like unionM,N or derivedN when refering to the result of a union or derived table. Another special value is null. If you are selecting data from a table but that table column shows null, it means that MySQL did not even bothered to access the table as it deemed the query impossible to return anything. An example of this could be a filter on a negative value on an UNSIGNED INT column. In this case, the extra colum will contain a message stating that a an impossible WHERE was noticed.

The type column

The type column is probably the most important column. It will tell us how the data of that table will be accessed. We’ll be going over them in best-to-worst order performance wise.

null

When type is null, there is no data to be fetched from the table. This occurs in cases where there simply is no table (see the null table above). So either you’re not selecting data from the table (eg. SELECT 1 + 1) or you’re doing something very wrong. If you’re not doing anything fancy, you should not encounter this one.

const

The const type is the fastest type commonly used. This is the case when you are lookup up data by the primary key (PK) or another not nullable unique key (NNUK). Since there will be only a single match, the server can stop after it finds this match. If you have a const type query, you are done optimizing. Move on to the next part of your query 😉 There is one special case of const, the system type. This is only applicable to MyISAM tables (like the MySQL system tables, hence the name) that have only one record. Since MyISAM stores the exact number of rows in the metadata, it does not need to use any index if the table only has 1 row.

eq_ref

When doing joins, you will probably want to aim for the eq_ref type. This means that you are joining one table on the PK or other NNUK of another table. This would typically describe any one-to-many kind of relationship, assuming that the one-part of the relation is the PK. Take a typical webshop that has a customers table and an orders table that are joined on customer id (which is the PK of the customers table). When joining them together, the customers table will be of the type eq_ref, as you are joining on the PK. In some circumstances, it’s possible that this table is also of the const type rather than eq_ref if the other side of the join is of the const type as well. If looking at the webshop example, this would be the case if we were looking up the customer that goes with one specific order using the order if (PK) of that order. Since the orders table will only return 1 value and the customers table is joined on its PK, both will be treated as const as the server known there will be one row at best that will come out of that join.

ref

The ref type is used when you are joining a table on an index that is non-unique (in other words: there might be more than one record that matches the filter). This could simple be a non-unique index, but might as well be the left-most part of a unique multi-column index. Because a multi-column index is ordered starting from the left column to the right, MySQL can use part of this index as long as that part contains all the left-most columns.

fulltext

The fulltext type is quite self-explainatory, at least if you know what a full-text index is. Since a full-text index requires the use of special clauses in the query, you probably don’t need the explain command to tell you that you are using it.

ref_or_null

The ref_or_null type is the same as the ref type, with the exception that is allows null values. Since null doesn’t match anything, MySQL needs an additional pass to specifically check for null values. Obviously, this extra pass comes at a performance penalty. If in reality you are not using the null values at all (a lot of people tend to replace null with an empty string or zero), you should probably change your database schema to reflect that as it will turn your query into a ref type and loose the additional pass.

index_merge

An index_merge can occur when you are filtering on 2 seperate indexes on the same table. Suppose you have a table representing people and you have an index on lastname and a seperate index on firstname. When using both lastname and firstname, MySQL may or may not decide to merge the two indexes together to form the result. For more in-depth information on index_merge, see the Index Merge optimization chapter of the MySQL documentation. The index_merge does teach us something usefull. Since MySQL is merging the two indexes on the fly, it would probably be better if we were to create a multi-column index on the table ourselves. That will allow MySQL to use that index directy, instead of having to create it.

unique_subquery, index_subquery

If you are using a subquery in an IN clause, and the subquery will only return unique values (eg. when returning a single colum PK or UK), MySQL will replace the subquery with an index lookup function for better performance. The index_subquery type is similar to unique_subquery, with the difference that is used when returning non-unique values from an index. Note that althoug not always the case, subqueries can often be rewritten as joins that are able to make use of indexes more efficiently (definitely when dealing with dependant subqueries where the subquery needs to be run for every row in the outer table).

range

The range type is a partial index scan (see index type below). Since binary trees (indexes in MySQL are binary trees) are ideal for returning data in sequential order, they can be easely sliced from a starting value to an ending value. This is typically used in queries that use < or > or variations of that.

index

The index type sounds better than it actually is. What is does, is perform a full index scan rather than actually use the index to retrieve the rows. This is very similar to the all type, with the difference that it doesn’t need to go to the actual table to select the data, as all data requested is already available in the index.

all

When you are using the all type, you are actually doing what is also known as a full table scan. If you are actually filtering, you are not using any indexes. This doesn’t mean that there are no indexes on the fields you are querying, it means that MySQL isn’t using them. A classic example of that is a field that can only contain a limited number of values, like a boolean yes/no or gender field. At that point, MySQL can decide that it’s probably better to do a full table scan rather than use the index, as the index will probably return a large portion of the rows. If you are using EXPLAIN EXTENDED, the filtered column can give you an indication of why MySQL is doing a full table scan.

The possible_keys and keys columns

The possible_keys column will contain a list of keys that might be used to filter the data, where the keys column will contain the actual key used. However, it is possible that MySQL decides to use an index that is not listed in the possible_keys column. This could be the case when you are filtering and/or returing data that is completly contained in another key. Take a query where you are filtering on the second column of a two-column index, but returning the first. Although MySQL can’t use the index to gather the rows (so it won’t be present in the possible_keys column), it will be more efficient to do a full index scan on that index rather than having to do a full table scan.

The rows and filtered columns

The rows column is the number of rows that MySQL believes it must examine to execute the query. This info is gathered from the metadata of the table, so for InnoDB this will be an approximate number. The filtered column (when using EXPLAIN EXTENDED) will give an indication of the percentage of of rows that will be filtered. If the number of filtered rows is too low, MySQL can decide not to use a specific index and rather do a index or table scan instead.

The “extra” column

The extra column contains additional information regarding the conditions applied on the table. This can be a range of different values that may or may not be usable when optimizing your queries. We’ll discuss the most important ones that apply to query performance below.

using where

The presence of using where in the extra field indicates that a where was used to limit the rows returned. This does not refer to the rows filtered by the index. You will see this when filtering on multple columns that are not part of the same index (or not usable in a multi-column index). MySQL will try to select the best index to do the initial filtering and will filter out the remaining rows by applying the other filter conditions to it. You will also see this when MySQL decides to use a full index or table scan rather than an index.

using temporary

This typically happens when using a GROUP BY or DISTINCT clause, but sorting the result by other columns. MySQL will have to create a temporary table to hold the result of the GROUP BY or DISTINCT and apply the ORDER BY on that table.

using filesort

This means that MySQL has to do an extra pass in order to retrieve the rows in sorted order. This will always be the case when sorting the data in a different way than the index that is used to filter it. In short, if you are filtering on one field and sorting by another field, you will see a using filesort in the extra column. Many people confuse filesort with a temporary table that is written to disk. Although other conditions might be present that result in the temporary table being written to disk, the filesort is not the cause. Filesort refers to the sorting algorythm that is used to sort the rows. For each row the matches the filters, MySQL will create a sort key and “file” that in an index. That index is then used to return the data in the correct order.

using index

This will not only state the obvious that we are actualy using an index (the “keys” column already told us that), but more imporantly it tells us that all data we need to retrieve is actually contained in the index. This means that MySQL does not need to read the data from the actual table.

Impossible WHERE, Impossible HAVING

MySQL has determined that the where or having clause will always evaluate to false. Since no rows can be returned, MySQL will not do any data processing for that table. Unless you are actually doing very specific things, this is probably a good indicator that you are doing something wrong.

full scan on NULL key

This means that you have a subquery lookup that is not using any indexes. If you are dealing with a dependent subquery that involves searching through a considderable amount of rows, you will probably want to try to convert that into a JOIN if possible.

One final note

So now that you know what your query is doing and hopefully optimized your queries, keep in mind that changes to the data can influence the performance of your query. If the data in a table changes significantly (eg. 30.000 record rather than the 3.000 records you started out with), MySQL might just choose a different index to run your query. Obviously, this can impact the performance of your query. If you hear someone say “but it runs fine on development”, you might just want to start with making sure that you are actually using the exact same dataset as production, and not a trimmed version with some bogus data. So even when you’re done optimizing, you may just want to periodically check your queries to see if they are indeed still fit for production.

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.