EXPLAIN and Query Performance
When you write a SQL query, you are writing declarative code. You tell the database what data you want, but you have no control over how the database goes into the hard drive to find it.
The database contains a highly complex piece of software called the Query Optimizer. The optimizer analyzes your query, looks at the available indexes, estimates the table sizes, and generates an Execution Plan.
If your query is running slowly, the first step in debugging is to look at the execution plan using the EXPLAIN keyword.
The EXPLAIN Keyword
To see the execution plan, you simply write EXPLAIN at the very front of your query.
EXPLAIN SELECT * FROM users WHERE email = 'test@test.com';
Instead of returning the actual data, the database will return a cryptic-looking block of text describing the steps it plans to take.
Interpreting the Output
While the exact output format depends on the database engine, you are generally looking for two things:
-
Scans vs Seeks:
- A Table Scan (or Seq Scan) means the database has to read every single row from the disk to find your data. This is bad.
- An Index Scan (or Index Seek) means the database successfully used an index to jump straight to the data. This is good.
-
Cost: The optimizer assigns an estimated “cost” to each step. Higher numbers mean the step requires more memory and CPU.
[!TIP] In PostgreSQL, you can use
EXPLAIN ANALYZE SELECT .... WhileEXPLAINonly shows the planned execution,EXPLAIN ANALYZEactually executes the query and shows you the actual time it took in milliseconds.
If you ever build a backend API and notice that an endpoint is taking 5 seconds to load, take the SQL query generated by the ORM and run it through EXPLAIN. 9 times out of 10, you’ll spot a massive Table Scan that can be fixed by adding an Index!
Try running EXPLAIN in the lab to peek behind the curtain!