Indexes
Imagine looking for a specific chapter in a 1,000-page textbook. If the book doesn’t have a Table of Contents, you literally have to flip through every single page from the beginning until you find it. In database terminology, this is called a Full Table Scan. It is incredibly slow.
To fix this, we use Indexes.
What is an Index?
An Index is a separate data structure (usually a B-Tree) created and maintained by the database. It stores the values of a specific column in a highly sorted, searchable order, along with a pointer to the original row.
With an index on the email column, finding a user by their email takes milliseconds, even if the table has millions of rows.
Creating an Index
When you define a column as PRIMARY KEY or UNIQUE, the database engine automatically creates an index for it under the hood.
If you want to create an index on a normal column that you frequently query (like created_at or category), you must manually create it:
CREATE INDEX idx_product_category
ON products (category);
(The idx_product_category is just a naming convention name for the index).
The Cost of Indexes
If indexes make queries so fast, why don’t we index every single column?
Because indexes are not free:
- Disk Space: The B-Tree data structure takes up physical space on your hard drive.
- Write Performance Penalty: Every time you
INSERT,UPDATE, orDELETEa row in the table, the database has to do extra work to update the B-Tree index as well. If you have 10 indexes on a table, a singleINSERTrequires updating 10 B-Trees.
[!CAUTION] A common backend engineering mistake is “Over-indexing”. Only add indexes to columns that are frequently used in
WHERE,JOIN, orORDER BYclauses. Do not index columns that are only ever returned in theSELECTstatement.
Try writing the DDL to create an index in the interactive lab!