Indexes

Imagine looking for a specific chapter in a 1,000-page textbook. If the book doesn’t have a Table of Contents, you literally have to flip through every single page from the beginning until you find it. In database terminology, this is called a Full Table Scan. It is incredibly slow.

To fix this, we use Indexes.

What is an Index?

An Index is a separate data structure (usually a B-Tree) created and maintained by the database. It stores the values of a specific column in a highly sorted, searchable order, along with a pointer to the original row.

With an index on the email column, finding a user by their email takes milliseconds, even if the table has millions of rows.

Creating an Index

When you define a column as PRIMARY KEY or UNIQUE, the database engine automatically creates an index for it under the hood.

If you want to create an index on a normal column that you frequently query (like created_at or category), you must manually create it:

CREATE INDEX idx_product_category 
ON products (category);

(The idx_product_category is just a naming convention name for the index).

The Cost of Indexes

If indexes make queries so fast, why don’t we index every single column?

Because indexes are not free:

Disk Space: The B-Tree data structure takes up physical space on your hard drive.
Write Performance Penalty: Every time you INSERT, UPDATE, or DELETE a row in the table, the database has to do extra work to update the B-Tree index as well. If you have 10 indexes on a table, a single INSERT requires updating 10 B-Trees.

[!CAUTION] A common backend engineering mistake is “Over-indexing”. Only add indexes to columns that are frequently used in WHERE, JOIN, or ORDER BY clauses. Do not index columns that are only ever returned in the SELECT statement.

Try writing the DDL to create an index in the interactive lab!