Normalization
You know the syntax to create tables and link them. But how do you decide what tables to create in the first place? If you’re building an E-Commerce site, should you have 2 tables or 20 tables?
The process of designing a database structure to minimize redundancy and prevent data anomalies is called Database Normalization.
The Problem with Denormalization
Imagine a single, massive table called sales:
(order_id, date, customer_name, customer_email, product_name, product_price)
If Alice buys a Laptop today and a Mouse tomorrow, her customer_name and customer_email are stored twice.
- Redundancy: Wastes disk space.
- Update Anomaly: If Alice changes her email, we have to find and update every single row she’s ever associated with. If we miss one, our data is inconsistent.
The Normal Forms
Database designers follow a set of rules called “Normal Forms” (NF) to fix this.
1st Normal Form (1NF)
Every column should contain atomic (indivisible) values. No arrays or comma-separated lists in a single cell.
2nd Normal Form (2NF)
Eliminate partial dependencies. If a table has a composite primary key, every other column must depend on the entire key, not just part of it.
3rd Normal Form (3NF)
“Every non-key attribute must depend on the key, the whole key, and nothing but the key, so help me Codd.”
In our sales table example, customer_email depends on customer_name, not on the order_id. To reach 3NF, we split the data into multiple tables:
users(id, name, email)products(id, name, price)orders(id, user_id, date)
Many-to-Many Relationships
Because an order can contain many products, and a product can belong to many orders, we have a Many-to-Many relationship. We normalize this by creating a Join Table (or Intersection Table) sitting between them.
This table (e.g., order_items) usually doesn’t need its own id column. Instead, it uses a Composite Primary Key made from the two foreign keys it connects!
CREATE TABLE order_items (
order_id INTEGER,
product_id INTEGER,
quantity INTEGER,
-- The combination of these two is the unique Primary Key!
PRIMARY KEY (order_id, product_id)
);
Try creating a normalized Join Table with a Composite Primary Key in the lab!