GROUP BY and HAVING

In the last lesson, we learned that mixing normal columns with aggregate functions causes an error. But what if we want to see the total revenue per product, or the number of users per country?

We achieve this by segmenting our data using the GROUP BY clause.

How GROUP BY Works

GROUP BY tells the database to group rows together that share the same values in specified columns, and then calculate the aggregate function for each group independently.

Let’s say we want to know how many units of each product we have sold. We can group our order_items by the product_id:

SELECT product_id, SUM(quantity) AS total_units 
FROM order_items 
GROUP BY product_id;

This query outputs a list of unique product IDs, and next to each one, the total sum of the quantities sold specifically for that product.

[!IMPORTANT] The golden rule of GROUP BY: If you use an aggregate function (like SUM) in your SELECT statement, every other column in your SELECT statement must be listed in the GROUP BY clause.

Filtering Groups with HAVING

What if we only want to see products that have sold more than 100 units? Your first instinct might be to use the WHERE clause:

❌ ... WHERE SUM(quantity) > 100

This will fail. Why? Because WHERE filters individual rows before they are grouped and aggregated. The database doesn’t know the sum yet!

To filter based on the result of an aggregate function, you must use the HAVING clause. HAVING acts exactly like WHERE, but it runs after the grouping occurs.

SELECT product_id, SUM(quantity) AS total_units 
FROM order_items 
GROUP BY product_id
HAVING SUM(quantity) > 100;

The Full Execution Order

It’s critical to understand the order in which SQL executes your instructions:

FROM (Get the tables)
WHERE (Filter individual rows)
GROUP BY (Create buckets)
HAVING (Filter the buckets)
SELECT (Return the columns/aggregates)
ORDER BY (Sort the final output)

Try using both GROUP BY and HAVING in the interactive lab!