GROUP BY and HAVING
In the last lesson, we learned that mixing normal columns with aggregate functions causes an error. But what if we want to see the total revenue per product, or the number of users per country?
We achieve this by segmenting our data using the GROUP BY clause.
How GROUP BY Works
GROUP BY tells the database to group rows together that share the same values in specified columns, and then calculate the aggregate function for each group independently.
Let’s say we want to know how many units of each product we have sold. We can group our order_items by the product_id:
SELECT product_id, SUM(quantity) AS total_units
FROM order_items
GROUP BY product_id;
This query outputs a list of unique product IDs, and next to each one, the total sum of the quantities sold specifically for that product.
[!IMPORTANT] The golden rule of
GROUP BY: If you use an aggregate function (likeSUM) in yourSELECTstatement, every other column in yourSELECTstatement must be listed in theGROUP BYclause.
Filtering Groups with HAVING
What if we only want to see products that have sold more than 100 units? Your first instinct might be to use the WHERE clause:
❌ ... WHERE SUM(quantity) > 100
This will fail. Why? Because WHERE filters individual rows before they are grouped and aggregated. The database doesn’t know the sum yet!
To filter based on the result of an aggregate function, you must use the HAVING clause. HAVING acts exactly like WHERE, but it runs after the grouping occurs.
SELECT product_id, SUM(quantity) AS total_units
FROM order_items
GROUP BY product_id
HAVING SUM(quantity) > 100;
The Full Execution Order
It’s critical to understand the order in which SQL executes your instructions:
FROM(Get the tables)WHERE(Filter individual rows)GROUP BY(Create buckets)HAVING(Filter the buckets)SELECT(Return the columns/aggregates)ORDER BY(Sort the final output)
Try using both GROUP BY and HAVING in the interactive lab!