Correlated Subqueries
So far, the subqueries we’ve written have been entirely independent. If you highlighted the inner query and ran it by itself, it would work perfectly.
A Correlated Subquery is different. It is a subquery that references a column from the outer query. This means the inner query cannot run independently; it relies on the outer query for context.
How Correlated Subqueries Work
Because it references the outer query, a correlated subquery is executed repeatedly—once for every single row evaluated by the outer query.
Imagine you want to find every employee who earns more than the average salary of their specific department.
You can’t just find the global average salary. You need the database to look at Alice in Engineering, calculate the Engineering average, and compare her salary. Then look at Diana in Sales, calculate the Sales average, and compare her salary.
SELECT name, department, salary
FROM employees AS outer_emp
WHERE salary > (
SELECT AVG(salary)
FROM employees AS inner_emp
WHERE inner_emp.department = outer_emp.department
);
The Execution Process
- The outer query starts with the first row: Alice (Engineering, $120,000).
- It passes her department (‘Engineering’) into the inner query.
- The inner query calculates the average salary where department = ‘Engineering’.
- It compares Alice’s $120,000 to that specific average. If she is higher, she is included in the final results.
- It repeats this process for Bob, then Charlie, etc.
[!WARNING] Because a correlated subquery executes once for every row in the outer table, it can be incredibly slow on large datasets. If you have 1 million employees, that inner query runs 1 million times! In production, developers often rewrite correlated subqueries using
JOINs or Window Functions to improve performance.
Despite the performance concerns, they are logically powerful and an essential concept to grasp. Try writing one in the lab to find our highest earners per department!