ROW_NUMBER() and RANK()
In the previous lesson, we used OVER() with a standard aggregate function (AVG). But SQL also provides special functions that only work as Window Functions.
The most commonly used are functions that assign a sequential number to rows: ROW_NUMBER(), RANK(), and DENSE_RANK().
ROW_NUMBER()
ROW_NUMBER() simply assigns a unique, sequential integer to each row within a partition.
To use these ranking functions, we must introduce the ORDER BY clause inside the OVER() window. The database needs to know what order to assign the numbers in!
SELECT
name,
department,
salary,
ROW_NUMBER() OVER(PARTITION BY department ORDER BY salary DESC) AS row_num
FROM employees;
This queries orders employees within each department from highest paid to lowest paid, and numbers them 1, 2, 3, etc. If two people have the exact same salary, one will arbitrarily get ‘1’ and the other ‘2’.
RANK()
What if we want to handle ties fairly? If two people make the exact same highest salary, they should both be ranked #1. This is what RANK() does.
SELECT
name,
department,
salary,
RANK() OVER(PARTITION BY department ORDER BY salary DESC) AS salary_rank
FROM employees;
If Diana and Greg both make $85,000 in Sales, and that’s the highest salary, RANK() will give them both a rank of 1.
Important: The next person in line will get a rank of 3! RANK() skips numbers if there are ties. (1, 1, 3, 4…).
DENSE_RANK()
If you don’t want numbers to be skipped after a tie, you use DENSE_RANK().
If Diana and Greg tie for 1st place, they both get 1. The next person in line will get 2. (1, 1, 2, 3…).
[!TIP] A classic interview question is “Find the 2nd highest salary in every department.” You solve this by putting a
RANK()window function inside a CTE, and then querying the CTE withWHERE rank = 2.
Try using RANK() in the lab to find out who the top earners are in our company departments!