Window Functions
Window functions perform calculations across a set of rows related to the current row, without collapsing rows like GROUP BY.
Syntax
function_name(args) OVER (
[PARTITION BY column1, column2, ...]
[ORDER BY column3 [ASC|DESC], ...]
)Window Function Categories
| Category | Functions | Description |
|---|---|---|
| Ranking | ROW_NUMBER, RANK, DENSE_RANK | Assign ranks to rows |
| Aggregate | SUM, AVG, COUNT, MIN, MAX | Running calculations |
| Navigation | LAG, LEAD, FIRST_VALUE, LAST_VALUE | Access other rows |
Ranking Functions
ROW_NUMBER()
Assigns a unique sequential number to each row.
SELECT
name,
score,
ROW_NUMBER() OVER (ORDER BY score DESC) AS row_num
FROM students.csvResult:
name,score,row_num
Alice,95,1
Bob,90,2
Carol,90,3
David,85,4Each row gets a unique number, even with ties.
RANK()
Assigns rank with gaps for ties.
SELECT
name,
score,
RANK() OVER (ORDER BY score DESC) AS rank
FROM students.csvResult:
name,score,rank
Alice,95,1
Bob,90,2
Carol,90,2
David,85,4Bob and Carol tie at rank 2; David is rank 4 (rank 3 is skipped).
DENSE_RANK()
Assigns rank without gaps for ties.
SELECT
name,
score,
DENSE_RANK() OVER (ORDER BY score DESC) AS dense_rank
FROM students.csvResult:
name,score,dense_rank
Alice,95,1
Bob,90,2
Carol,90,2
David,85,3Bob and Carol tie at rank 2; David is rank 3 (no gap).
Comparison
SELECT
name,
score,
ROW_NUMBER() OVER (ORDER BY score DESC) AS row_num,
RANK() OVER (ORDER BY score DESC) AS rank,
DENSE_RANK() OVER (ORDER BY score DESC) AS dense_rank
FROM students.csv| name | score | row_num | rank | dense_rank |
|---|---|---|---|---|
| Alice | 95 | 1 | 1 | 1 |
| Bob | 90 | 2 | 2 | 2 |
| Carol | 90 | 3 | 2 | 2 |
| David | 85 | 4 | 4 | 3 |
PARTITION BY
Divide rows into groups for independent calculations.
Ranking Within Groups
SELECT
department,
name,
salary,
RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dept_rank
FROM employees.csvResult:
department,name,salary,dept_rank
Engineering,Alice,120000,1
Engineering,Bob,100000,2
Marketing,Carol,90000,1
Marketing,David,85000,2Each department has its own ranking.
Multiple Partitions
SELECT
region,
department,
name,
sales,
ROW_NUMBER() OVER (
PARTITION BY region, department
ORDER BY sales DESC
) AS rank_in_dept
FROM sales_reps.csvAggregate Window Functions
Apply aggregate functions without collapsing rows.
Running Total
SELECT
date,
amount,
SUM(amount) OVER (ORDER BY date) AS running_total
FROM transactions.csvResult:
date,amount,running_total
2024-01-01,100,100
2024-01-02,150,250
2024-01-03,75,325Partition Total
SELECT
department,
name,
salary,
SUM(salary) OVER (PARTITION BY department) AS dept_total
FROM employees.csvEach row shows its department's total salary.
Running Average
SELECT
date,
temperature,
AVG(temperature) OVER (ORDER BY date) AS running_avg
FROM weather.csvCount Within Partition
SELECT
department,
name,
COUNT(*) OVER (PARTITION BY department) AS dept_size
FROM employees.csvMIN/MAX Within Partition
SELECT
department,
name,
salary,
MIN(salary) OVER (PARTITION BY department) AS dept_min,
MAX(salary) OVER (PARTITION BY department) AS dept_max
FROM employees.csvNavigation Functions
Access values from other rows.
LAG()
Access previous row's value.
LAG(column, offset, default) OVER (...)column: Column to retrieveoffset: Number of rows back (default: 1)default: Value if no previous row exists
SELECT
date,
price,
LAG(price, 1, 0) OVER (ORDER BY date) AS prev_price,
price - LAG(price, 1, price) OVER (ORDER BY date) AS price_change
FROM stock_prices.csvResult:
date,price,prev_price,price_change
2024-01-01,100,0,100
2024-01-02,105,100,5
2024-01-03,103,105,-2LEAD()
Access next row's value.
LEAD(column, offset, default) OVER (...)SELECT
date,
price,
LEAD(price, 1, 0) OVER (ORDER BY date) AS next_price
FROM stock_prices.csvResult:
date,price,next_price
2024-01-01,100,105
2024-01-02,105,103
2024-01-03,103,0FIRST_VALUE()
Get first value in the partition/window.
SELECT
name,
salary,
FIRST_VALUE(name) OVER (ORDER BY salary DESC) AS highest_earner
FROM employees.csvLAST_VALUE()
Get last value in the partition/window.
SELECT
name,
salary,
LAST_VALUE(name) OVER (ORDER BY salary DESC) AS lowest_earner
FROM employees.csvCommon Patterns
Top N Per Group
SELECT * FROM (
SELECT
department,
name,
salary,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rn
FROM employees.csv
) ranked
WHERE rn <= 3Percent of Total
SELECT
department,
name,
salary,
ROUND(100.0 * salary / SUM(salary) OVER (), 2) AS pct_of_total,
ROUND(100.0 * salary / SUM(salary) OVER (PARTITION BY department), 2) AS pct_of_dept
FROM employees.csvMoving Average
-- 7-day moving average
SELECT
date,
value,
AVG(value) OVER (ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS ma_7
FROM metrics.csvYear-over-Year Comparison
SELECT
month,
year,
sales,
LAG(sales, 12) OVER (ORDER BY year, month) AS sales_last_year,
sales - LAG(sales, 12) OVER (ORDER BY year, month) AS yoy_change
FROM monthly_sales.csvCumulative Distribution
SELECT
name,
score,
ROUND(100.0 * ROW_NUMBER() OVER (ORDER BY score) / COUNT(*) OVER (), 2) AS percentile
FROM students.csvGap Detection
SELECT
id,
value,
id - LAG(id, 1, id) OVER (ORDER BY id) AS gap
FROM sequence.csv
WHERE id - LAG(id, 1, id) OVER (ORDER BY id) > 1Running Difference
SELECT
date,
balance,
balance - LAG(balance, 1, balance) OVER (ORDER BY date) AS daily_change
FROM account.csvWindow Function vs GROUP BY
| Feature | Window Function | GROUP BY |
|---|---|---|
| Rows returned | All rows preserved | One row per group |
| Access to detail | Yes | Only aggregated values |
| Multiple aggregations | Same row can show multiple | Requires separate queries |
GROUP BY (collapses rows):
SELECT department, AVG(salary) FROM employees.csv GROUP BY departmentWindow Function (preserves rows):
SELECT name, department, salary,
AVG(salary) OVER (PARTITION BY department) AS dept_avg
FROM employees.csvPerformance Considerations
1. Index-Friendly ORDER BY
Order by indexed columns when possible:
ROW_NUMBER() OVER (ORDER BY id) -- Faster if id is indexed2. Minimize Partitions
Fewer partitions = faster computation:
-- More efficient (fewer partitions)
OVER (PARTITION BY department ORDER BY date)
-- Less efficient (many partitions)
OVER (PARTITION BY department, team, project ORDER BY date)3. Use LIMIT with Subqueries
SELECT * FROM (
SELECT *, ROW_NUMBER() OVER (ORDER BY score DESC) AS rn
FROM large_table.csv
) WHERE rn <= 10Related Documentation
- SQL Overview - General SQL reference
- Aggregation - GROUP BY and aggregates
- Aggregate Functions - Function reference