OVER
Description
Window functions, also known as analytic functions, are a powerful feature in SQL that allows to perform calculations across a set of table rows that are somehow related to the current row. This is different from regular aggregate functions which return a single aggregate value for a set of rows. Window functions retain the individual rows while calculating aggregates, ranks, running totals, or other calculations. The PARTITION BY
clause is used within window functions to divide the result set into partitions to which the window function is applied.
Syntax
window_function(): This is a placeholder for any window function, such as
SUM()
,AVG()
,ROW_NUMBER()
,RANK()
, etc.OVER: Indicates the use of a window function.
[PARTITION BY partition_expression]: (Optional) Divides the result set into partitions to which the window function is applied. Each partition is processed separately.
[ORDER BY order_expression]: (Optional) Specifies the order of rows within each partition or within the entire result set if
PARTITION BY
is not used. This is necessary for functions that depend on the order, like cumulative sums or rankings.[RANGE BETWEEN start_expr AND end_expr]: (Optional) Defines a window frame, which is a set of rows relative to the current row.
RANGE
considers rows within a certain range of values, whileROWS
considers a specific number of rows. This can be especially useful for cumulative sums, moving averages, or other types of running totals.
Common Window Functions
Ranking Functions:
ROW_NUMBER()
: Assigns a unique, sequential number to each row within a partition or ordered result set. It's useful for assigning row positions or creating custom ordering.RANK()
: Assigns a rank to each row based on a specified ordering criteria (e.g., descending sales amount). Rows with the same value share the same rank, and subsequent ranks might skip numbers.DENSE_RANK()
: Similar toRANK()
but assigns consecutive ranks without gaps, even if multiple rows share the same value in the ordering criteria.
Aggregate Window Functions:
SUM(expression) OVER (...)
: Calculates the running or cumulative sum of an expression over a window. It's useful for tracking totals within partitions or ordered sets.AVG(expression) OVER (...)
: Calculates the running or cumulative average of an expression over a window. It's helpful for analyzing trends within groups of data.MIN(expression) OVER (...)
: Identifies the minimum value of an expression within a window. It can be used to find the lowest value within a specific range of rows.MAX(expression) OVER (...)
: Identifies the maximum value of an expression within a window. It can be used to find the highest value within a specific range of rows.
Other Window Functions:
FIRST_VALUE(expression) OVER (...)
: Retrieves the first value of an expression encountered within a window. It's useful for grabbing the initial value within a partition or ordered set.LAST_VALUE(expression) OVER (...)
: Retrieves the last value of an expression encountered within a window. It's helpful for grabbing the final value within a partition or ordered set.LEAD(expression, offset) OVER (...)
: Looks ahead a specified number of rows (offset
) and returns the value of the expression at that position. It's useful for comparing values with future positions within the window.LAG(expression, offset) OVER (...)
: Looks behind a specified number of rows (offset
) and returns the value of the expression at that position. It's useful for comparing values with past positions within the window.RATIO_TO_REPORT(expression)
: It is an analytic function used to calculate the ratio of a value to the sum of a set of values. It is commonly used to determine the proportion of a value within a group relative to the total for that group.
Examples
Calculating ROW_NUMBER() with PARTITION BY
We want to assign a unique row number to each sale within each product.
Table sales
:
sales
:1
101
1001
500
2024-01-01
2
102
1002
300
2024-01-01
3
101
1001
700
2024-01-02
4
103
1003
200
2024-01-02
5
102
1002
400
2024-01-03
Output:
1
101
500
1
3
101
700
2
2
102
300
1
5
102
400
2
4
103
200
1
Calculating SUM() with PARTITION BY
We want to calculate the total sales amount for each customer.
Table sales
:
sales
:1
101
1001
500
2024-01-01
2
102
1002
300
2024-01-01
3
101
1001
700
2024-01-02
4
103
1003
200
2024-01-02
5
102
1002
400
2024-01-03
Output:
1001
500
1200
1001
700
1200
1002
300
700
1002
400
700
1003
200
200
Example: Using RANGE BETWEEN
with SUM
RANGE BETWEEN
with SUM
Sample Input
We want to calculate the cumulative sales amount within the last 90 days for each row.
Example with ROWS BETWEEN
ROWS BETWEEN
For considering the last 3 rows instead of a time interval, we would use ROWS BETWEEN
Example using PARTITION BY
with RANGE
PARTITION BY
with RANGE
We want to calculate cumulative sales within each year and the cumulative sales should be within the last 90 days for each year.
Example using RATIO_TO_REPORT
Suppose we have a table sales
with the following structure:
1
1
100
1
2
150
2
1
200
2
2
250
We want to calculate the ratio of each sales_amount
to the total sales_amount
for each region_id
Output
1
1
100
0.3333
2
1
200
0.6667
1
2
150
0.3750
2
2
250
0.6250
Last updated
Was this helpful?