Window Functions
Use window functions to compute averages, sort, rank, or count items, calculate sums, and find minimum or maximum values across windows.
Important: To query using Windows functions, you must enable Pinot's multi-stage query engine (v2). See how to enable and use the multi-stage query engine (v2).
Window Functions overview
This is an overview of the window functions feature.
Window function syntax
Pinot's window function (windowedCall
) has the following syntax definition:
windowedCall
refers to the actual windowed operation.windowFunction
refers to the window function used, see supported window functions.window
is the window definition / windowing mechanism, see supported window mechanism.
You can jump to the examples section to see more concrete use cases of window functions in Pinot.
Example window function query layout
The following query shows the complete components of the window function. Note that the PARTITION BY
,ORDER BY
, and the FRAME
clauses are all optional.
Window mechanism (OVER clause)
Partition by clause
If a
PARTITION BY
clause is specified, the intermediate results will be grouped into different partitions based on the values of the columns appearing in thePARTITION BY
clause.If the
PARTITION BY
clause isn’t specified, the whole result will be regarded as one big partition, i.e. there is only one partition in the result set.
Order by clause
If an
ORDER BY
clause is specified, all the rows within the same partition will be sorted based on the values of the columns appearing in the windowORDER BY
clause. TheORDER BY
clause decides the order in which the rows within a partition are to be processed.If no
ORDER BY
clause is specified while aPARTITION BY
clause is specified, the order of the rows is undefined. To order the output, use a globalORDER BY
clause in the query.
Frame clause
RANGE
type window frames currently cannot be used with offset PRECEDING
/ offset FOLLOWING
The following window frame clauses are currently supported:
RANGE frame_start
whereframe_start
can beUNBOUNDED PRECEDING
orCURRENT ROW
(frame_end
will default toCURRENT ROW
)ROWS frame_start
whereframe_start
can beUNBOUNDED PRECEDING
,offset PRECEDING
, orCURRENT ROW
(frame_end
will default toCURRENT ROW
)RANGE BETWEEN frame_start AND frame_end
;frame_start
can be eitherUNBOUNDED PRECEDING
orCURRENT ROW
andframe_end
can be eitherCURRENT ROW
orUNBOUNDED FOLLOWING
ROWS BETWEEN frame_start AND frame_end
;frame_start
/frame_end
can be one of:UNBOUNDED PRECEDING
(frame_start
only)offset PRECEDING
whereoffset
is an integer literalCURRENT ROW
offset FOLLOWING
whereoffset
is an integer literalUNBOUNDED FOLLOWING
(frame_end
only)
In RANGE
mode, a frame_start
of CURRENT ROW
means the frame starts with the current row's first peer row (a row that the window's ORDER BY
clause sorts as equivalent to the current row), while a frame_end
of CURRENT ROW
means the frame ends with the current row's last peer row. In ROWS
mode, CURRENT ROW
simply means the current row.
If no ORDER BY
clause is specified, the window frame will always be RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
and cannot be modified. When an ORDER BY
clause is present, the default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
if no explicit window frame is defined in the query.
If there is no FRAME
, no PARTITION BY
, and no ORDER BY
clause specified in the OVER clause (empty OVER
), the whole result set is regarded as one partition, and there's one frame in the window.
The OVER
clause applies a specified supported windows function to compute values over a group of rows and return a single result for each row. The OVER
clause specifies how the rows are arranged and how the aggregation is done on those rows.
Inside the over clause, there are three optional components: PARTITION BY
clause, ORDER BY
clause, and FRAME
clause.
Window functions
Window functions are commonly used to do the following:
Supported window functions are listed in the following table.
Function | Description | Example | Default Value When No Record Selected |
---|---|---|---|
Returns the average of the values for a numeric column in the defined window. |
|
| |
BOOL_AND | Returns |
| |
BOOL_OR | Returns |
| |
Returns the number of values in the window |
|
| |
Returns the minimum value of a numeric column as |
|
| |
Returns the maximum value of a numeric column as |
|
| |
Returns the sum of the values for a numeric column as |
|
| |
The |
| ||
The |
| ||
The |
| ||
The |
| ||
Returns the number of the current row within its partition, counting from 1. |
| ||
RANK | Returns the rank of the current row, with gaps - i.e., the |
| |
DENSE_RANK | Returns the rank of the current row, without gaps. |
|
Note that no window frame clause can be specified for ROW_NUMBER
, RANK
, and DENSE_RANK
window functions since they're applied on the entire partition by definition. Similarly, no window frame clause can be specified for LAG
and LEAD
since the row offset
is an input to those functions themselves.
Window aggregate query examples
Sum transactions by customer ID
Calculate the rolling sum transaction amount ordered by the payment date for each customer ID (note, the default frame here is UNBOUNDED PRECEDING
and CURRENT ROW
).
customer_id | payment_date | amount | sum |
---|---|---|---|
1 | 2023-02-14 23:22:38.996577 | 5.99 | 5.99 |
1 | 2023-02-15 16:31:19.996577 | 0.99 | 6.98 |
1 | 2023-02-15 19:37:12.996577 | 9.99 | 16.97 |
1 | 2023-02-16 13:47:23.996577 | 4.99 | 21.96 |
2 | 2023-02-17 19:23:24.996577 | 2.99 | 2.99 |
2 | 2023-02-17 19:23:24.996577 | 0.99 | 3.98 |
3 | 2023-02-16 00:02:31.996577 | 8.99 | 8.99 |
3 | 2023-02-16 13:47:36.996577 | 6.99 | 15.98 |
3 | 2023-02-17 03:43:41.996577 | 6.99 | 22.97 |
4 | 2023-02-15 07:59:54.996577 | 4.99 | 4.99 |
4 | 2023-02-16 06:37:06.996577 | 0.99 | 5.98 |
Find the minimum or maximum transaction by customer ID
Calculate the least (use MIN()
) or most expensive (use MAX()
) transaction made by each customer comparing all transactions made by the customer (default frame here is UNBOUNDED PRECEDING
and UNBOUNDED FOLLOWING
). The following query shows how to find the least expensive transaction.
customer_id | payment_date | amount | min |
---|---|---|---|
1 | 2023-02-14 23:22:38.996577 | 5.99 | 0.99 |
1 | 2023-02-15 16:31:19.996577 | 0.99 | 0.99 |
1 | 2023-02-15 19:37:12.996577 | 9.99 | 0.99 |
2 | 2023-04-30 04:34:36.996577 | 4.99 | 4.99 |
2 | 2023-04-30 12:16:09.996577 | 10.99 | 4.99 |
3 | 2023-03-23 05:38:40.996577 | 2.99 | 2.99 |
3 | 2023-04-07 08:51:51.996577 | 3.99 | 2.99 |
3 | 3 | 2023-04-08 11:15:37.996577 | 4.99 | 2.99 |
Find the average transaction amount by customer ID
Calculate a customer’s average transaction amount for all transactions they’ve made (default frame here is UNBOUNDED PRECEDING
and UNBOUNDED FOLLOWING
).
customer_id | payment_date | amount | avg |
---|---|---|---|
1 | 2023-02-14 23:22:38.996577 | 5.99 | 5.66 |
1 | 2023-02-15 16:31:19.996577 | 0.99 | 5.66 |
1 | 2023-02-15 19:37:12.996577 | 9.99 | 5.66 |
2 | 2023-04-30 04:34:36.996577 | 4.99 | 7.99 |
2 | 2023-04-30 12:16:09.996577 | 10.99 | 7.99 |
3 | 2023-03-23 05:38:40.996577 | 2.99 | 3.99 |
3 | 2023-04-07 08:51:51.996577 | 3.99 | 3.99 |
3 | 2023-04-08 11:15:37.996577 | 4.99 | 3.99 |
Rank year-to-date sales for a sales team
Use ROW_NUMBER()
to rank team members by their year-to-date sales (default frame here is UNBOUNDED PRECEDING
and UNBOUNDED FOLLOWING
).
Row | FirstName | LastName | Total sales YTD |
---|---|---|---|
1 | Joe | Smith | 2251368.34 |
2 | Alice | Davis | 2151341.64 |
3 | James | Jones | 1551363.54 |
4 | Dane | Scott | 1251358.72 |
Count the number of transactions by customer ID
Count the number of transactions made by each customer (default frame here is UNBOUNDED PRECEDING
and UNBOUNDED FOLLOWING
).
customer_id | payment_date | amount | count |
---|---|---|---|
1 | 2023-02-14 23:22:38.99657 | 10.99 | 2 |
1 | 2023-02-15 16:31:19.996577 | 8.99 | 2 |
2 | 2023-04-30 04:34:36.996577 | 23.50 | 3 |
2 | 2023-04-07 08:51:51.996577 | 12.35 | 3 |
2 | 2023-04-08 11:15:37.996577 | 8.29 | 3 |
Last updated