tencent cloud

Feedback

Time Window Functions

Last updated: 2023-11-08 11:33:35
    In stream processing, streams are unbounded, and we don't know when the data source will continue/stop sending data. Therefore, the way to do aggregations (e.g., count, sum) on streams differs from that in batch processing. In stream processing, windows are used to limit the scope of aggregation, such as "counting website clicks in the last 2 minutes" and "counting the total number of people who liked this video among the recent 100 people". Windows are equivalent to collecting a dynamic table of bounded data so that we can perform aggregations on the data in the table.
    Window functions are a special type of functions that are not used in the projection list of SELECT, but in the GROUP BY clause. Stream Compute Service supports three types of window functions: TUMBLE, HOP, and SESSION.
    To learn about Flink stream processing, see Timely Stream Processing.

    TUMBLE

    A tumbling window assigns data to non-overlapping windows with a fixed size that is customizable. We can perform aggregations on the data within the window.

    Syntax

    TUMBLE(time_attr, interval)
    time_attr is a timestamp parameter that specifies the time when a record is processed. If specified as PROCTIME, it is an automatically generated timestamp that records the moment when the data is processed by Flink. It is generally used in Processing Time mode.
    interval specifies the window size. For example, use INTERVAL '1' DAY to set a 1-day window size and INTERVAL '2' HOUR a 2-hour window size. For other usage, see Date and Time Functions.
    Note
    In Event Time mode (the WATERMARK FOR statement is used to define the timestamp field), the first parameter of the TUMBLE, HOP, and SESSION window functions must be the timestamp field.
    In Processing Time mode, the first parameter of the TUMBLE, HOP, and SESSION window functions must be the calculated column generated by the proctime() function. In the following example, we use PROCTIME, but you should replace it with the actual column name in your job.

    Identification functions

    Identification functions are used to identify the start and end timestamps of windows.
    Function
    Description
    TUMBLE_START(time-attr, size-interval)
    Returns the start timestamp of the window.
    TUMBLE_END(time-attr, size-interval)
    Returns the end timestamp of the window.

    Example

    This example helps you better understand a tumbling window. It uses Event Time to count the hourly income of each user.
    Example data:
    username (VARCHAR)
    income (BIGINT)
    times (TIMESTAMP)
    Tom
    20
    2021-11-11 10:30:00.0
    Jack
    10
    2021-11-11 10:35:00.0
    Tom
    10
    2021-11-11 10:35:00.0
    Tom
    10
    2021-11-11 10:40:00.0
    Tom
    15
    2021-11-11 11:30:00.0
    Jack
    10
    2021-11-11 11:30:00.0
    Jack
    15
    2021-11-11 11:40:00.0
    SQL statements:
    CREATE TABLE user_income (
    username VARCHAR,
    Income INT,
    times TIMESTAMP(3),
    WATERMARK FOR times AS times - INTERVAL '3' SECOND
    ) WITH (
    ...
    );
    
    CREATE TABLE output (
    win_start TIMESTAMP,
    win_end TIMESTAMP,
    username VARCHAR,
    hour_income BIGINT
    )WITH(
    ...
    );
    
    INSERT INTO output
    SELECT
    TUMBLE_START(times,INTERVAL '1' HOUR),
    TUMBLE_END(times,INTERVAL '1' HOUR),
    username,
    SUM(Income)
    FROM user_income
    GROUP BY TUMBLE(times,INTERVAL '1' HOUR),username;
    Output result:
    win_start (TIMESTAMP)
    win_end (TIMESTAMP)
    username (VARCHAR)
    hour_income (BIGINT)
    2021-11-11 10:00:00.0
    2021-11-11 11:00:00.0
    Tom
    40
    2021-11-11 10:00:00.0
    2021-11-11 11:00:00.0
    Jack
    10
    2021-11-11 11:00:00.0
    2021-11-11 12:00:00.0
    Tom
    15
    2021-11-11 11:00:00.0
    2021-11-11 12:00:00.0
    Jack
    25

    HOP

    A hopping window assigns elements to fixed-size windows. Similar to a tumbling window, a hopping window supports window size customization. The other parameter controls how frequently a window is started.
    A hopping window maintains a fixed window size and slides at a specified hop interval, allowing overlapping windows.
    The hop interval determines the frequency at which Flink creates new windows.
    If the hop interval is smaller than the window size, hopping windows are overlapping. In this case, elements are assigned to multiple windows.
    If the hop interval is greater than the window size, some events may be discarded.
    If the hop interval is equal to the window size, this window is equivalent to a tumbling window.

    Syntax

    HOP(time_attr, sliding_interval, window_size_interval)
    time_attr is a timestamp parameter that specifies the time when a record is processed. If specified as PROCTIME, it is an automatically generated timestamp that records the moment when the data is processed by Flink. It is generally used in Processing Time mode.
    window_size_interval specifies the window size. For example, use INTERVAL '1' DAY to set a 1-day window size and INTERVAL '2' HOUR a 2-hour window size. For other usage, see Date and Time Functions.
    sliding_interval specifies the hop interval. For example, use INTERVAL '1' DAY to set a 1-day window size and INTERVAL '2' HOUR a 2-hour window size. For other usage, see Date and Time Functions.

    Identification functions

    Identification functions are used to identify the start and end timestamps of windows.
    Function
    Description
    HOP_START(time-attr, slide-interval,size-interval)
    Returns the start timestamp of the window.
    HOP_END(time-attr, slide-interval,size-interval)
    Returns the end timestamp of the window.

    Example

    This example helps you better understand a hopping window. It uses Event Time to count the hourly income of each user, which is updated every 30 minutes. The window size is 1 hour, with a hop interval of 10 minutes.
    Example data:
    username (VARCHAR)
    income (BIGINT)
    times (TIMESTAMP)
    Tom
    20
    2021-11-11 10:30:00.0
    Jack
    10
    2021-11-11 10:35:00.0
    Tom
    10
    2021-11-11 10:35:00.0
    Tom
    10
    2021-11-11 10:40:00.0
    Tom
    15
    2021-11-11 11:35:00.0
    Jack
    10
    2021-11-11 11:30:00.0
    Jack
    15
    2021-11-11 11:40:00.0
    SQL statements:
    CREATE TABLE user_income (
    username VARCHAR,
    Income INT,
    times TIMESTAMP(3),
    WATERMARK FOR times AS times - INTERVAL '3' MINUTE
    )WITH(
    ...
    );
    
    CREATE TABLE output (
    win_start TIMESTAMP,
    win_end TIMESTAMP,
    username VARCHAR,
    hour_income BIGINT
    )WITH(
    ...
    );
    
    INSERT INTO output
    SELECT
    HOP_START(times,INTERVAL '30' MINUTE,INTERVAL '1' HOUR),
    HOP_END(times,INTERVAL '30' MINUTE,INTERVAL '1' HOUR),
    username,
    SUM(income)
    FROM user_income
    GROUP BY HOP(times,INTERVAL '30' MINUTE,INTERVAL '1' HOUR),username;
    Output result:
    win_start (TIMESTAMP)
    win_end (TIMESTAMP)
    username (VARCHAR)
    hour_income (BIGINT)
    2021-11-11 10:00:00.0
    2021-11-11 11:00:00.0
    Tom
    40
    2021-11-11 10:00:00.0
    2021-11-11 11:00:00.0
    Jack
    10
    2021-11-11 10:30:00.0
    2021-11-11 11:30:00.0
    Jack
    10
    2021-11-11 10:30:00.0
    2021-11-11 11:30:00.0
    Tom
    40
    2021-11-11 11:00:00.0
    2021-11-11 12:00:00.0
    Tom
    15
    2021-11-11 11:00:00.0
    2021-11-11 12:00:00.0
    Jack
    25
    2021-11-11 11:30:00.0
    2021-11-11 12:30:00.0
    Jack
    25
    2021-11-11 11:30:00.0
    2021-11-11 12:30:00.0
    Tom
    15

    SESSION

    A session window groups elements by session activity. Unlike a tumbling or hopping window, a session window does not have overlapping windows or fixed start and end timestamps. Instead, the window is closed when it no longer receives elements within a fixed time period, which is called an inactive interval. A session window is configured using a session interval. The session interval defines the length of the inactive period. When this period elapses, the current session is closed and subsequent elements are assigned to a new session window.
    Session windows are based on inactivity rather than size. For example, if there is no activity (no new data) for more than 30 minutes, the existing window is closed, and a new window starts when new data is observed later.

    Syntax

    SESSION(time_attr, interval)
    time_attr is a timestamp parameter that specifies the time when a record is processed. If specified as PROCTIME, it is an automatically generated timestamp that records the moment when the data is processed by Flink. It is generally used in Processing Time mode.
    interval specifies the window size. For example, use INTERVAL '1' DAY to set a 1-day window size and INTERVAL '2' HOUR a 2-hour window size. For other usage, see Date and Time Functions.

    Identification functions

    Identification functions are used to identify the start and end timestamps of windows.
    Function
    Description
    SESSION_START(time-attr, size-interval)
    Returns the start timestamp of the window.
    SESSION_END(time-attr, size-interval)
    Returns the end timestamp of the window.

    Example

    This example helps you better understand a session window. It uses Event Time to count the hourly income of each user, with a session timeout period of 30 minutes.
    Example data:
    username (VARCHAR)
    income (BIGINT)
    times (TIMESTAMP)
    Tom
    20
    2021-11-11 10:30:00.0
    Jack
    10
    2021-11-11 10:35:00.0
    Tom
    10
    2021-11-11 10:35:00.0
    Tom
    10
    2021-11-11 10:40:00.0
    Tom
    15
    2021-11-11 11:50:00.0
    Jack
    10
    2021-11-11 11:40:00.0
    Jack
    15
    2021-11-11 11:45:00.0
    SQL statements:
    CREATE TABLE user_income (
    username VARCHAR,
    Income INT,
    times TIMESTAMP(3),
    WATERMARK FOR times AS times - INTERVAL '3' MINUTE
    )WITH(
    ...
    );
    
    CREATE TABLE output (
    win_start TIMESTAMP,
    win_end TIMESTAMP,
    username VARCHAR,
    hour_income BIGINT
    )WITH(
    ...
    );
    
    INSERT INTO output
    SELECT
    SESSION_START(times,INTERVAL '30' MINUTE),
    SESSION_END(times,INTERVAL '30' MINUTE),
    username,
    SUM(Income)
    FORM user_income
    GROUP BY SESSION(times,INTERVAL '30' MINUTE),username;
    Output result:
    win_start (TIMESTAMP)
    win_end (TIMESTAMP)
    username (VARCHAR)
    hour_income (BIGINT)
    2021-11-11 10:30:00.0
    2021-11-11 11:10:00.0
    Tom
    40
    2021-11-11 10:35:00.0
    2021-11-11 11:05:00.0
    Jack
    10
    2021-11-11 11:30:00.0
    2021-11-11 12:00:00.0
    Tom
    15
    2021-11-11 11:30:00.0
    2021-11-11 12:10:00.0
    Jack
    25

    More notes

    The preceding windows all have corresponding auxiliary functions, which are the same except for the prefix. Here, the auxiliary functions for tumbling windows are used as examples.
    TUMBLE_ROWTIME: (Event Time mode) Returns the timestamp of the inclusive upper bound of the tumbling window. The resulting attribute can be used in subsequent time-based operations such as interval joins and group window or over window aggregations. Example:
    SELECT user,
    TUMBLE_START(rowtime, INTERVAL '12' HOUR) AS sStart,
    TUMBLE_ROWTIME(rowtime, INTERVAL '12' HOUR) AS snd,
    SUM(amount)
    FROM Orders
    GROUP BY TUMBLE(rowtime, INTERVAL '12' HOUR), user
    TUMBLE_PROCTIME: (Processing Time mode) Returns the timestamp of the inclusive upper bound of the tumbling window. The resulting attribute can be used in subsequent time-based operations such as interval joins and group window or over window aggregations. Example:
    SELECT user,
    TUMBLE_START(PROCTIME, INTERVAL '12' HOUR) AS sStart,
    TUMBLE_PROCTIME(PROCTIME, INTERVAL '12' HOUR) AS snd,
    SUM(amount)
    FROM Orders
    GROUP BY TUMBLE(PROCTIME, INTERVAL '12' HOUR), user
    
    Contact Us

    Contact our sales team or business advisors to help your business.

    Technical Support

    Open a ticket if you're looking for further assistance. Our Ticket is 7x24 avaliable.

    7x24 Phone Support