

After careful consideration, we have decided to discontinue Amazon Kinesis Data Analytics for SQL applications:

1. From **September 1, 2025**, we will not provide bug fixes for Amazon Kinesis Data Analytics for SQL applications, because support will be limited ahead of the discontinuation.

2. From **October 15, 2025**, you will not be able to create new Kinesis Data Analytics for SQL applications.

3. We will delete your applications starting **January 27, 2026**. You will not be able to start or operate your Amazon Kinesis Data Analytics for SQL applications. Support will no longer be available for Amazon Kinesis Data Analytics for SQL from that time. For more information, see [Amazon Kinesis Data Analytics for SQL Applications discontinuation](discontinuation.md).

# Streaming SQL Concepts
<a name="streaming-sql-concepts"></a>

Amazon Kinesis Data Analytics implements the ANSI 2008 SQL standard with extensions. These extensions enable you to process streaming data. The following topics cover key streaming SQL concepts.

**Topics**
+ [In-Application Streams and Pumps](streams-pumps.md)
+ [Timestamps and the ROWTIME Column](timestamps-rowtime-concepts.md)
+ [Continuous Queries](continuous-queries-concepts.md)
+ [Windowed Queries](windowed-sql.md)
+ [Streaming Data Operations: Stream Joins](stream-joins-concepts.md)



# In-Application Streams and Pumps
<a name="streams-pumps"></a>

When you configure [application input](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/how-it-works-input.html), you map a streaming source to an in-application stream that Kinesis Data Analytics creates. Data continuously flows from the streaming source into the in-application stream. An in-application stream works like a table that you can query using SQL statements, but it's called a stream because it represents a continuous flow of data. 

**Note**  
Do not confuse in-application streams with Amazon Kinesis data streams and Firehose delivery streams. In-application streams exist only in the context of an Amazon Kinesis Data Analytics application. Kinesis data streams and Firehose delivery streams exist independently of your application. You can configure them as a streaming source in your application's input configuration or as a destination in its output configuration.

You can also create more in-application streams as needed to store intermediate query results. Creating and populating an in-application stream is a two-step process: first you create the stream, and then you pump data into it. For example, suppose that the input configuration of your application creates an in-application stream named `INPUTSTREAM`. In the following example, you create another stream (`TEMPSTREAM`), and then you pump data from `INPUTSTREAM` into it. 

1. Create an in-application stream (`TEMPSTREAM`) with three columns, as shown following:

   ```
   CREATE OR REPLACE STREAM "TEMPSTREAM" ( 
      "column1" BIGINT NOT NULL, 
      "column2" INTEGER, 
      "column3" VARCHAR(64));
   ```

   The column names are specified in quotes, making them case sensitive. For more information, see [Identifiers](https://docs.aws.amazon.com/kinesisanalytics/latest/sqlref/sql-reference-identifiers.html) in the *Amazon Kinesis Data Analytics SQL Reference*.

1. Insert data into the stream using a pump. A pump is a continuously running insert query that inserts data from one in-application stream into another. The following statement creates a pump (`SAMPLEPUMP`) and inserts data into `TEMPSTREAM` by selecting records from another stream (`INPUTSTREAM`).

   ```
   CREATE OR REPLACE PUMP "SAMPLEPUMP" AS 
   INSERT INTO "TEMPSTREAM" ("column1", 
                             "column2", 
                             "column3") 
   SELECT STREAM inputcolumn1, 
                 inputcolumn2, 
                 inputcolumn3
   FROM "INPUTSTREAM";
   ```

You can have multiple writers insert into an in-application stream, and multiple readers that select from it. Think of an in-application stream as implementing a publish/subscribe messaging paradigm. In this paradigm, the data row, including the time of creation and time of receipt, can be processed, interpreted, and forwarded by a cascade of streaming SQL statements, without having to be stored in a traditional RDBMS.

After an in-application stream is created, you can perform normal SQL queries. 

**Note**  
When you query streams, most SQL statements are bound using a row-based or time-based window. For more information, see [Windowed Queries](windowed-sql.md).

You can also join streams. For examples of joining streams, see [Streaming Data Operations: Stream Joins](stream-joins-concepts.md).

# Timestamps and the ROWTIME Column
<a name="timestamps-rowtime-concepts"></a>

In-application streams include a special column called `ROWTIME`. It stores the timestamp at which Amazon Kinesis Data Analytics inserted the row into the first in-application stream after reading from the streaming source. This `ROWTIME` value is then maintained throughout your application. 

**Note**  
When you pump records from one in-application stream into another, you don't need to explicitly copy the `ROWTIME` column; Amazon Kinesis Data Analytics copies this column for you.

Amazon Kinesis Data Analytics guarantees that `ROWTIME` values are monotonically increasing. You can use this timestamp in time-based windowed queries. For more information, see [Windowed Queries](windowed-sql.md).

You can access the `ROWTIME` column in your `SELECT` statement like any other column in your in-application stream. For example:

```
SELECT STREAM ROWTIME, 
              some_col_1, 
              some_col_2
FROM  SOURCE_SQL_STREAM_001
```

## Understanding Various Times in Streaming Analytics
<a name="out-of-order-rows"></a>

In addition to `ROWTIME`, there are other types of times in real-time streaming applications. These are:
+ **Event time** – The timestamp when the event occurred, sometimes called the *client-side time*. It is often desirable to use this time in analytics because it is when the event actually occurred. However, many event sources, such as mobile phones and web clients, do not have reliable clocks, which can lead to inaccurate times. In addition, connectivity issues can cause records to appear on a stream in a different order than the events occurred.

   
+ **Ingest time** – The timestamp when the record was added to the streaming source, sometimes referred to as the *server-side time*. Amazon Kinesis Data Streams provides this timestamp in a field called `APPROXIMATE_ARRIVAL_TIME` in every record. Ingest time is often a close approximation of event time. A delay in ingesting a record into the stream can make it inaccurate, but such delays are typically rare. Ingest time can also be out of order, due to the distributed nature of streaming data, but this too is rare. Overall, ingest time is a mostly accurate and in-order reflection of the event time. 

   
+ **Processing time** – The timestamp when Amazon Kinesis Data Analytics inserts a row into the first in-application stream. Amazon Kinesis Data Analytics provides this timestamp in the `ROWTIME` column of each in-application stream. The processing time is always monotonically increasing, and it is accurate in relation to the wall clock. However, if your application falls behind, the processing time no longer reflects the time when the event actually occurred. 

Each of these times has advantages and disadvantages when used in time-based windowed queries. We recommend that you choose one or more of these times, along with a strategy for dealing with the relevant disadvantages, based on your use case. 

**Note**  
If you are using row-based windows, time is not an issue and you can ignore this section.

We recommend a two-window strategy that uses two time-based windows: one based on `ROWTIME` and one based on one of the other times (ingest or event time). 
+ Use `ROWTIME` as the first window, which controls how frequently the query emits the results, as shown in the following example. It is not used as a logical time.
+ Use one of the other times as the logical time that you want to associate with your analytics. This time represents when the event occurred. In the following example, the analytics goal is to group the records and return a count by ticker.

The advantage of this strategy is that it can use a time that represents when the event occurred. It can gracefully handle when your application falls behind or when events arrive out of order. If the application falls behind when bringing records into the in-application stream, they are still grouped by the logical time in the second window. The query uses `ROWTIME` to guarantee the order of processing. Any records that are late (the ingest timestamp shows an earlier value compared to the `ROWTIME` value) are also processed successfully.

Consider the following query against the demo stream used in the [Getting Started Exercise](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/get-started-exercise.html). The query uses the `GROUP BY` clause and emits a ticker count in a one-minute tumbling window. 

```
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" 
    ("ingest_time"              TIMESTAMP,
     "APPROXIMATE_ARRIVAL_TIME" TIMESTAMP,
     "ticker_symbol"            VARCHAR(12),
     "symbol_count"             INTEGER);

CREATE OR REPLACE PUMP "STREAM_PUMP" AS
    INSERT INTO "DESTINATION_SQL_STREAM"
    SELECT STREAM STEP("SOURCE_SQL_STREAM_001".ROWTIME BY INTERVAL '60' SECOND) AS "ingest_time",
        STEP("SOURCE_SQL_STREAM_001".APPROXIMATE_ARRIVAL_TIME BY INTERVAL '60' SECOND) AS "APPROXIMATE_ARRIVAL_TIME",
        "TICKER_SYMBOL",
        COUNT(*) AS "symbol_count"
    FROM "SOURCE_SQL_STREAM_001"
    GROUP BY "TICKER_SYMBOL",
        STEP("SOURCE_SQL_STREAM_001".ROWTIME BY INTERVAL '60' SECOND),
        STEP("SOURCE_SQL_STREAM_001".APPROXIMATE_ARRIVAL_TIME BY INTERVAL '60' SECOND);
```

In `GROUP BY`, you first group the records based on `ROWTIME` in a one-minute window and then by `APPROXIMATE_ARRIVAL_TIME`.

The timestamp values in the result are rounded down to the nearest 60-second interval. The first group of results emitted by the query shows records in the first minute. The second group shows records in the next minute, based on `ROWTIME`. The last record indicates that the application was late in bringing the record into the in-application stream (its `ROWTIME` value is late compared to the ingest timestamp).

```
ROWTIME                  INGEST_TIME     TICKER_SYMBOL  SYMBOL_COUNT

--First one minute window.
2016-07-19 17:05:00.0    2016-07-19 17:05:00.0    ABC      10
2016-07-19 17:05:00.0    2016-07-19 17:05:00.0    DEF      15
2016-07-19 17:05:00.0    2016-07-19 17:05:00.0    XYZ      6
--Second one minute window.
2016-07-19 17:06:00.0    2016-07-19 17:06:00.0    ABC      11
2016-07-19 17:06:00.0    2016-07-19 17:06:00.0    DEF      11
2016-07-19 17:06:00.0    2016-07-19 17:05:00.0    XYZ      1  *** 

*** Late-arriving record. Instead of appearing in the result of the 
first 1-minute window (based on ingest_time), it is in the result 
of the second 1-minute window.
```

You can combine the results for a final accurate count per minute by pushing the results to a downstream database. For example, you can configure the application output to persist the results to a Firehose delivery stream that writes to an Amazon Redshift table. After the results are in an Amazon Redshift table, you can query the table to compute the total count grouped by `Ticker_Symbol`. In the case of `XYZ`, the total is accurate (6 + 1 = 7) even though one record arrived late.
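The downstream combination step can be sketched with a short Python simulation. This is illustrative only: the partial counts are taken from the sample output above, and in practice you would run the equivalent `SUM ... GROUP BY` query in Amazon Redshift.

```python
from collections import defaultdict

# Per-window partial counts emitted by the two-window query, taken from
# the sample output above: (ingest-time minute, ticker, symbol_count).
partials = [
    ("17:05", "ABC", 10), ("17:05", "DEF", 15), ("17:05", "XYZ", 6),
    ("17:06", "ABC", 11), ("17:06", "DEF", 11), ("17:05", "XYZ", 1),
]

totals = defaultdict(int)
for minute, ticker, count in partials:
    totals[(minute, ticker)] += count

# The late XYZ record folds back into its logical minute: 6 + 1 = 7.
print(totals[("17:05", "XYZ")])   # → 7
```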

# Continuous Queries
<a name="continuous-queries-concepts"></a>

A query over a stream executes continuously over streaming data. This continuous execution enables scenarios such as continuously querying a stream and generating alerts. 

In the Getting Started exercise, you have an in-application stream named `SOURCE_SQL_STREAM_001`. It continuously receives stock prices from a demo stream (a Kinesis data stream). The schema is as follows:

```
(TICKER_SYMBOL VARCHAR(4), 
 SECTOR varchar(16), 
 CHANGE REAL, 
 PRICE REAL)
```

Suppose that you are interested in stock price changes greater than 15 percent. You can use the following query in your application code. This query runs continuously and emits records when a stock price change greater than 15 percent is detected.

```
SELECT STREAM TICKER_SYMBOL, PRICE 
      FROM   "SOURCE_SQL_STREAM_001"
      WHERE  (ABS((CHANGE / (PRICE-CHANGE)) * 100)) > 15
```
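To see why the filter expression works, note that `PRICE - CHANGE` is the previous price, so the expression computes the percent change relative to it. The following Python sketch uses hypothetical values to check the filter logic:

```python
def pct_change(price, change):
    """Percent change relative to the previous price (price - change),
    the same expression used in the query's WHERE clause."""
    return abs(change / (price - change)) * 100

# Hypothetical ticks: a stock that moved from 10.0 to 12.0 (change = 2.0)
# is a 20 percent move, so it passes the > 15 filter; a move from
# 10.0 to 11.0 (10 percent) does not.
print(pct_change(12.0, 2.0) > 15)   # → True
print(pct_change(11.0, 1.0) > 15)   # → False
```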

Use the following procedure to set up an Amazon Kinesis Data Analytics application and test this query.

**To test the query**

1. Create an application by following the [Getting Started Exercise](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/get-started-exercise.html).

1. Replace the `SELECT` statement in the application code with the preceding `SELECT` query. The resulting application code is shown following:

   ```
   CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (ticker_symbol VARCHAR(4), 
                                                      price DOUBLE);
   -- CREATE OR REPLACE PUMP to insert into output
   CREATE OR REPLACE PUMP "STREAM_PUMP" AS 
     INSERT INTO "DESTINATION_SQL_STREAM" 
         SELECT STREAM TICKER_SYMBOL, 
                       PRICE 
         FROM   "SOURCE_SQL_STREAM_001"
         WHERE  (ABS((CHANGE / (PRICE-CHANGE)) * 100)) > 15;
   ```

# Windowed Queries
<a name="windowed-sql"></a>

SQL queries in your application code execute continuously over in-application streams. An in-application stream represents unbounded data that flows continuously through your application. Therefore, to get result sets from this continuously updating input, you often bound queries using a window defined in terms of time or rows. Such queries are called *windowed SQL* queries. 

For a time-based windowed query, you specify the window size in terms of time (for example, a one-minute window). This requires a timestamp column in your in-application stream that is monotonically increasing. (The timestamp for a new row is greater than or equal to the previous row.) Amazon Kinesis Data Analytics provides such a timestamp column called `ROWTIME` for each in-application stream. You can use this column when specifying time-based queries. For your application, you might choose some other timestamp option. For more information, see [Timestamps and the ROWTIME Column](timestamps-rowtime-concepts.md).

For a row-based windowed query, you specify the window size in terms of the number of rows.

You can specify a query to process records in a tumbling window, sliding window, or stagger window manner, depending on your application needs. Kinesis Data Analytics supports the following window types:
+ [Stagger Windows](stagger-window-concepts.md): A query that aggregates data using keyed time-based windows that open as data arrives. The keys allow for multiple overlapping windows. This is the recommended way to aggregate data using time-based windows, because stagger windows reduce the issues caused by late or out-of-order data compared to tumbling windows.
+ [Tumbling Windows](tumbling-window-concepts.md): A query that aggregates data using distinct time-based windows that open and close at regular intervals.
+ [Sliding Windows](sliding-window-concepts.md): A query that aggregates data continuously, using a fixed time or rowcount interval.

# Stagger Windows
<a name="stagger-window-concepts"></a>

*Stagger windows* are a windowing method suited for analyzing groups of data that arrive at inconsistent times. They work well for time-series analytics use cases, such as a set of related sales or log records.

For example, [VPC Flow Logs](https://docs.aws.amazon.com/vpc/latest/userguide/flow-logs.html#flow-logs-limitations) have a capture window of approximately 10 minutes. But they can have a capture window of up to 15 minutes if you're aggregating data on the client. Stagger windows are ideal for aggregating these logs for analysis.

Stagger windows address the issue of related records not falling into the same time-restricted window, which occurs when tumbling windows are used.

## Partial Results with Tumbling Windows
<a name="stagger-window-tumbling"></a>

There are certain limitations with using [Tumbling Windows](tumbling-window-concepts.md) for aggregating late or out-of-order data.

If tumbling windows are used to analyze groups of time-related data, the individual records might fall into separate windows. The partial results from each window must then be combined later to yield complete results for each group of records. 

In the following tumbling window query, records are grouped into windows by row time, event time, and ticker symbol:

```
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    TICKER_SYMBOL VARCHAR(4),
    EVENT_TIME timestamp,
    TICKER_COUNT     DOUBLE);

CREATE OR REPLACE PUMP "STREAM_PUMP" AS 
  INSERT INTO "DESTINATION_SQL_STREAM" 
    SELECT STREAM 
        TICKER_SYMBOL,
        FLOOR(EVENT_TIME TO MINUTE),
        COUNT(TICKER_SYMBOL) AS TICKER_COUNT
    FROM "SOURCE_SQL_STREAM_001"
    GROUP BY ticker_symbol, FLOOR(EVENT_TIME TO MINUTE), STEP("SOURCE_SQL_STREAM_001".ROWTIME BY INTERVAL '1' MINUTE);
```

In the following diagram, an application is counting the number of trades it receives, based on when the trades happened (event time) with one minute of granularity. The application can use a tumbling window for grouping data based on row time and event time. The application receives four records that all arrive within one minute of each other. It groups the records by row time, event time, and ticker symbol. Because some of the records arrive after the first tumbling window ends, the records do not all fall within the same one-minute tumbling window.

![\[Tumbling windows diagram showing data grouping by row time, event time, and ticker symbol over two minutes.\]](http://docs.aws.amazon.com/kinesisanalytics/latest/dev/images/stagger_0.png)


The preceding diagram has the following events.


| ROWTIME | EVENT_TIME | TICKER_SYMBOL | 
| --- | --- | --- | 
| 11:00:20 | 11:00:10 | AMZN | 
| 11:00:30 | 11:00:20 | AMZN | 
| 11:01:05 | 11:00:55 | AMZN | 
| 11:01:15 | 11:01:05 | AMZN | 

The result set from the tumbling window application looks similar to the following.


| ROWTIME | EVENT_TIME | TICKER_SYMBOL | COUNT | 
| --- | --- | --- | --- | 
| 11:01:00 | 11:00:00 | AMZN | 2 | 
| 11:02:00 | 11:00:00 | AMZN | 1 | 
| 11:02:00 | 11:01:00 | AMZN | 1 | 

In the result set preceding, three results are returned:
+ A record with a `ROWTIME` of 11:01:00 that aggregates the first two records.
+ A record at 11:02:00 that aggregates just the third record. This record has a `ROWTIME` within the second window, but an `EVENT_TIME` within the first window.
+ A record at 11:02:00 that aggregates just the fourth record.

To analyze the complete result set, the records must be aggregated in the persistence store. This adds complexity and processing requirements to the application.

## Complete Results with Stagger Windows
<a name="stagger-window-concepts-stagger"></a>

To improve the accuracy of analyzing time-related data records, Kinesis Data Analytics offers a window type called *stagger windows*. In this window type, windows open when the first event matching the partition key arrives, not on a fixed time interval. The windows close based on the specified age, which is measured from the time the window opened.

A stagger window is a separate time-restricted window for each key grouping in a window clause. The application aggregates each result of the window clause inside its own time window, rather than using a single window for all results.

In the following stagger window query, records are grouped into windows by event time and ticker symbol:

```
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    ticker_symbol    VARCHAR(4), 
    event_time       TIMESTAMP,
    ticker_count     DOUBLE);

CREATE OR REPLACE PUMP "STREAM_PUMP" AS 
  INSERT INTO "DESTINATION_SQL_STREAM" 
    SELECT STREAM 
        TICKER_SYMBOL,
        FLOOR(EVENT_TIME TO MINUTE),
        COUNT(TICKER_SYMBOL) AS ticker_count
    FROM "SOURCE_SQL_STREAM_001"
    WINDOWED BY STAGGER (
            PARTITION BY FLOOR(EVENT_TIME TO MINUTE), TICKER_SYMBOL RANGE INTERVAL '1' MINUTE);
```

In the following diagram, events are aggregated by event time and ticker symbol into stagger windows.

![\[Diagram showing event aggregation into stagger windows by event time and ticker symbol.\]](http://docs.aws.amazon.com/kinesisanalytics/latest/dev/images/stagger_1.png)


The preceding diagram has the following events, which are the same events as the tumbling window application analyzed:


| ROWTIME | EVENT_TIME | TICKER_SYMBOL | 
| --- | --- | --- | 
| 11:00:20 | 11:00:10 | AMZN | 
| 11:00:30 | 11:00:20 | AMZN | 
| 11:01:05 | 11:00:55 | AMZN | 
| 11:01:15 | 11:01:05 | AMZN | 

The result set from the stagger window application looks similar to the following.


| ROWTIME | EVENT_TIME | TICKER_SYMBOL | COUNT | 
| --- | --- | --- | --- | 
| 11:01:20 | 11:00:00 | AMZN | 3 | 
| 11:02:15 | 11:01:00 | AMZN | 1 | 

The first returned record aggregates the first three input records. The records are grouped by one-minute stagger windows. The stagger window starts when the application receives the first AMZN record (with a `ROWTIME` of 11:00:20). When the 1-minute stagger window expires (at 11:01:20), a record with the results that fall within the stagger window (based on `ROWTIME` and `EVENT_TIME`) is written to the output stream. Using a stagger window, all of the records with a `ROWTIME` and `EVENT_TIME` within a one-minute window are emitted in a single result.

The last record (with an `EVENT_TIME` outside the one-minute aggregation) is aggregated separately. This is because `EVENT_TIME` is one of the partition keys that is used to separate the records into result sets, and the partition key for `EVENT_TIME` for the first window is `11:00`.

The syntax for a stagger window is defined in a special clause, `WINDOWED BY`. This clause is used instead of the `GROUP BY` clause for streaming aggregations. The clause appears immediately after the optional `WHERE` clause and before the `HAVING` clause. 

The stagger window is defined in the `WINDOWED BY` clause and takes two parameters: partition keys and window length. The partition keys partition the incoming data stream and define when the window opens. A stagger window opens when the first event with a unique partition key appears on the stream. The stagger window closes after a fixed time period defined by the window length. The syntax is shown in the following code example:

```
...
FROM <stream-name>
WHERE <... optional statements...>
WINDOWED BY STAGGER(
	PARTITION BY <partition key(s)>
	RANGE INTERVAL <window length, interval>
);
```
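The open-on-first-event, close-after-a-fixed-length behavior can be sketched with a small Python simulation. This is illustrative only, not Kinesis Data Analytics code: times are seconds after 11:00:00, and each key combines the event-time minute and ticker, as in the example above.

```python
def stagger_counts(events, window_len):
    """events: (rowtime, partition_key) pairs. A window opens at the first
    event for each key and closes window_len later; later events with the
    same key that arrive before the close join the same window."""
    open_windows = {}          # key -> [open_time, count]
    results = []
    for rowtime, key in sorted(events):
        win = open_windows.get(key)
        if win is not None and rowtime < win[0] + window_len:
            win[1] += 1        # event lands in the key's open window
        else:
            if win is not None:  # the key's previous window has closed
                results.append((win[0] + window_len, key, win[1]))
            open_windows[key] = [rowtime, 1]
    for key, (open_t, n) in open_windows.items():  # flush remaining windows
        results.append((open_t + window_len, key, n))
    return results

# Events from the example, keyed by event-time minute and ticker:
events = [(20, "11:00/AMZN"), (30, "11:00/AMZN"),
          (65, "11:00/AMZN"), (75, "11:01/AMZN")]
print(stagger_counts(events, 60))
# → [(80, '11:00/AMZN', 3), (135, '11:01/AMZN', 1)]
# i.e. a count of 3 emitted at 11:01:20 and a count of 1 at 11:02:15.
```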

# Tumbling Windows (Aggregations Using GROUP BY)
<a name="tumbling-window-concepts"></a>

When a windowed query processes each window in a non-overlapping manner, the window is referred to as a *tumbling window*. In this case, each record on an in-application stream belongs to a specific window. It is processed only once (when the query processes the window to which the record belongs).

![\[Timeline showing non-overlapping windows processing data streams at distinct time intervals.\]](http://docs.aws.amazon.com/kinesisanalytics/latest/dev/images/window-tumbling-20.png)


For example, an aggregation query using a `GROUP BY` clause processes rows in a tumbling window. The demo stream in the [getting started exercise](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/get-started-exercise.html) receives stock price data that is mapped to the in-application stream `SOURCE_SQL_STREAM_001` in your application. This stream has the following schema.

```
(TICKER_SYMBOL VARCHAR(4), 
 SECTOR varchar(16), 
 CHANGE REAL, 
 PRICE REAL)
```

In your application code, suppose that you want to find aggregate (min, max) prices for each ticker over a one-minute window. You can use the following query.

```
SELECT STREAM ROWTIME,
              Ticker_Symbol,
              MIN(Price) AS Min_Price,
              MAX(Price) AS Max_Price
FROM     "SOURCE_SQL_STREAM_001"
GROUP BY Ticker_Symbol, 
         STEP("SOURCE_SQL_STREAM_001".ROWTIME BY INTERVAL '60' SECOND);
```

The preceding is an example of a windowed query that is time-based. The query groups records by `ROWTIME` values. For reporting on a per-minute basis, the `STEP` function rounds down the `ROWTIME` values to the nearest minute. 

**Note**  
You can also use the `FLOOR` function to group records into windows. However, `FLOOR` can only round time values down to a whole time unit (hour, minute, second, and so on). `STEP` is recommended for grouping records into tumbling windows because it can round values down to an arbitrary interval, for example, 30 seconds.

This query is an example of a nonoverlapping (tumbling) window. The `GROUP BY` clause groups records in a one-minute window, and each record belongs to a specific window (no overlapping). The query emits one output record per minute, providing the min/max ticker price recorded at the specific minute. This type of query is useful for generating periodic reports from the input data stream. In this example, reports are generated each minute. 
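The rounding that `STEP` performs can be sketched in Python. This is an illustrative stand-in for the service's function, not its implementation:

```python
from datetime import datetime, timedelta

def step(ts, interval_seconds):
    """Round a timestamp down to the nearest multiple of the interval,
    mirroring STEP(... BY INTERVAL '<n>' SECOND)."""
    epoch = datetime(1970, 1, 1)
    elapsed = int((ts - epoch).total_seconds())
    return epoch + timedelta(seconds=elapsed - elapsed % interval_seconds)

t = datetime(2016, 7, 19, 17, 5, 37)
print(step(t, 60))   # → 2016-07-19 17:05:00 (one-minute tumbling window)
print(step(t, 30))   # → 2016-07-19 17:05:30 (arbitrary 30-second interval)
```

The second call shows the advantage `STEP` has over `FLOOR`: rounding down to a 30-second interval, which `FLOOR` cannot do.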

**To test the query**

1. Set up an application by following the [getting started exercise](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/get-started-exercise.html).

1. Replace the `SELECT` statement in the application code with the preceding `SELECT` query. The resulting application code is shown following:

   ```
   CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
                                      ticker_symbol VARCHAR(4), 
                                      Min_Price     DOUBLE, 
                                      Max_Price     DOUBLE);
   -- CREATE OR REPLACE PUMP to insert into output
   CREATE OR REPLACE PUMP "STREAM_PUMP" AS 
     INSERT INTO "DESTINATION_SQL_STREAM" 
       SELECT STREAM Ticker_Symbol,
                     MIN(Price) AS Min_Price,
                     MAX(Price) AS Max_Price
       FROM    "SOURCE_SQL_STREAM_001"
       GROUP BY Ticker_Symbol, 
                STEP("SOURCE_SQL_STREAM_001".ROWTIME BY INTERVAL '60' SECOND);
   ```

# Sliding Windows
<a name="sliding-window-concepts"></a>

Instead of grouping records using `GROUP BY`, you can define a time-based or row-based window. You do this by adding an explicit `WINDOW` clause. 

In this case, as the window slides with time, Amazon Kinesis Data Analytics emits an output when new records appear on the stream. Kinesis Data Analytics emits this output by processing rows in the window. Windows can overlap in this type of processing, and a record can be part of multiple windows and be processed with each window. The following example illustrates a sliding window.

Consider a simple query that counts records on the stream, using a 5-second window. In the following example stream, new records arrive at times t1, t2, t6, and t7, and three records arrive at time t8.

![\[Timeline showing record arrivals at t1, t2, t6, t7, and multiple at t8 within a 5-second window.\]](http://docs.aws.amazon.com/kinesisanalytics/latest/dev/images/sliding-10.png)


Keep the following in mind:
+ The example assumes a 5-second window. The 5-second window slides continuously with time. 
+ For every row that enters a window, an output row is emitted by the sliding window. Soon after the application starts, you see the query emit output for every new record that appears on the stream, even though a 5-second window hasn't passed yet. For example, the query emits output when a record appears during the first second, and again when a record appears during the second second. Later, the query processes records in the 5-second window.
+ The windows slide with time. If an old record on the stream falls out of the window, the query doesn't emit output unless there is also a new record on the stream that falls within that 5-second window.

  

Suppose that the query starts executing at t0. Then the following occurs:

1. At time t0, the query starts. The query doesn't emit output (a count value) because there are no records at this time.  
![\[Timeline showing a stream starting at t0 with no output initially indicated.\]](http://docs.aws.amazon.com/kinesisanalytics/latest/dev/images/sliding-t0.png)

1. At time t1, a new record appears on the stream, and the query emits count value 1.   
![\[Timeline showing a stream with a record appearing at time t1, and an arrow pointing to t0.\]](http://docs.aws.amazon.com/kinesisanalytics/latest/dev/images/sliding-t1.png)

1. At time t2, another record appears, and the query emits count 2.   
![\[Timeline showing stream events at different time points, with two vertical bars at the end.\]](http://docs.aws.amazon.com/kinesisanalytics/latest/dev/images/sliding-t2.png)

1. The 5-second window slides with time:
   + At t3, the sliding window is t3 to t0
   + At t4, the sliding window is t4 to t0
   + At t5, the sliding window is t5 to t0

   At all of these times, the 5-second window has the same records—there are no new records. Therefore, the query doesn't emit any output.  
![\[Timeline showing stream with multiple time points and colored rectangles representing data windows.\]](http://docs.aws.amazon.com/kinesisanalytics/latest/dev/images/sliding-t3-4-5.png)

1. At time t6, the 5-second window is (t6 to t1). The query detects one new record at t6 so it emits output 2. The record at t1 is no longer in the window and doesn't count.   
![\[Timeline showing stream events at different time points with a sliding 5-second window.\]](http://docs.aws.amazon.com/kinesisanalytics/latest/dev/images/sliding-t6.png)

1. At time t7, the 5-second window is t7 to t2. The query detects one new record at t7 so it emits output 2. The record at t2 is no longer in the 5-second window, and therefore isn't counted.   
![\[Timeline showing stream events and time points from t0 to t7, with a 5-second window highlighted.\]](http://docs.aws.amazon.com/kinesisanalytics/latest/dev/images/sliding-t7.png)

1. At time t8, the 5-second window is t8 to t3. The query detects three new records, and therefore emits record count 5.  
![\[Timeline showing stream events with orange bars representing record counts at different time intervals.\]](http://docs.aws.amazon.com/kinesisanalytics/latest/dev/images/sliding-t8.png)

In summary, the window is a fixed size and slides with time. The query emits output when new records appear. 
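The walkthrough above can be sketched with a small Python simulation. This is illustrative only; it collapses the per-row emissions of the real query into one count per distinct arrival instant:

```python
WINDOW = 5  # seconds

def sliding_counts(arrival_times, window):
    """For each distinct arrival instant t, count the records whose
    arrival time falls within the sliding window (t - window, t]."""
    out = []
    for t in sorted(set(arrival_times)):
        out.append((t, sum(1 for x in arrival_times if t - window < x <= t)))
    return out

# Arrivals from the walkthrough: t1, t2, t6, t7, and three records at t8.
print(sliding_counts([1, 2, 6, 7, 8, 8, 8], WINDOW))
# → [(1, 1), (2, 2), (6, 2), (7, 2), (8, 5)]
```

Note how the record at t1 has fallen out of the window by t6, and the record at t2 by t7, matching the counts in the walkthrough.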

**Note**  
We recommend that you use a sliding window no longer than one hour. If you use a longer window, the application takes longer to restart after regular system maintenance. This is because the source data must be read from the stream again.

The following example queries use the `WINDOW` clause to define windows and perform aggregates. Because the queries don't specify `GROUP BY`, they use the sliding window approach to process records on the stream. 



## Example 1: Process a Stream Using a 1-Minute Sliding Window
<a name="sliding-ex1"></a>

Consider the demo stream in the Getting Started exercise that populates the in-application stream, `SOURCE_SQL_STREAM_001`. The following is the schema.

```
(TICKER_SYMBOL VARCHAR(4), 
 SECTOR varchar(16),
 CHANGE REAL,
 PRICE REAL)
```

Suppose that you want your application to compute aggregates using a sliding 1-minute window. That is, for each new record that appears on the stream, you want the application to emit an output by applying aggregates on records in the preceding 1-minute window.

You can use the following time-based windowed query. The query uses the `WINDOW` clause to define the 1-minute range interval. The `PARTITION BY` in the `WINDOW` clause groups records by ticker values within the sliding window. 

```
SELECT STREAM ticker_symbol,
              MIN(Price) OVER W1 AS Min_Price,
              MAX(Price) OVER W1 AS Max_Price,
              AVG(Price) OVER W1 AS Avg_Price
FROM   "SOURCE_SQL_STREAM_001"
WINDOW W1 AS (
   PARTITION BY ticker_symbol 
   RANGE INTERVAL '1' MINUTE PRECEDING);
```

**To test the query**

1. Set up an application by following the [Getting Started Exercise](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/get-started-exercise.html).

1. Replace the `SELECT` statement in the application code with the preceding `SELECT` query. The resulting application code is the following.

   ```
   CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
                            ticker_symbol VARCHAR(10), 
                            Min_Price     double, 
                            Max_Price     double, 
                            Avg_Price     double);
   CREATE OR REPLACE PUMP "STREAM_PUMP" AS 
      INSERT INTO "DESTINATION_SQL_STREAM"
        SELECT STREAM ticker_symbol,
                      MIN(Price) OVER W1 AS Min_Price,
                      MAX(Price) OVER W1 AS Max_Price,
                      AVG(Price) OVER W1 AS Avg_Price
        FROM   "SOURCE_SQL_STREAM_001"
        WINDOW W1 AS (
           PARTITION BY ticker_symbol 
           RANGE INTERVAL '1' MINUTE PRECEDING);
   ```
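
To build intuition for the per-ticker `RANGE` window semantics, the following is a rough Python simulation of the aggregation (illustrative only; the function name and record layout are assumptions, and the service's exact boundary behavior may differ): for each arriving record, it computes `MIN`, `MAX`, and `AVG` over that ticker's records in the preceding window.

```python
from collections import defaultdict, deque

def windowed_stats(records, window_seconds=60):
    """For each (timestamp, ticker, price) record, emit min/max/avg of the
    prices seen for that ticker in the preceding window (current record
    included), mimicking PARTITION BY ticker ... RANGE ... PRECEDING."""
    per_ticker = defaultdict(deque)  # ticker -> deque of (ts, price)
    out = []
    for ts, ticker, price in records:
        win = per_ticker[ticker]
        win.append((ts, price))
        # Expire records older than the window for this ticker.
        while win[0][0] < ts - window_seconds:
            win.popleft()
        prices = [p for _, p in win]
        out.append((ticker, min(prices), max(prices), sum(prices) / len(prices)))
    return out
```

Note that the partitioning means each ticker's window expires independently; a burst of records for one ticker does not evict another ticker's records.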

## Example 2: Query Applying Aggregates on a Sliding Window
<a name="sliding-ex2"></a>

The following query on the demo stream returns the average of the percent change in the price of each ticker in a 10-second window.

```
SELECT STREAM Ticker_Symbol,
              AVG(Change / (Price - Change)) OVER W1 AS Avg_Percent_Change
FROM "SOURCE_SQL_STREAM_001"
WINDOW W1 AS (
        PARTITION BY ticker_symbol 
        RANGE INTERVAL '10' SECOND PRECEDING);
```



**To test the query**

1. Set up an application by following the [Getting Started Exercise](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/get-started-exercise.html).

1. Replace the `SELECT` statement in the application code with the preceding `SELECT` query. The resulting application code is the following.

   ```
   CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
                               ticker_symbol VARCHAR(10), 
                               Avg_Percent_Change double);
   CREATE OR REPLACE PUMP "STREAM_PUMP" AS 
      INSERT INTO "DESTINATION_SQL_STREAM"
         SELECT STREAM Ticker_Symbol,
                        AVG(Change / (Price - Change)) OVER W1 AS Avg_Percent_Change
         FROM "SOURCE_SQL_STREAM_001"
         WINDOW W1 AS (
                 PARTITION BY ticker_symbol 
                 RANGE INTERVAL '10' SECOND PRECEDING);
   ```
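
The expression `Change / (Price - Change)` divides the change by the previous price (the current price minus the change), giving the fractional change for each record. A minimal illustration in Python (a hypothetical helper, not service code):

```python
def fractional_change(price, change):
    """Change relative to the previous price. The stream carries the
    current price and the change, so the prior price is price - change."""
    return change / (price - change)
```

For example, a price of 105.0 with a change of 5.0 yields 5.0 / 100.0 = 0.05, that is, a 5 percent change from the previous price.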

## Example 3: Query Data from Multiple Sliding Windows on the Same Stream
<a name="sliding-ex3"></a>

You can write queries to emit output in which each column value is calculated using different sliding windows defined over the same stream. 

In the following example, the query emits `ticker_symbol`, `price`, `average_last2rows`, and `average_last10rows` for each record. The last two columns are moving averages derived from two-row and ten-row sliding windows over the same stream; comparing them lets you detect, for example, when the shorter moving average crosses the longer one.

```
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
                           ticker_symbol      VARCHAR(12), 
                           price              double, 
                           average_last2rows  double, 
                           average_last10rows double);

CREATE OR REPLACE PUMP "myPump" AS INSERT INTO "DESTINATION_SQL_STREAM"
SELECT STREAM ticker_symbol, 
              price, 
              avg(price) over last2rows, 
              avg(price) over last10rows
FROM SOURCE_SQL_STREAM_001
WINDOW
    last2rows AS (PARTITION BY ticker_symbol ROWS 2 PRECEDING),
    last10rows AS (PARTITION BY ticker_symbol ROWS 10 PRECEDING);
```

To test this query against the demo stream, follow the test procedure described in [Example 1](#sliding-ex1).
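
Row-count windows such as `ROWS 2 PRECEDING` can be simulated per ticker with bounded buffers. The following Python sketch is illustrative (names and the exact row-inclusion convention are assumptions; here each window covers the most recent rows for that ticker, current row included). It emits both moving averages for every arriving record:

```python
from collections import defaultdict, deque

def moving_averages(records, short_n=2, long_n=10):
    """Per-ticker moving averages over the last short_n and last long_n
    rows, analogous to two ROWS ... PRECEDING windows on one stream."""
    hist = defaultdict(deque)  # ticker -> recent prices (bounded)
    out = []
    for ticker, price in records:
        h = hist[ticker]
        h.append(price)
        if len(h) > long_n:
            h.popleft()  # keep only the rows the longer window needs
        short = list(h)[-short_n:]  # tail of the same buffer
        out.append((ticker, price,
                    sum(short) / len(short),
                    sum(h) / len(h)))
    return out
```

Because the shorter window is a suffix of the longer one, a single buffer per ticker serves both aggregates; this mirrors how both `WINDOW` definitions in the query read the same stream.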

# Streaming Data Operations: Stream Joins
<a name="stream-joins-concepts"></a>

You can have multiple in-application streams in your application. You can write `JOIN` queries to correlate data arriving on these streams. For example, suppose that you have the following in-application streams:
+ **OrderStream** – Receives stock orders being placed.

  ```
  (orderId SqlType, ticker SqlType, amount SqlType, ROWTIME TimeStamp)
  ```
+ **TradeStream** – Receives resulting stock trades for those orders.

  ```
  (tradeId SqlType, orderId SqlType, ticker SqlType, amount SqlType, ROWTIME TimeStamp)
  ```

The following are `JOIN` query examples that correlate data on these streams.

## Example 1: Report Orders Where There Are Trades Within One Minute of the Order Being Placed
<a name="join-ex1"></a>

In this example, your query joins both the `OrderStream` and the `TradeStream`. However, because we want only trades that occur within one minute of the order being placed, the query defines a 1-minute window over the `TradeStream`. For information about windowed queries, see [Sliding Windows](sliding-window-concepts.md).

```
SELECT STREAM
     ROWTIME, 
     o.orderId, o.ticker, o.amount AS orderAmount,
     t.amount AS tradeAmount
FROM OrderStream AS o
JOIN TradeStream OVER (RANGE INTERVAL '1' MINUTE PRECEDING) AS t
ON   o.orderId = t.orderId;
```

You can define the window explicitly by using the `WINDOW` clause, writing the preceding query as follows: 

```
SELECT STREAM
    ROWTIME, 
    o.orderId, o.ticker, o.amount AS orderAmount,
    t.amount AS tradeAmount
FROM OrderStream AS o
JOIN TradeStream OVER t
ON o.orderId = t.orderId
WINDOW t AS
    (RANGE INTERVAL '1' MINUTE PRECEDING)
```

When you include this query in your application code, it runs continuously. The application emits an output record for each order on the `OrderStream` that has trades within the 1-minute window following the order being placed. 
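
Conceptually, the continuous windowed join buffers order records for the window duration and matches each arriving trade against that buffer. The following Python sketch is a rough model (the event format and names are assumptions, not service behavior):

```python
from collections import deque

def windowed_inner_join(events, window_seconds=60):
    """events is one time-ordered feed of ('order', ts, order_id, amount)
    and ('trade', ts, order_id, amount) tuples. Orders are buffered for
    window_seconds; each arriving trade is matched against the buffer,
    mimicking the 1-minute windowed JOIN on orderId."""
    orders = deque()  # (ts, order_id, amount) still inside the window
    out = []
    for kind, ts, order_id, amount in events:
        # Expire orders older than the window.
        while orders and orders[0][0] < ts - window_seconds:
            orders.popleft()
        if kind == 'order':
            orders.append((ts, order_id, amount))
        else:  # trade: emit (order_id, order_amount, trade_amount) matches
            out.extend((order_id, o_amt, amount)
                       for o_ts, o_id, o_amt in orders if o_id == order_id)
    return out
```

In this model, a trade arriving more than a minute after its order finds the order already expired from the buffer, so no output is emitted, which matches the inner-join behavior described above.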

The join in the preceding query is an inner join: the query emits records in `OrderStream` for which there is a matching record in `TradeStream` (and vice versa). Using an outer join, you can create another interesting scenario. Suppose that you want stock orders for which there are no trades within one minute of the order being placed, along with trades reported within the same window but for other orders. This is an example of an *outer join*. 

```
SELECT STREAM
    ROWTIME, 
    o.orderId, o.ticker, o.amount AS orderAmount,
    t.ticker, t.tradeId, t.amount AS tradeAmount
FROM OrderStream AS o
LEFT OUTER JOIN TradeStream OVER (RANGE INTERVAL '1' MINUTE PRECEDING) AS t
ON    o.orderId = t.orderId;
```