Create Redshift Tables

Create the Staging Table in Redshift

  • Important Notes:

    • Firehose can only send data to a single Redshift table.
    • The provided table creation commands are based on the default, non-customized playbook.
    • If you have customized the example playbook, you must modify the table structure to match your implementation.
    • Non-staging tables should be configured based on your specific requirements, as their setup is entirely under your control.
    • The provided schema is an example used for testing and should be adjusted as needed for production use.

    Steps to Create the Staging Table
    1. Open the Amazon Redshift Serverless console.

    2. Click on your workgroup.

    3. Open the Query Editor.

    4. Run a SQL query to create a staging table that holds both user and event records. The following query is an example and must be modified to match your use case. The table is named firehose_staging to match the grant and scheduled-query examples later in this guide, and the timestamp column is quoted because timestamp is a reserved word in Redshift:

      CREATE TABLE firehose_staging (
          anonymousId VARCHAR(MAX),
          context VARCHAR(MAX),
          integrations VARCHAR(MAX),
          messageId VARCHAR(MAX),
          originalTimestamp VARCHAR(MAX),
          properties VARCHAR(MAX),
          receivedAt VARCHAR(MAX),
          sentAt VARCHAR(MAX),
          "timestamp" VARCHAR(MAX),
          traits VARCHAR(MAX),
          type VARCHAR(MAX),
          event VARCHAR(MAX),
          userId VARCHAR(MAX),
          writeKey VARCHAR(MAX)
      );
      
    5. Stay in the Query Editor for next steps.
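
    For reference, the Firehose delivery stream loads this staging table with a Redshift COPY command; when you configure the Redshift destination, you supply the table name and COPY options. A minimal sketch, assuming Firehose writes JSON records to an intermediate S3 bucket (the bucket, prefix, account ID, and role name below are placeholders):

      -- Shape of the COPY command Firehose issues against the staging table
      -- (<your-firehose-bucket>, <prefix>, <account-id>, and <ROLE_NAME> are placeholders)
      COPY firehose_staging
      FROM 's3://<your-firehose-bucket>/<prefix>'
      IAM_ROLE 'arn:aws:iam::<account-id>:role/<ROLE_NAME>'
      FORMAT AS JSON 'auto';

    In the Firehose console you typically only enter the table name and the COPY options (here, FORMAT AS JSON 'auto'); Firehose supplies the FROM clause and credentials itself.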


Create the Events Table in Redshift

  1. Open the Query Editor in Redshift.

  2. Run a SQL query to create an events table. The following query is an example and must be modified to match your use case:

    CREATE TABLE events (
        anonymousId VARCHAR(MAX),
        context VARCHAR(MAX),
        integrations VARCHAR(MAX),
        messageId VARCHAR(MAX),
        originalTimestamp VARCHAR(MAX),
        properties VARCHAR(MAX),
        receivedAt VARCHAR(MAX),
        sentAt VARCHAR(MAX),
        "timestamp" VARCHAR(MAX),
        type VARCHAR(MAX),
        event VARCHAR(MAX),
        userId VARCHAR(MAX),
        writeKey VARCHAR(MAX)
    );
    
  3. Stay in the Query Editor for next steps.


Create the Users Table in Redshift

  1. Open the Query Editor in Redshift.

  2. Run a SQL query to create a users table. The following query is an example and must be modified to match your use case:

    CREATE TABLE users (
        anonymousId VARCHAR(MAX),
        context VARCHAR(MAX),
        integrations VARCHAR(MAX),
        messageId VARCHAR(MAX),
        originalTimestamp VARCHAR(MAX),
        receivedAt VARCHAR(MAX),
        sentAt VARCHAR(MAX),
        "timestamp" VARCHAR(MAX),
        traits VARCHAR(MAX),
        type VARCHAR(MAX),
        event VARCHAR(MAX),
        userId VARCHAR(MAX),
        writeKey VARCHAR(MAX)
    );
    
  3. Stay in the Query Editor for next steps.


Grant Table Read/Write Access

    1. To enable the Scheduled Query feature in Redshift (which will automatically insert data from the staging table into the final tables), you must grant the Redshift workgroup role access to the newly created tables.

    2. Run the following SQL queries, replacing <ROLE_NAME> with the IAM role you configured earlier, and make sure the table names match your own. The ones below (events, firehose_staging, and users) are the example tables from this guide. Ensure the statements execute successfully before proceeding.

      GRANT INSERT, SELECT ON TABLE events TO "IAMR:<ROLE_NAME>";
      GRANT INSERT, SELECT ON TABLE firehose_staging TO "IAMR:<ROLE_NAME>";
      GRANT INSERT, SELECT ON TABLE users TO "IAMR:<ROLE_NAME>";
      GRANT DELETE ON TABLE firehose_staging TO "IAMR:<ROLE_NAME>";
      
    3. Stay in the Query Editor for next steps.
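
    Before moving on, you can check that the grants took effect. This sketch uses has_table_privilege, a standard Redshift function, with the same IAMR:<ROLE_NAME> user name as the GRANT statements above:

      -- Each column should return true
      SELECT
          has_table_privilege('IAMR:<ROLE_NAME>', 'firehose_staging', 'INSERT') AS staging_insert,
          has_table_privilege('IAMR:<ROLE_NAME>', 'firehose_staging', 'DELETE') AS staging_delete,
          has_table_privilege('IAMR:<ROLE_NAME>', 'events', 'INSERT') AS events_insert,
          has_table_privilege('IAMR:<ROLE_NAME>', 'users', 'INSERT') AS users_insert;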


Set up a Scheduled Query

  1. Paste one of the following queries into the Query Editor in Redshift.

    • The first query is a basic example meant for initial testing.
    • The second query is a more advanced version that ensures users' information is updated dynamically.
    • Important: These are examples and should be customized to meet your actual data requirements.
    -- Query 1 is a bare-bones example and will not satisfy real use cases;
    -- after testing, replace it with something closer to Query 2 below.
    
    -- Query 1
    -- Insert event data into the events table
    INSERT INTO events (anonymousId, context, integrations, messageId, originalTimestamp, properties, receivedAt, sentAt, "timestamp", type, event, userId, writeKey)
    SELECT anonymousId, context, integrations, messageId, originalTimestamp, properties, receivedAt, sentAt, "timestamp", type, event, userId, writeKey
    FROM firehose_staging;
    
    -- Insert user data into the users table
    INSERT INTO users (anonymousId, userId)
    SELECT anonymousId, userId
    FROM firehose_staging;
    
    -- (Optional) Delete processed data from the staging table.
    -- Note: receivedAt is stored as VARCHAR, so this comparison relies on an
    -- implicit cast to timestamp; it fails if the values are not timestamp-shaped.
    DELETE FROM firehose_staging
    WHERE receivedAt <= SYSDATE;
    
    -- Query 2 is a more intricate version: the users table is updated whenever
    -- more information arrives for a given anonymousId, and new records are
    -- created for new users, but unlike Query 1, anonymousId remains unique.
    
    -- Query 2
    -- Insert new events into the events table
    INSERT INTO events (anonymousId, context, integrations, messageId, originalTimestamp, properties, receivedAt, sentAt, "timestamp", type, event, userId, writeKey)
    SELECT anonymousId, context, integrations, messageId, originalTimestamp, properties, receivedAt, sentAt, "timestamp", type, event, userId, writeKey
    FROM firehose_staging;
    
    -- Step 1: Update existing users with missing information.
    -- Note: name and age are example columns that do not exist in the tables
    -- created above; add them to your users and staging tables (for example,
    -- extracted from traits) or remove them before running this query.
    UPDATE users
    SET 
        userId = COALESCE(users.userId, staging.userId),
        name = COALESCE(users.name, staging.name),
        age = COALESCE(users.age, staging.age)
    FROM firehose_staging AS staging
    WHERE users.anonymousId = staging.anonymousId;
    
    -- Step 2: Insert new users that do not already exist in the users table
    INSERT INTO users (anonymousId, userId, name, age)
    SELECT staging.anonymousId, staging.userId, staging.name, staging.age
    FROM firehose_staging AS staging
    LEFT JOIN users ON users.anonymousId = staging.anonymousId
    WHERE users.anonymousId IS NULL;
    
    -- (Optional) Delete processed data from the staging table
    DELETE FROM firehose_staging
    WHERE receivedAt <= SYSDATE;
    
    
  2. Click the Schedule button in the top-right corner to configure the query to run periodically. This ensures data from the staging table is continuously moved to the appropriate tables.

  3. Select the IAM role that has access to the Redshift workgroup.

  4. Enter a name and description for the scheduled query.

  5. Configure any additional settings based on your requirements.

  6. Click Schedule Query.
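
  One caveat with the insert-then-delete pattern in the scheduled queries above: rows landing in the staging table between the INSERT and the DELETE would be removed without ever being copied. A sketch of a safer variant, which snapshots the batch into a temp table inside a transaction and deletes only the rows it copied (it assumes messageId uniquely identifies a row, and batch is a hypothetical temp-table name):

    -- Snapshot the current staging contents so INSERT and DELETE see the same rows
    BEGIN;

    CREATE TEMP TABLE batch AS
    SELECT * FROM firehose_staging;

    INSERT INTO events (anonymousId, context, integrations, messageId, originalTimestamp, properties, receivedAt, sentAt, "timestamp", type, event, userId, writeKey)
    SELECT anonymousId, context, integrations, messageId, originalTimestamp, properties, receivedAt, sentAt, "timestamp", type, event, userId, writeKey
    FROM batch;

    -- Delete only the rows that were part of this batch
    DELETE FROM firehose_staging
    USING batch
    WHERE firehose_staging.messageId = batch.messageId;

    DROP TABLE batch;
    END;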


What’s Next