Redshift Provisioned Cluster - Create Firehose Stream

  1. Open the Amazon Kinesis Firehose console.

  2. Click Create Firehose Stream.

  3. Under Source, select Direct PUT.

  4. Under Destination, select Amazon Redshift.

  5. Change the default stream name and note it down; the stream name is required later in your MetaRouter configuration.

  6. Select the Redshift Provisioned Cluster you created earlier.

    • If the cluster is still being created, check its status in the Redshift Dashboard before proceeding.

  7. Enter the Redshift database name you created earlier.

  8. Configure authentication for Redshift.

  9. Enter a staging table name (e.g. staging_firehose). Data will be loaded into this table before being distributed to the target tables.

  10. Select the S3 bucket you created earlier as the intermediary storage.

  11. Configure COPY command options as follows:

    json 'auto ignorecase'
    
  12. Keep the buffer configuration at its defaults (adjust later as needed). Further details can be found after step 17, in "Optional Step - Adjusting Firehose Buffer Settings".

  13. Click Create Firehose Stream.

  14. Open the Configuration tab.

  15. Scroll down and click Edit for Service Access.

  16. Under IAM Role, select Choose Existing IAM Role and attach the IAM role you created earlier (same role used for Redshift).

  17. Click Save Changes.
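For teams that prefer to script this setup, the console steps above can be sketched with the boto3 Firehose API (`create_delivery_stream`). This is a minimal sketch, not a drop-in script: the stream name, role and bucket ARNs, JDBC URL, database name, and credentials below are placeholders you must replace with the values from your own earlier steps.

```python
# Sketch of steps 3-13 above. All names, ARNs, and credentials are
# placeholders (assumptions), not values from this guide.

STREAM_NAME = "metarouter-redshift-stream"  # your stream name from step 5
ROLE_ARN = "arn:aws:iam::123456789012:role/firehose-redshift-role"  # placeholder
BUCKET_ARN = "arn:aws:s3:::my-firehose-staging-bucket"              # placeholder


def build_redshift_destination():
    """Assemble the Redshift destination configuration (steps 6-12)."""
    return {
        "RoleARN": ROLE_ARN,
        # JDBC URL of your provisioned cluster (step 6) and database (step 7).
        "ClusterJDBCURL": (
            "jdbc:redshift://example-cluster.abc123.us-east-1"
            ".redshift.amazonaws.com:5439/mydatabase"
        ),
        "CopyCommand": {
            "DataTableName": "staging_firehose",      # staging table (step 9)
            "CopyOptions": "json 'auto ignorecase'",  # COPY options (step 11)
        },
        "Username": "awsuser",       # placeholder credentials (step 8)
        "Password": "REPLACE_ME",
        "S3Configuration": {         # intermediary S3 bucket (step 10)
            "RoleARN": ROLE_ARN,
            "BucketARN": BUCKET_ARN,
            # Default buffer hints (step 12): 5 MiB or 300 seconds.
            "BufferingHints": {"SizeInMBs": 5, "IntervalInSeconds": 300},
            "CompressionFormat": "UNCOMPRESSED",
        },
    }


def create_stream(firehose_client):
    """Create the stream (step 13). Pass boto3.client("firehose")."""
    return firehose_client.create_delivery_stream(
        DeliveryStreamName=STREAM_NAME,
        DeliveryStreamType="DirectPut",  # Source: Direct PUT (step 3)
        RedshiftDestinationConfiguration=build_redshift_destination(),
    )
```

To actually create the stream you would call `create_stream(boto3.client("firehose"))` with valid AWS credentials; the console flow above remains the authoritative path.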


Optional Step - Adjusting Firehose Buffer Settings

  • At the bottom of the Firehose settings page, there is a dropdown menu labeled "Buffer hints, compression, and encryption". In this section, you can adjust additional settings to customize Firehose’s buffer, which determines how quickly events appear in your Redshift tables.
  • Firehose offers two buffering strategies:
    • Buffering until a specified payload size is reached.
    • Buffering until a set amount of time has passed.
      Once one of these limits is met, Firehose moves the data to S3 and Redshift.
  • The default (Amazon-recommended) buffer settings are 5 MiB or 300 seconds.
  • The minimum buffer interval can be set to 0 seconds, allowing near real-time data transfers. However, because Firehose rounds each ingested record up to the nearest 5 KiB for billing, batching events so that payloads approach multiples of 5 KiB can help reduce operational costs.
  • Lowering buffer limits during the testing phase can help expedite data validation and troubleshooting.
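The cost effect of the 5 KiB rounding can be illustrated with a short calculation. The record counts and sizes below are made-up examples, not figures from this guide.

```python
import math

# Firehose rounds each record up to the nearest 5 KiB for ingestion billing.
PUT_UNIT = 5 * 1024


def billed_bytes(record_sizes):
    """Total billed bytes for a list of record sizes (in bytes)."""
    return sum(math.ceil(size / PUT_UNIT) * PUT_UNIT for size in record_sizes)


# One thousand 1 KiB events sent as individual records are each
# rounded up to 5 KiB ...
individual = billed_bytes([1024] * 1000)   # 1000 * 5 KiB = 5,120,000 bytes

# ... while batching five of them into a single ~5 KiB record
# avoids the padding entirely.
batched = billed_bytes([5 * 1024] * 200)   # 200 * 5 KiB = 1,024,000 bytes

print(individual, batched)  # the individual total is 5x the batched total
```

In this sketch, batching small events before the PUT reduces billed ingestion fivefold, which is the motivation for buffering payloads in 5 KiB increments.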

What’s Next