Redshift Provisioned Cluster - Create Firehose Stream
-
Open the Amazon Kinesis Firehose console.
-
Click Create Firehose Stream.
-
Under Source, select Direct PUT.
-
Under Destination, select Amazon Redshift.
-
Replace the default stream name with your own. You will need this name when configuring MetaRouter.
-
Select the Redshift Provisioned Cluster you created earlier.
- If the cluster is still being created, wait until its status shows Available in the Redshift dashboard before proceeding.
-
Enter the Redshift database name you created earlier.
-
Configure authentication for Redshift, either with a database username and password or with an AWS Secrets Manager secret.
-
Enter a staging table name (e.g. staging_firehose). Data is loaded into this table before being distributed to the target tables.
-
Select the S3 bucket you created earlier as the intermediary storage.
-
Configure COPY command options as follows:
json 'auto ignorecase'
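With json 'auto ignorecase', COPY matches the top-level keys of each JSON record to the staging table's column names without regard to case, so a key such as "UserId" loads into a column named userid (Redshift folds column names to lowercase by default). The Python sketch below only approximates that matching for illustration. It is not Redshift's implementation, `map_record_to_columns` is a hypothetical helper, and it uses None where Redshift would load NULL or the column default.

```python
import json


def map_record_to_columns(record_json: str, columns: list[str]) -> dict:
    """Approximate json 'auto ignorecase': pair top-level JSON keys with
    table columns by case-insensitive name (illustrative sketch only)."""
    record = json.loads(record_json)
    # Index top-level keys by lowercase name; if two keys differ only in case,
    # the later one wins in this sketch.
    by_lower = {key.lower(): value for key, value in record.items()}
    # Keys with no matching column are dropped; columns with no matching key get None.
    return {col: by_lower.get(col.lower()) for col in columns}


row = map_record_to_columns(
    '{"UserId": "u1", "EVENT": "page", "extra": 1}',
    ["userid", "event", "timestamp"],
)
print(row)
```

Here "UserId" and "EVENT" land in userid and event, "extra" has no column and is ignored, and timestamp has no matching key.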
-
Keep the default buffer configuration, or adjust it as needed. For details, see "Optional Step - Adjusting Firehose Buffer Settings" below.
-
Click Create Firehose Stream.
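For reference, the console choices above correspond roughly to a single Firehose `create_delivery_stream` request. The sketch below only assembles the keyword arguments as a dict. `build_create_stream_request` and every argument value are hypothetical placeholders. It assumes username/password authentication, Redshift's default port 5439, and the default 5 MiB / 300 second buffering. Unlike the console flow, the API requires the IAM role ARN at creation time, so the role is passed up front here.

```python
def build_create_stream_request(
    stream_name: str,
    cluster_endpoint: str,  # cluster host name only, without port or database
    database: str,
    staging_table: str,
    username: str,
    password: str,
    role_arn: str,
    bucket_arn: str,
) -> dict:
    """Assemble (hypothetical) kwargs for Firehose create_delivery_stream that
    mirror the console steps: Direct PUT source, Redshift destination."""
    return {
        "DeliveryStreamName": stream_name,
        "DeliveryStreamType": "DirectPut",  # Source: Direct PUT
        "RedshiftDestinationConfiguration": {
            "RoleARN": role_arn,
            # Assumes Redshift's default port 5439; change it if your cluster differs.
            "ClusterJDBCURL": f"jdbc:redshift://{cluster_endpoint}:5439/{database}",
            "CopyCommand": {
                "DataTableName": staging_table,
                "CopyOptions": "json 'auto ignorecase'",
            },
            "Username": username,
            "Password": password,
            # Intermediate S3 storage, with the default buffering hints.
            "S3Configuration": {
                "RoleARN": role_arn,
                "BucketARN": bucket_arn,
                "BufferingHints": {"SizeInMBs": 5, "IntervalInSeconds": 300},
                "CompressionFormat": "UNCOMPRESSED",
            },
        },
    }


request = build_create_stream_request(
    "metarouter-stream",
    "example-cluster.abc123.us-east-1.redshift.amazonaws.com",
    "dev",
    "staging_firehose",
    "admin",
    "example-password",
    "arn:aws:iam::123456789012:role/example-role",
    "arn:aws:s3:::example-bucket",
)
```

With boto3 installed and credentials configured, `boto3.client("firehose").create_delivery_stream(**request)` would submit it.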
-
Open the Configuration tab.
-
Scroll down and click Edit for Service Access.
-
Under IAM Role, select Choose Existing IAM Role and attach the IAM role you created earlier (the same role used for Redshift).
-
Click Save Changes.
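If you prefer to attach the role programmatically, the Service Access steps above could be sketched as a Firehose `update_destination` request built from a `describe_delivery_stream` result. `build_role_update_request` is a hypothetical helper. It assumes the stream has a single Redshift destination and that the same role should also be used for the intermediate S3 delivery (`S3Update`).

```python
def build_role_update_request(stream_description: dict, role_arn: str) -> dict:
    """Build (hypothetical) kwargs for Firehose update_destination from the
    "DeliveryStreamDescription" part of a describe_delivery_stream response."""
    # Assumes one destination, as for a stream created with a Redshift destination.
    destination = stream_description["Destinations"][0]
    return {
        "DeliveryStreamName": stream_description["DeliveryStreamName"],
        # The update fails if this version ID is no longer the current one.
        "CurrentDeliveryStreamVersionId": stream_description["VersionId"],
        "DestinationId": destination["DestinationId"],
        "RedshiftDestinationUpdate": {
            "RoleARN": role_arn,
            "S3Update": {"RoleARN": role_arn},  # intermediate S3 delivery role
        },
    }


description = {
    "DeliveryStreamName": "metarouter-stream",
    "VersionId": "1",
    "Destinations": [{"DestinationId": "destinationId-000000000001"}],
}
update = build_role_update_request(description, "arn:aws:iam::123456789012:role/example-role")
```

In practice, `description` would come from `describe_delivery_stream(DeliveryStreamName=...)["DeliveryStreamDescription"]`, and the result would be sent with `update_destination(**update)` on a boto3 Firehose client.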
Optional Step - Adjusting Firehose Buffer Settings
- At the bottom of the Firehose settings page, expand the section labeled "Buffer hints, compression, and encryption". There you can customize Firehose's buffer, which determines how quickly events appear in your Redshift tables.
- Firehose buffers incoming data until one of two limits is reached:
- A buffer size (the payload size, in MiB).
- A buffer interval (the time elapsed, in seconds).
Whichever limit is reached first, Firehose delivers the buffered data to S3 and then loads it into Redshift.
- The default (Amazon-recommended) buffer settings are 5 MiB or 300 seconds.
- The minimum buffer interval can be set to 0 seconds, allowing near real-time data transfers. Note that Firehose bills ingestion in 5 KB increments, rounding each record up to the nearest 5 KB, so sending events in records sized close to a multiple of 5 KB can help reduce operational costs.
- Lowering buffer limits during the testing phase can help expedite data validation and troubleshooting.
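The size-or-time rule described above, and the 5 KB rounding Firehose applies to each ingested record for billing, can be illustrated with the simplified Python model below. It is a sketch, not Firehose's scheduler: `flush_times` and `billed_sizes_kb` are hypothetical helpers, and the model assumes the interval is measured from the arrival of the first event in the current buffer.

```python
import math


def flush_times(events: list[tuple[float, float]], size_limit: float, interval_s: float) -> list[float]:
    """Return the times at which a size-or-time buffer would be flushed.

    `events` is a time-sorted list of (arrival_time_s, size) pairs; `size` and
    `size_limit` may use any one consistent unit.
    """
    flushes: list[float] = []
    buffered = 0.0
    started = None  # arrival time of the first event in the current buffer
    for t, size in events:
        # Time limit: if the interval elapsed before this event, flush at the deadline.
        if started is not None and t - started >= interval_s:
            flushes.append(started + interval_s)
            buffered, started = 0.0, None
        if started is None:
            started = t
        buffered += size
        # Size limit: flush as soon as the buffered payload reaches the limit.
        if buffered >= size_limit:
            flushes.append(t)
            buffered, started = 0.0, None
    if started is not None:
        flushes.append(started + interval_s)  # the last partial buffer times out
    return flushes


def billed_sizes_kb(record_sizes_kb: list[float]) -> list[int]:
    """Round each record up to the next 5 KB increment, as ingestion billing does."""
    return [5 * math.ceil(size / 5) for size in record_sizes_kb]
```

For example, with a size limit of 5 and a 300 second interval, events of size 2 at 0, 1 and 2 seconds trigger a size flush at 2 seconds, while a single later event is flushed only when its interval expires. Likewise, records of 3 KB and 12 KB are billed as 5 KB and 15 KB.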