Redshift Integration
There are two recommended approaches for integrating Amazon Redshift with Aampe:- Unload data into Amazon S3 from Redshift
- Provide direct access for Aampe user to your Redshift cluster (and tables) with event data
Unloading Data into Amazon S3 from Redshift
1. Create IAM Role for Redshift
Create an IAM role that allows Redshift to access S3. This follows AWS security best practices by using roles instead of hardcoded credentials.your-bucket-name
with the actual bucket):
2. Attach IAM Role to Redshift Cluster
Associate the IAM role with your Redshift cluster:3. Unload Data to S3
Use the UNLOAD command to export data from Redshift to S3. Redshift’s UNLOAD command is optimized for parallel processing and handles large datasets efficiently.FORMAT AS PARQUET
: Efficient columnar format, recommended for analyticsPARTITION BY
: Organizes data for better query performancePARALLEL ON
: Enables parallel processing (default, but explicit is better)ALLOWOVERWRITE
: Overwrites existing filesMAXFILESIZE
: Controls individual file size (default 6.2GB)MANIFEST
: Generates manifest file for data verification
4. Automate the Unloading Process
Option A: Using Redshift Scheduler (Recommended) Create a scheduled query to automate daily unloads:5. Monitoring and Optimization
Monitor UNLOAD performance:- Use appropriate distribution keys on source tables
- Consider VACUUM and ANALYZE operations before UNLOAD
- Monitor S3 costs and implement lifecycle policies
- Use compression (GZIP, BZIP2) for cost optimization if network transfer is a concern
Provide Direct Access for Aampe User to Your Redshift Cluster
This approach allows Aampe to directly query your Redshift cluster. Follow the principle of least privilege for security.Setup Steps:
1. Create a Custom Database Role (if needed)
2. Create Database User
3. Grant Role to User
4. Grant Table-Level Permissions
5. Configure Network Security
VPC Security Groups: Ensure your Redshift cluster’s security group allows connections from Aampe’s IP addresses on port 5439. Subnet Groups: If using private subnets, ensure proper routing and NAT gateway configuration.Connection Details Required
Provide Aampe with the following connection information:- Host/Endpoint:
your-cluster-name.abc123.us-east-1.redshift.amazonaws.com
- Port:
5439
(default Redshift port) - Database Name: Your database name
- Schema Name: Schema containing event tables
- Username:
aampe_user
- Password: The password you created
- Table Names: List of tables containing event data