When importing from databases, many customers create separate tables/views for Aampe that are limited to only the data necessary for Aampe to function.
Connect Snowflake to Aampe (via Google Cloud Storage)
1. Create Snowflake Integration
Integrations are named, first-class Snowflake objects that avoid the need for passing explicit cloud provider credentials such as secret keys or access tokens; instead, integration objects reference a Cloud Storage service account. Aampe will provide you with the bucket link mentioned below.
CREATE STORAGE INTEGRATION gcs_int
TYPE = EXTERNAL_STAGE
STORAGE_PROVIDER = GCS
ENABLED = TRUE
STORAGE_ALLOWED_LOCATIONS = ('gcs-bucket-link');
2. Obtain GCP Service Account
A GCP Service account is created by Snowflake as part of Step 1:
DESC STORAGE INTEGRATION gcs_int;
In the output of the above command, the STORAGE_GCP_SERVICE_ACCOUNT property contains the GCP service account name. Example output:
+-----------------------------+---------------+-----------------------------------------------------------------------------+------------------+
| property | property_type | property_value | property_default |
+-----------------------------+---------------+-----------------------------------------------------------------------------+------------------|
| ENABLED | Boolean | true | false |
| STORAGE_ALLOWED_LOCATIONS | List | bucket-link-that-we-will-send-you | [] |
| STORAGE_GCP_SERVICE_ACCOUNT | String | [email protected] | |
+-----------------------------+---------------+-----------------------------------------------------------------------------+------------------+
We will take this service account and provide it permissions to add/modify data to the GCS bucket.
A parquet file format can be created in Snowflake as follows to unload snowflake data in the parquet format. Any other format like CSV or JSON also can be created. Here is an example script to create parquet format:
USE DB_NAME;
USE SCHEMA_NAME;
CREATE OR REPLACE FILE FORMAT parquet_unload_file_format
TYPE = PARQUET
SNAPPY_COMPRESSION = TRUE
COMMENT = 'FILE FORMAT FOR UNLOADING AS PARQUET FILES';
More options and file types can be found here.
4. Create External Stage
In Snowflake, an EXTERNAL STAGE object references data files stored in a location outside of Snowflake.
USE DB_NAME;
USE SCHEMA SCHEMA_NAME;
CREATE OR REPLACE STAGE parquet_unload_gcs_stage
URL = 'gcs-bucket-link'
storage_integration = gcs_int
FILE_FORMAT = parquet_unload_file_format
COMMENT = 'GCS Stage for the Snowflake external Parquet export';
5. Unload Data
From Snowflake, use the COPY command in Snowflake to unload data from a Snowflake table into a GCS bucket. To retain the column names in the output file, use the HEADER = TRUE copy option.
COPY INTO @<EXTERNAL_STAGE_NAME>/<file_name_prefix>
FROM DB_NAME.SCHEMA_NAME.TABLE_NAME
HEADER = TRUE;
For example:
COPY INTO @parquet_unload_gcs_stage/region
FROM SF01.TEST01.REGION
HEADER=TRUE;
For the minimal fields we need, you can refer to Data Models.
There are various parameters for COPY command which can limit the partitions and provide optimizations: https://docs.snowflake.com/en/sql-reference/sql/copy-into-location.html
6. Automate the Unloading
What we recommend is to automate the COPY command that unloads each day’s events (single partition) into the external stage by using Snowflake TASK for it to run at a schedule: https://docs.snowflake.com/en/sql-reference/sql/create-task.html
An example of same would be:
CREATE TASK copy_to_gcs
SCHEDULE = 'USING CRON 0 10 * * * Europe/Rome'
COMMENT = 'Test Schedule' AS
COPY INTO @parquet_unload_gcs_stage/region
from SF01.TEST01.REGION
HEADER=TRUE;
If you use Airflow on your side for orchestration you can also use that to schedule this query at a regular interval.
Alternatively we can fetch directly from Snowflake
Snowflake Direct Access Setup
Create a dedicated Aampe user with read-only access to your data warehouse. Once configured, we’ll handle the ETL scheduling on our side.
Step 1: Switch to Account Admin
Run this in your Snowflake worksheet:
Step 2: Create Role and User
Create a read-only role for Aampe (skip if you have an existing read-only role you’d prefer to use):
CREATE ROLE AAMPE_ROLE;
CREATE USER AAMPE_DEVELOPER
PASSWORD = 'aampe_developer'
EMAIL = '[email protected]'
MUST_CHANGE_PASSWORD = FALSE
DEFAULT_ROLE = AAMPE_ROLE
DEFAULT_WAREHOUSE = <YOUR_WAREHOUSE>;
GRANT ROLE AAMPE_ROLE TO USER AAMPE_DEVELOPER;
Step 3: Grant Read Access
Grant the role access to your warehouse, database, and tables. Adjust the scope as needed:
USE ROLE ACCOUNTADMIN;
-- Current objects
GRANT USAGE ON WAREHOUSE <MY_WH> TO ROLE AAMPE_ROLE;
GRANT USAGE ON DATABASE <MY_DB> TO ROLE AAMPE_ROLE;
GRANT USAGE ON ALL SCHEMAS IN DATABASE <MY_DB> TO ROLE AAMPE_ROLE;
GRANT SELECT ON ALL TABLES IN DATABASE <MY_DB> TO ROLE AAMPE_ROLE;
GRANT SELECT ON ALL VIEWS IN DATABASE <MY_DB> TO ROLE AAMPE_ROLE;
-- Future objects (so new tables are automatically accessible)
GRANT USAGE ON FUTURE SCHEMAS IN DATABASE <MY_DB> TO ROLE AAMPE_ROLE;
GRANT SELECT ON FUTURE TABLES IN DATABASE <MY_DB> TO ROLE AAMPE_ROLE;
GRANT SELECT ON FUTURE VIEWS IN DATABASE <MY_DB> TO ROLE AAMPE_ROLE;
Step 4: Share Connection Details
Send us the following:
- Account (from your URL, e.g.,
tzviibq-hr75240 from https://tzviibq-hr75240.snowflakecomputing.com)
- Warehouse Name
- Database Name
- Schema Name
- Table Names (tables containing your event data)
Once we receive these, we’ll configure the connection and begin syncing data at regular intervals.
References: