Prerequisites

  • Databricks workspace with Unity Catalog enabled
  • Metastore created and attached to your workspace
  • Appropriate admin privileges to create service principals and grant permissions

Integration Methods

Method 1: Direct Read Access via Service Principal

This method allows Aampe to directly access your Unity Catalog tables using a dedicated service principal with OAuth authentication.

Required Permissions

The Aampe service principal needs the following Unity Catalog privileges:
  • USE CATALOG - Access to view metadata in the catalog
  • USE SCHEMA - Access to view metadata in schemas
  • SELECT - Read access to tables and views
  • BROWSE - Optional: View object metadata in Catalog Explorer

Step-by-step Procedure

  1. Create the Service Principal
Option A: Databricks-managed Service Principal
  1. Go to your Databricks account console
  2. Navigate to User management > Service principals
  3. Click Add service principal
  4. Provide a name (e.g., aampe-integration-sp)
  5. Click Add
Option B: Azure Entra ID Service Principal (Azure Databricks only)
  1. Create an app registration in Azure Portal
  2. Note the Application (client) ID
  3. Add the service principal to your Databricks account
  2. Generate OAuth Credentials
  1. In the service principal's details page, go to the Secrets tab
  2. Under OAuth secrets, click Generate secret
  3. Set the secret's lifetime (maximum 730 days)
  4. Copy and securely store:
    • Client ID (same as Application ID)
    • OAuth Secret (shown only once)
  5. Click Done
  3. Grant Unity Catalog Permissions
Run the following SQL commands in a Databricks notebook or SQL editor:

-- Grant catalog-level permissions
GRANT USE CATALOG ON CATALOG <catalog_name> TO <service_principal_id>;

-- Grant schema-level permissions
GRANT USE SCHEMA ON SCHEMA <catalog_name>.<schema_name> TO <service_principal_id>;

-- Grant table-level read permissions
GRANT SELECT ON TABLE <catalog_name>.<schema_name>.<table_name> TO <service_principal_id>;

-- Alternatively, grant SELECT on all tables in a schema
GRANT SELECT ON SCHEMA <catalog_name>.<schema_name> TO <service_principal_id>;

Replace:
  • <catalog_name>: Your Unity Catalog name
  • <schema_name>: Your schema/database name
  • <table_name>: Specific table names (or use schema-level grant)
  • <service_principal_id>: The Application ID of your service principal
  4. Optional: Use a Dedicated Group for Minimal Permissions
For more granular control, grant read-only access through a dedicated account group and add the service principal to it. (Unity Catalog grants privileges to users, service principals, and groups; Databricks SQL does not support CREATE ROLE.)

-- Example: grant read-only access to a group named aampe_reader
-- (create the group in the account console or via the SCIM API first)
GRANT USE CATALOG ON CATALOG <catalog_name> TO aampe_reader;
GRANT USE SCHEMA ON SCHEMA <catalog_name>.<schema_name> TO aampe_reader;
GRANT SELECT ON SCHEMA <catalog_name>.<schema_name> TO aampe_reader;

Then add the Aampe service principal as a member of the aampe_reader group.
  5. Provide Connection Details to Aampe
Share the following information with the Aampe team:
  • Workspace URL: https://<workspace-name>.cloud.databricks.com
  • Client ID: Your service principal’s Application ID
  • OAuth Secret: The generated secret (transfer securely)
  • Catalog Name: Your Unity Catalog name
  • Schema Names: List of schemas containing relevant tables
  • Table Names: Specific tables to sync (if not syncing entire schemas)
  • Workspace ID: Your Databricks workspace ID
  • Account ID: Your Databricks account ID (for account-level access)

Method 2: Push to Cloud Storage

For large-scale data transfers, you can export data to cloud storage that Aampe can access.

AWS S3 Setup

  1. Create External Location
-- Create storage credential
CREATE STORAGE CREDENTIAL IF NOT EXISTS aampe_storage_cred
WITH (
  AWS_IAM_ROLE = 'arn:aws:iam::<account-id>:role/<role-name>'
);

-- Create external location
CREATE EXTERNAL LOCATION IF NOT EXISTS aampe_export_location
URL 's3://<bucket-name>/aampe-export/'
WITH (STORAGE CREDENTIAL aampe_storage_cred);
  2. Grant Permissions
-- Grant write access to the service principal
GRANT CREATE EXTERNAL TABLE ON EXTERNAL LOCATION aampe_export_location TO <service_principal_id>;
GRANT WRITE FILES ON EXTERNAL LOCATION aampe_export_location TO <service_principal_id>;
  3. Set up Automated Export
Create a Databricks Job or Delta Live Tables pipeline to export data:

Example PySpark export job:

from pyspark.sql import SparkSession

def export_to_s3():
    spark = SparkSession.builder.appName("AampeExport").getOrCreate()

    # Read from Unity Catalog table
    df = spark.read.table("catalog.schema.table_name")

    # Write to S3 in Parquet format
    df.write \
      .mode("overwrite") \
      .option("compression", "snappy") \
      .parquet("s3://bucket-name/aampe-export/table_name/")

    # Or write as Delta format
    df.write \
      .format("delta") \
      .mode("overwrite") \
      .save("s3://bucket-name/aampe-export/table_name_delta/")

Azure Data Lake Storage (ADLS) Setup

  1. Create External Location
-- Create storage credential
CREATE STORAGE CREDENTIAL IF NOT EXISTS aampe_storage_cred
WITH (
  AZURE_SERVICE_PRINCIPAL = (
    TENANT_ID = '<tenant-id>',
    CLIENT_ID = '<client-id>',
    CLIENT_SECRET = '<client-secret>'
  )
);

-- Create external location
CREATE EXTERNAL LOCATION IF NOT EXISTS aampe_export_location
URL 'abfss://<container>@<storage-account>.dfs.core.windows.net/aampe-export/'
WITH (STORAGE CREDENTIAL aampe_storage_cred);
  2. Configure Export Pipeline
Similar to AWS, create automated jobs to export data to ADLS.
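As a rough sketch, the S3 export job above can be adapted to write to the ADLS external location; the container, storage account, and table names below are placeholders rather than values from your environment.

from pyspark.sql import SparkSession

def export_to_adls():
    spark = SparkSession.builder.appName("AampeExportADLS").getOrCreate()

    # Read from Unity Catalog table (placeholder three-level name)
    df = spark.read.table("catalog.schema.table_name")

    # Write Parquet files to the ADLS path covered by the external location above
    df.write \
      .mode("overwrite") \
      .option("compression", "snappy") \
      .parquet("abfss://<container>@<storage-account>.dfs.core.windows.net/aampe-export/table_name/")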

Data Export Best Practices

Incremental Updates

For ongoing synchronization, implement incremental exports:

Export only new or modified records:

from pyspark.sql import SparkSession

def incremental_export(last_sync_timestamp):
    spark = SparkSession.builder.appName("AampeIncrementalExport").getOrCreate()

    # Filter to rows modified since the last successful sync
    df = spark.read.table("catalog.schema.table_name") \
        .filter(f"modified_at > '{last_sync_timestamp}'")

    df.write \
      .mode("append") \
      .partitionBy("date") \
      .parquet("s3://bucket-name/aampe-export/incremental/")
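One way to supply last_sync_timestamp is to persist a watermark between runs. The sketch below keeps it in a small Delta table; the aampe_export_state table name and synced_at column are hypothetical, not part of Aampe's requirements.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("AampeWatermark").getOrCreate()

# Hypothetical bookkeeping table that stores one row per completed export run
STATE_TABLE = "catalog.schema.aampe_export_state"

def read_last_sync_timestamp():
    # Fall back to the epoch on the very first run (or if the table is missing)
    try:
        row = spark.table(STATE_TABLE).orderBy("synced_at", ascending=False).first()
        return row["synced_at"] if row else "1970-01-01 00:00:00"
    except Exception:
        return "1970-01-01 00:00:00"

def record_sync_timestamp(synced_at):
    # Append the timestamp of the export that just finished
    spark.createDataFrame([(synced_at,)], ["synced_at"]) \
         .write.format("delta").mode("append").saveAsTable(STATE_TABLE)

A scheduled job would call read_last_sync_timestamp(), run incremental_export() with it, and then call record_sync_timestamp() with the new high-water mark.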

Partitioning Strategy

Organize exports by date for efficient data retrieval:

df.write \
  .partitionBy("year", "month", "day") \
  .mode("overwrite") \
  .parquet("s3://bucket-name/aampe-export/partitioned/")
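The payoff on the read side is partition pruning: a consumer filtering on the partition columns only scans the matching directories. A minimal sketch, with illustrative paths and values:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("AampePartitionedRead").getOrCreate()

# Only the year=2024/month=6/day=1 directories are scanned, not the full export
df = spark.read.parquet("s3://bucket-name/aampe-export/partitioned/") \
    .filter("year = 2024 AND month = 6 AND day = 1")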

Security Considerations

  1. Token Rotation: OAuth secrets have a maximum lifetime of 730 days. Plan for regular rotation.
  2. Principle of Least Privilege: Grant only the minimum required permissions:
  • Use schema-level grants instead of catalog-level when possible
  • Consider read-only external locations for exports
  • Restrict access to specific workspaces if using workspace-level tokens
  3. Audit Logging: Unity Catalog automatically logs all access. Monitor these logs:
SELECT *
FROM system.access.audit
WHERE service_name = 'unityCatalog'
  AND user_identity.email = '<service_principal_id>'
ORDER BY event_time DESC
LIMIT 100;
  4. Network Security:
  • Configure IP access lists if required
  • Use Private Link for enhanced security (Azure)
  • Enable workspace access control

Monitoring and Troubleshooting

Verify Service Principal Access

Test the service principal’s access using Databricks CLI:

Configure Databricks CLI with OAuth:

databricks configure --profile aampe-sp

Test connection:

databricks tables list --catalog <catalog_name> --schema <schema_name>

Common Issues

  1. Permission Denied Errors
  • Verify all three levels of permissions: CATALOG → SCHEMA → TABLE
  • Check that the service principal is assigned to the workspace
  • Ensure OAuth secret hasn’t expired
  2. Token Expiration
  • OAuth tokens expire after 1 hour (automatically refreshed by Aampe)
  • OAuth secrets expire based on configured lifetime (max 730 days)
  3. External Location Access
  • Verify storage credential has correct cloud IAM permissions
  • Check external location URL format
  • Ensure proper CORS configuration for cross-region access

API Access Patterns

Aampe will use the following authentication flow:
  1. Token Generation (handled automatically):
POST https://<databricks-instance>/oidc/v1/token
Content-Type: application/x-www-form-urlencoded

grant_type=client_credentials
&client_id=<client-id>
&client_secret=<oauth-secret>
&scope=all-apis
  2. Data Access via SQL Warehouse or Cluster (a sketch of the full flow follows this list):
  • Connect using JDBC/ODBC with OAuth token
  • Execute SELECT queries on authorized tables
  • Respect rate limits and query timeouts
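For reference, here is a minimal Python sketch of that flow. The use of the requests and databricks-sql-connector packages, the SQL warehouse HTTP path, and all placeholder identifiers are assumptions for illustration; Aampe performs the equivalent steps automatically.

import requests
from databricks import sql  # databricks-sql-connector package

WORKSPACE_HOST = "<workspace-name>.cloud.databricks.com"
CLIENT_ID = "<client-id>"
OAUTH_SECRET = "<oauth-secret>"
HTTP_PATH = "/sql/1.0/warehouses/<warehouse-id>"  # assumes a SQL warehouse endpoint

# 1. Exchange the client credentials for a short-lived OAuth access token
resp = requests.post(
    f"https://{WORKSPACE_HOST}/oidc/v1/token",
    data={
        "grant_type": "client_credentials",
        "client_id": CLIENT_ID,
        "client_secret": OAUTH_SECRET,
        "scope": "all-apis",
    },
)
resp.raise_for_status()
access_token = resp.json()["access_token"]

# 2. Run a read-only query against a table the service principal can SELECT from
with sql.connect(
    server_hostname=WORKSPACE_HOST,
    http_path=HTTP_PATH,
    access_token=access_token,
) as connection:
    with connection.cursor() as cursor:
        cursor.execute("SELECT * FROM <catalog_name>.<schema_name>.<table_name> LIMIT 10")
        print(cursor.fetchall())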

Data Models

Ensure your exported data conforms to the Aampe Data Models specification.