Prerequisites
- Databricks workspace with Unity Catalog enabled
- Metastore created and attached to your workspace
- Appropriate admin privileges to create service principals and grant permissions
Integration Methods
Method 1: Direct Read Access via Service Principal
This method allows Aampe to directly access your Unity Catalog tables using a dedicated service principal with OAuth authentication.
Required Permissions
The Aampe service principal needs the following Unity Catalog privileges:
- USE CATALOG - Access to view metadata in the catalog
- USE SCHEMA - Access to view metadata in schemas
- SELECT - Read access to tables and views
- BROWSE - Optional: View object metadata in Catalog Explorer
Step-by-step Procedure
- Create the Service Principal
- Go to your Databricks account console
- Navigate to User management > Service principals
- Click Add service principal
- Provide a name (e.g., aampe-integration-sp)
- Click Add
- For Azure Databricks: create an app registration in the Azure Portal
- Note the Application (client) ID
- Add the service principal to your Databricks account
- Generate OAuth Credentials
- In the service principal’s details page, go to the Secrets tab
- Under OAuth secrets, click Generate secret
- Set the secret’s lifetime (maximum 730 days)
- Copy and securely store:
- Client ID (same as Application ID)
- OAuth Secret (shown only once)
- Click Done
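If you prefer to script steps 1 and 2, a minimal sketch using the Databricks SDK for Python (account-level APIs) might look like the following; the account host, admin credentials, and all bracketed values are placeholders, and the exact SDK method signatures may vary by version:

```python
# Hedged sketch: create the Aampe service principal and an OAuth secret via the
# Databricks SDK for Python (account-level APIs). Requires an account admin
# identity; every value below is a placeholder.
from databricks.sdk import AccountClient

account = AccountClient(
    host="https://accounts.cloud.databricks.com",  # account console host (AWS)
    account_id="<account-id>",
    client_id="<admin-client-id>",
    client_secret="<admin-client-secret>",
)

# Step 1: create the service principal
sp = account.service_principals.create(display_name="aampe-integration-sp")
print(f"Application (client) ID: {sp.application_id}")

# Step 2: generate an OAuth secret for it (the value is shown only once)
secret = account.service_principal_secrets.create(service_principal_id=int(sp.id))
print(f"OAuth secret: {secret.secret}")
```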
- Grant Unity Catalog Permissions
Run the following SQL in a notebook or the SQL editor:

```sql
-- Grant catalog-level permissions
GRANT USE CATALOG ON CATALOG <catalog_name> TO `<service_principal_id>`;

-- Grant schema-level permissions
GRANT USE SCHEMA ON SCHEMA <catalog_name>.<schema_name> TO `<service_principal_id>`;

-- Grant table-level read permissions
GRANT SELECT ON TABLE <catalog_name>.<schema_name>.<table_name> TO `<service_principal_id>`;

-- Alternatively, grant SELECT on all tables in a schema
GRANT SELECT ON SCHEMA <catalog_name>.<schema_name> TO `<service_principal_id>`;
```
Replace:
- <catalog_name>: Your Unity Catalog name
- <schema_name>: Your schema/database name
- <table_name>: Specific table names (or use schema-level grant)
- <service_principal_id>: The Application ID of your service principal
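To confirm the grants took effect, you can check them with SHOW GRANTS (standard Databricks SQL; placeholders as in the list above):

```sql
-- List privileges held by the Aampe service principal on the schema
SHOW GRANTS `<service_principal_id>` ON SCHEMA <catalog_name>.<schema_name>;
```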
- Optional: Create Custom Role for Minimal Permissions
- Provide Connection Details to Aampe
- Workspace URL: https://<workspace-name>.cloud.databricks.com
- Client ID: Your service principal’s Application ID
- OAuth Secret: The generated secret (transfer securely)
- Catalog Name: Your Unity Catalog name
- Schema Names: List of schemas containing relevant tables
- Table Names: Specific tables to sync (if not syncing entire schemas)
- Workspace ID: Your Databricks workspace ID
- Account ID: Your Databricks account ID (for account-level access)
Method 2: Push to Cloud Storage
For large-scale data transfers, you can export data to cloud storage that Aampe can access.
AWS S3 Setup
- Create External Location
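For reference, a hedged SQL sketch of the external location definition (it assumes a storage credential named <credential_name> with IAM access to the export bucket; the location name matches the grants below, and the S3 path is a placeholder):

```sql
-- Assumes a storage credential with IAM access to the export bucket
CREATE EXTERNAL LOCATION IF NOT EXISTS aampe_export_location
  URL 's3://bucket-name/aampe-export/'
  WITH (STORAGE CREDENTIAL <credential_name>);
```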
- Grant Permissions
```sql
-- Allow the service principal to read and write files at the export location
GRANT READ FILES ON EXTERNAL LOCATION aampe_export_location
TO `<service_principal_id>`;

GRANT WRITE FILES ON EXTERNAL LOCATION aampe_export_location
TO `<service_principal_id>`;
```
- Set up Automated Export
Example PySpark export job:

```python
from pyspark.sql import SparkSession

def export_to_s3():
    spark = SparkSession.builder.appName("AampeExport").getOrCreate()

    # Read from Unity Catalog table
    df = spark.read.table("catalog.schema.table_name")

    # Write to S3 in Parquet format
    df.write \
        .mode("overwrite") \
        .option("compression", "snappy") \
        .parquet("s3://bucket-name/aampe-export/table_name/")

    # Or write as Delta format
    df.write \
        .format("delta") \
        .mode("overwrite") \
        .save("s3://bucket-name/aampe-export/table_name_delta/")
```
Azure Data Lake Storage (ADLS) Setup
- Create External Location
- Configure Export Pipeline
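Both steps mirror the S3 flow. A hedged sketch, assuming an Azure storage credential and an example abfss container path (all names are placeholders):

```sql
-- External location backed by an ADLS Gen2 container
CREATE EXTERNAL LOCATION IF NOT EXISTS aampe_export_location_adls
  URL 'abfss://aampe-export@<storage-account>.dfs.core.windows.net/'
  WITH (STORAGE CREDENTIAL <azure_credential_name>);

GRANT READ FILES, WRITE FILES ON EXTERNAL LOCATION aampe_export_location_adls
TO `<service_principal_id>`;
```

The export pipeline itself is the same PySpark job as above, pointed at the ADLS path:

```python
# Write a Unity Catalog table to the ADLS external location (placeholder path)
df = spark.read.table("catalog.schema.table_name")
df.write.mode("overwrite").parquet(
    "abfss://aampe-export@<storage-account>.dfs.core.windows.net/table_name/")
```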
Data Export Best Practices
Incremental Updates
For ongoing synchronization, implement incremental exports that pick up only new or modified records:

```python
# Export only records created or modified since the last successful sync
def incremental_export(last_sync_timestamp):
    df = spark.read.table("catalog.schema.table_name") \
        .filter(f"modified_at > '{last_sync_timestamp}'")

    df.write \
        .mode("append") \
        .partitionBy("date") \
        .parquet("s3://bucket-name/aampe-export/incremental/")
```
Partitioning Strategy
Organize exports by date for efficient data retrieval:

```python
df.write \
    .partitionBy("year", "month", "day") \
    .mode("overwrite") \
    .parquet("s3://bucket-name/aampe-export/partitioned/")
```
Security Considerations
- Token Rotation: OAuth secrets have a maximum lifetime of 730 days. Plan for regular rotation.
- Principle of Least Privilege: Grant only the minimum required permissions:
- Use schema-level grants instead of catalog-level when possible
- Consider read-only external locations for exports
- Restrict access to specific workspaces if using workspace-level tokens
- Audit Logging: Unity Catalog automatically logs all access. Monitor these logs for the service principal's activity (see the example query after this list).
- Network Security:
- Configure IP access lists if required
- Use Private Link for enhanced security (Azure)
- Enable workspace access control
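As an example of monitoring access, if system tables are enabled you can query the audit log for activity by the service principal (the filter value is a placeholder; for service principals, user_identity.email carries the application ID):

```sql
-- Recent Unity Catalog actions performed by the Aampe service principal
SELECT event_time, user_identity.email, service_name, action_name
FROM system.access.audit
WHERE user_identity.email = '<service_principal_id>'
ORDER BY event_time DESC
LIMIT 100;
```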
Monitoring and Troubleshooting
Verify Service Principal Access
Test the service principal's access using the Databricks CLI:

```bash
# Configure Databricks CLI with OAuth
databricks configure --profile aampe-sp

# Test connection
databricks tables list --catalog <catalog_name> --schema <schema_name>
```
Common Issues
- Permission Denied Errors
- Verify all three levels of permissions: CATALOG → SCHEMA → TABLE
- Check that the service principal is assigned to the workspace
- Ensure OAuth secret hasn’t expired
- Token Expiration
- OAuth tokens expire after 1 hour (automatically refreshed by Aampe)
- OAuth secrets expire based on configured lifetime (max 730 days)
- External Location Access
- Verify storage credential has correct cloud IAM permissions
- Check external location URL format
- Ensure proper CORS configuration for cross-region access
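A quick way to isolate external location problems is to list the export path directly in SQL (requires READ FILES on the location; the path below is a placeholder):

```sql
-- Succeeds only if the storage credential and external location resolve correctly
LIST 's3://bucket-name/aampe-export/';
```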
API Access Patterns
Aampe will use the following authentication flow:
- Token Generation (handled automatically): Aampe exchanges the service principal's client ID and OAuth secret for a short-lived access token
- Data Access via SQL Warehouse or Cluster:
- Connect using JDBC/ODBC with OAuth token
- Execute SELECT queries on authorized tables
- Respect rate limits and query timeouts
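To illustrate this flow, here is a minimal sketch using the databricks-sql-connector and databricks-sdk Python packages with OAuth machine-to-machine credentials; the SQL warehouse HTTP path and all bracketed values are placeholders:

```python
# Hedged sketch: query a Unity Catalog table over a SQL warehouse using
# OAuth M2M credentials for the service principal. Placeholders throughout.
from databricks import sql
from databricks.sdk.core import Config, oauth_service_principal

HOSTNAME = "<workspace-name>.cloud.databricks.com"

def credential_provider():
    config = Config(
        host=f"https://{HOSTNAME}",
        client_id="<client-id>",         # service principal Application ID
        client_secret="<oauth-secret>",  # generated OAuth secret
    )
    return oauth_service_principal(config)

with sql.connect(
    server_hostname=HOSTNAME,
    http_path="/sql/1.0/warehouses/<warehouse-id>",
    credentials_provider=credential_provider,
) as connection:
    with connection.cursor() as cursor:
        cursor.execute(
            "SELECT * FROM <catalog_name>.<schema_name>.<table_name> LIMIT 10")
        for row in cursor.fetchall():
            print(row)
```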
Related Resources
- Unity Catalog Privileges Documentation
- Service Principal Authentication
- External Locations Guide
- Data Export Best Practices