Prerequisites

  • Databricks workspace with Unity Catalog enabled
  • Metastore created and attached to your workspace
  • Appropriate admin privileges to create service principals and grant permissions

Integration Methods

Method 1: Direct Read Access via Service Principal

This method allows Aampe to directly access your Unity Catalog tables using a dedicated service principal with OAuth authentication.

Required Permissions

The Aampe service principal needs the following Unity Catalog privileges:
  • USE CATALOG - Access to view metadata in the catalog
  • USE SCHEMA - Access to view metadata in schemas
  • SELECT - Read access to tables and views
  • BROWSE - Optional: View object metadata in Catalog Explorer

Step-by-step Procedure

  1. Create the Service Principal
Option A: Databricks-managed Service Principal
  1. Go to your Databricks account console
  2. Navigate to User management > Service principals
  3. Click Add service principal
  4. Provide a name (e.g., aampe-integration-sp)
  5. Click Add
Option B: Azure Entra ID Service Principal (Azure Databricks only)
  1. Create an app registration in Azure Portal
  2. Note the Application (client) ID
  3. Add the service principal to your Databricks account
  2. Generate OAuth Credentials
  1. In the service principal's details page, go to the Secrets tab
  2. Under OAuth secrets, click Generate secret
  3. Set the secret's lifetime (maximum 730 days)
  4. Copy and securely store:
    • Client ID (same as Application ID)
    • OAuth Secret (shown only once)
  5. Click Done
  3. Grant Unity Catalog Permissions
Run the following SQL commands in a Databricks notebook or SQL editor:

-- Grant catalog-level permissions
GRANT USE CATALOG ON CATALOG <catalog_name> TO <service_principal_id>;

-- Grant schema-level permissions
GRANT USE SCHEMA ON SCHEMA <catalog_name>.<schema_name> TO <service_principal_id>;

-- Grant table-level read permissions
GRANT SELECT ON TABLE <catalog_name>.<schema_name>.<table_name> TO <service_principal_id>;

-- Alternatively, grant SELECT on all tables in a schema
GRANT SELECT ON SCHEMA <catalog_name>.<schema_name> TO <service_principal_id>;

Replace:
  • <catalog_name>: Your Unity Catalog name
  • <schema_name>: Your schema/database name
  • <table_name>: Specific table names (or use schema-level grant)
  • <service_principal_id>: The Application ID of your service principal
  4. Optional: Use a Dedicated Group for Minimal Permissions
For more granular control, grant read-only access through a dedicated account group and add the service principal to it. (Unity Catalog grants privileges to users, service principals, and groups; Databricks SQL does not support CREATE ROLE.)

-- Example: grant read-only access to a group named aampe_reader
-- (create the group in the account console or via the SCIM API first)
GRANT USE CATALOG ON CATALOG <catalog_name> TO aampe_reader;
GRANT USE SCHEMA ON SCHEMA <catalog_name>.<schema_name> TO aampe_reader;
GRANT SELECT ON SCHEMA <catalog_name>.<schema_name> TO aampe_reader;

Then add the Aampe service principal as a member of the aampe_reader group.
  5. Provide Connection Details to Aampe
Share the following information with the Aampe team:
  • Workspace URL: https://<workspace-name>.cloud.databricks.com
  • Client ID: Your service principal’s Application ID
  • OAuth Secret: The generated secret (transfer securely)
  • Catalog Name: Your Unity Catalog name
  • Schema Names: List of schemas containing relevant tables
  • Table Names: Specific tables to sync (if not syncing entire schemas)
  • Workspace ID: Your Databricks workspace ID
  • Account ID: Your Databricks account ID (for account-level access)

Method 2: Push to Cloud Storage

For large-scale data transfers, you can export data to cloud storage that Aampe can access.

AWS S3 Setup

  1. Create External Location
-- Create storage credential
CREATE STORAGE CREDENTIAL IF NOT EXISTS aampe_storage_cred
WITH (
  AWS_IAM_ROLE = 'arn:aws:iam::<account-id>:role/<role-name>'
);

-- Create external location
CREATE EXTERNAL LOCATION IF NOT EXISTS aampe_export_location
URL 's3://<bucket-name>/aampe-export/'
WITH (STORAGE CREDENTIAL aampe_storage_cred);
  2. Grant Permissions
-- Grant write access to the service principal
GRANT CREATE EXTERNAL TABLE ON EXTERNAL LOCATION aampe_export_location TO <service_principal_id>;
GRANT WRITE FILES ON EXTERNAL LOCATION aampe_export_location TO <service_principal_id>;
  3. Set up Automated Export
Create a Databricks Job or Delta Live Tables pipeline to export data:

Example PySpark export job:

from pyspark.sql import SparkSession

def export_to_s3():
    spark = SparkSession.builder.appName("AampeExport").getOrCreate()

    # Read from Unity Catalog table
    df = spark.read.table("catalog.schema.table_name")

    # Write to S3 in Parquet format
    df.write \
      .mode("overwrite") \
      .option("compression", "snappy") \
      .parquet("s3://bucket-name/aampe-export/table_name/")

    # Or write as Delta format
    df.write \
      .format("delta") \
      .mode("overwrite") \
      .save("s3://bucket-name/aampe-export/table_name_delta/")

Azure Data Lake Storage (ADLS) Setup

  1. Create External Location
-- Create storage credential
CREATE STORAGE CREDENTIAL IF NOT EXISTS aampe_storage_cred
WITH (
  AZURE_SERVICE_PRINCIPAL = (
    TENANT_ID = '<tenant-id>',
    CLIENT_ID = '<client-id>',
    CLIENT_SECRET = '<client-secret>'
  )
);

-- Create external location
CREATE EXTERNAL LOCATION IF NOT EXISTS aampe_export_location
URL 'abfss://<container>@<storage-account>.dfs.core.windows.net/aampe-export/'
WITH (STORAGE CREDENTIAL aampe_storage_cred);
  2. Configure Export Pipeline
Similar to AWS, create automated jobs to export data to ADLS.
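As a rough sketch, the S3 export job above can be adapted to write to the ADLS external location; the container, storage account, and table names below are placeholders rather than values from your environment.

from pyspark.sql import SparkSession

def export_to_adls():
    spark = SparkSession.builder.appName("AampeExportADLS").getOrCreate()

    # Read from Unity Catalog table (placeholder three-level name)
    df = spark.read.table("catalog.schema.table_name")

    # Write Parquet files to the ADLS path covered by the external location above
    df.write \
      .mode("overwrite") \
      .option("compression", "snappy") \
      .parquet("abfss://<container>@<storage-account>.dfs.core.windows.net/aampe-export/table_name/")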

Data Export Best Practices

Incremental Updates

For ongoing synchronization, implement incremental exports:

Export only new or modified records:

from pyspark.sql import SparkSession

def incremental_export(last_sync_timestamp):
    spark = SparkSession.builder.appName("AampeIncrementalExport").getOrCreate()

    # Filter to rows modified since the last successful sync
    df = spark.read.table("catalog.schema.table_name") \
        .filter(f"modified_at > '{last_sync_timestamp}'")

    df.write \
      .mode("append") \
      .partitionBy("date") \
      .parquet("s3://bucket-name/aampe-export/incremental/")
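One way to supply last_sync_timestamp is to persist a watermark between runs. The sketch below keeps it in a small Delta table; the aampe_export_state table name and synced_at column are hypothetical, not part of Aampe's requirements.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("AampeWatermark").getOrCreate()

# Hypothetical bookkeeping table that stores one row per completed export run
STATE_TABLE = "catalog.schema.aampe_export_state"

def read_last_sync_timestamp():
    # Fall back to the epoch on the very first run (or if the table is missing)
    try:
        row = spark.table(STATE_TABLE).orderBy("synced_at", ascending=False).first()
        return row["synced_at"] if row else "1970-01-01 00:00:00"
    except Exception:
        return "1970-01-01 00:00:00"

def record_sync_timestamp(synced_at):
    # Append the timestamp of the export that just finished
    spark.createDataFrame([(synced_at,)], ["synced_at"]) \
         .write.format("delta").mode("append").saveAsTable(STATE_TABLE)

A scheduled job would call read_last_sync_timestamp(), run incremental_export() with it, and then call record_sync_timestamp() with the new high-water mark.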

Partitioning Strategy

Organize exports by date for efficient data retrieval:

df.write \
  .partitionBy("year", "month", "day") \
  .mode("overwrite") \
  .parquet("s3://bucket-name/aampe-export/partitioned/")
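The payoff on the read side is partition pruning: a consumer filtering on the partition columns only scans the matching directories. A minimal sketch, with illustrative paths and values:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("AampePartitionedRead").getOrCreate()

# Only the year=2024/month=6/day=1 directories are scanned, not the full export
df = spark.read.parquet("s3://bucket-name/aampe-export/partitioned/") \
    .filter("year = 2024 AND month = 6 AND day = 1")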

Security Considerations

  1. Token Rotation: OAuth secrets have a maximum lifetime of 730 days. Plan for regular rotation.
  2. Principle of Least Privilege: Grant only the minimum required permissions:
  • Use schema-level grants instead of catalog-level when possible
  • Consider read-only external locations for exports
  • Restrict access to specific workspaces if using workspace-level tokens
  3. Audit Logging: Unity Catalog automatically logs all access. Monitor these logs:
SELECT *
FROM system.access.audit
WHERE service_name = 'unityCatalog'
  AND user_identity.email = '<service_principal_id>'
ORDER BY event_time DESC
LIMIT 100;
  4. Network Security:
  • Configure IP access lists if required
  • Use Private Link for enhanced security (Azure)
  • Enable workspace access control

Monitoring and Troubleshooting

Verify Service Principal Access

Test the service principal’s access using Databricks CLI:

Configure Databricks CLI with OAuth:

databricks configure --profile aampe-sp

Test connection:

databricks tables list --catalog <catalog_name> --schema <schema_name>

Common Issues

  1. Permission Denied Errors
  • Verify all three levels of permissions: CATALOG → SCHEMA → TABLE
  • Check that the service principal is assigned to the workspace
  • Ensure OAuth secret hasn’t expired
  2. Token Expiration
  • OAuth tokens expire after 1 hour (automatically refreshed by Aampe)
  • OAuth secrets expire based on configured lifetime (max 730 days)
  3. External Location Access
  • Verify storage credential has correct cloud IAM permissions
  • Check external location URL format
  • Ensure proper CORS configuration for cross-region access

API Access Patterns

Aampe will use the following authentication flow:
  1. Token Generation (handled automatically):
POST https://<databricks-instance>/oidc/v1/token
Content-Type: application/x-www-form-urlencoded

grant_type=client_credentials
&client_id=<client-id>
&client_secret=<oauth-secret>
&scope=all-apis
  2. Data Access via SQL Warehouse or Cluster (a sketch of the full flow follows this list):
  • Connect using JDBC/ODBC with OAuth token
  • Execute SELECT queries on authorized tables
  • Respect rate limits and query timeouts
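For reference, here is a minimal Python sketch of that flow. The use of the requests and databricks-sql-connector packages, the SQL warehouse HTTP path, and all placeholder identifiers are assumptions for illustration; Aampe performs the equivalent steps automatically.

import requests
from databricks import sql  # databricks-sql-connector package

WORKSPACE_HOST = "<workspace-name>.cloud.databricks.com"
CLIENT_ID = "<client-id>"
OAUTH_SECRET = "<oauth-secret>"
HTTP_PATH = "/sql/1.0/warehouses/<warehouse-id>"  # assumes a SQL warehouse endpoint

# 1. Exchange the client credentials for a short-lived OAuth access token
resp = requests.post(
    f"https://{WORKSPACE_HOST}/oidc/v1/token",
    data={
        "grant_type": "client_credentials",
        "client_id": CLIENT_ID,
        "client_secret": OAUTH_SECRET,
        "scope": "all-apis",
    },
)
resp.raise_for_status()
access_token = resp.json()["access_token"]

# 2. Run a read-only query against a table the service principal can SELECT from
with sql.connect(
    server_hostname=WORKSPACE_HOST,
    http_path=HTTP_PATH,
    access_token=access_token,
) as connection:
    with connection.cursor() as cursor:
        cursor.execute("SELECT * FROM <catalog_name>.<schema_name>.<table_name> LIMIT 10")
        print(cursor.fetchall())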

Data Models

Ensure your exported data conforms to the Aampe Data Models specification.