Databricks
Stream data from your Monad pipeline into Databricks Delta Lake tables via Unity Catalog, with automatic table creation, schema inference, and gzip-compressed staging.
Overview
The Databricks output enables you to:
- Load data into Delta Lake tables with automatic schema inference
- Automatically create tables on first data load if they don't exist
- Stage compressed JSONL files via Unity Catalog Volumes for efficient bulk loading
- Authenticate using OAuth M2M (service principal)
- Handle schema evolution with `mergeSchema` support (see the sketch below)
- Validate permissions during connection testing before any data is written
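As a rough illustration of the staging-and-load pattern described above, a `COPY INTO` from a Unity Catalog volume with schema merging looks roughly like the following. This is a hedged sketch, not the exact statement Monad issues; the catalog, schema, table, and volume names are placeholders.

```sql
-- Illustrative only: not necessarily the exact statement Monad runs.
-- gzip-compressed .json.gz files staged in the volume are typically
-- decompressed automatically based on their file extension.
COPY INTO <your_catalog>.<your_schema>.<your_table>
FROM '/Volumes/<your_catalog>/<your_schema>/<your_volume>/'
FILEFORMAT = JSON
COPY_OPTIONS ('mergeSchema' = 'true');
```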
Requirements
- Databricks Workspace with Unity Catalog enabled
- SQL Warehouse running and accessible
- Catalog and Schema must already exist in your workspace
- Volume for staging files - Monad will create it if it doesn't exist
- Authentication credentials (see Authentication Methods)
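Before configuring the output, you can confirm that the catalog, schema, and any existing staging volume are visible from your SQL warehouse. These are standard Databricks SQL commands; the placeholder names are yours to substitute.

```sql
-- Replace the placeholders with your catalog and schema names.
SHOW CATALOGS;
SHOW SCHEMAS IN <your_catalog>;
SHOW VOLUMES IN <your_catalog>.<your_schema>;
```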
Setting Up Permissions
The Databricks output requires the following permissions on the target schema. Grant them to your user or service principal:
```sql
-- Replace placeholders with your actual values
GRANT USE CATALOG ON CATALOG <your_catalog> TO `<principal>`;
GRANT USE SCHEMA ON SCHEMA <your_catalog>.<your_schema> TO `<principal>`;
GRANT CREATE TABLE ON SCHEMA <your_catalog>.<your_schema> TO `<principal>`;
GRANT SELECT ON SCHEMA <your_catalog>.<your_schema> TO `<principal>`;
GRANT MODIFY ON SCHEMA <your_catalog>.<your_schema> TO `<principal>`;
GRANT READ VOLUME, WRITE VOLUME ON VOLUME <your_catalog>.<your_schema>.<your_volume> TO `<principal>`;
GRANT CREATE VOLUME ON SCHEMA <your_catalog>.<your_schema> TO `<principal>`;
```
Here `<principal>` is your service principal's application ID.
Monad verifies all of these permissions during Test Connection and will report any missing grants.
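For example, with a hypothetical application ID of `12345678-abcd-4321-9876-1234567890ab` (substitute your own and keep the backticks), a grant and a quick check of what the principal currently holds look like this:

```sql
-- The application ID below is hypothetical; substitute your service principal's ID.
GRANT USE SCHEMA ON SCHEMA <your_catalog>.<your_schema> TO `12345678-abcd-4321-9876-1234567890ab`;

-- Review the grants the principal currently holds on the schema.
SHOW GRANTS `12345678-abcd-4321-9876-1234567890ab` ON SCHEMA <your_catalog>.<your_schema>;
```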
Configuration
Settings
| Setting | Type | Required | Default | Description |
|---|---|---|---|---|
| Server Hostname | string | Yes | - | The Databricks workspace hostname (e.g. adb-1234567890.azuredatabricks.net) |
| HTTP Path | string | Yes | - | The SQL warehouse HTTP path from connection details (e.g. /sql/1.0/warehouses/abc123) |
| Catalog | string | Yes | - | The Unity Catalog name |
| Schema | string | Yes | - | The target schema within the catalog |
| Table | string | Yes | - | The target Delta table name. Created automatically if it doesn't exist |
| Volume | string | Yes | - | The Unity Catalog Volume used for staging JSONL files before loading |
| Batch Config | object | No | See below | Batching configuration |
Batch Configuration
Defaults are tuned for bulk COPY INTO throughput -- larger batches mean fewer load operations.
| Setting | Default | Min | Max | Description |
|---|---|---|---|---|
| `record_count` | 50,000 | 500 | 100,000 | Maximum records per batch |
| `data_size` | 10 MB | 1 MB | 50 MB | Maximum batch size |
| `publish_rate` | 30s | 5s | 120s | Maximum time before a batch is sent |
Secrets
| Setting | Type | Required | Description |
|---|---|---|---|
| Client ID | string | Yes* | OAuth M2M client ID for service principal authentication |
| Client Secret | string | Yes* | OAuth M2M client secret for service principal authentication |
Generate Client ID and Client Secret (OAuth Machine-to-Machine - Service Principal)
Recommended for production. Uses a service principal with client credentials:
- In the Databricks Account Console, go to User management > Service principals
- Click Add service principal and create one
- Select the service principal, go to Secrets > Generate secret
- Copy the Client ID and Client Secret
- Add the service principal to your workspace and grant it the required permissions
Use the client ID and client secret as the `client_id` and `client_secret` secrets.
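Once the credentials are in place, a simple sanity check is to run a query through any client authenticated as the service principal and confirm the session identity. This is a general Databricks SQL check, not a Monad-specific step.

```sql
-- When authenticated via OAuth M2M, this returns the service principal's application ID.
SELECT current_user();
```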
Where to Find Connection Details
- In your Databricks workspace, go to SQL Warehouses
- Select your warehouse and open the Connection details tab
- Copy the Server hostname and HTTP path
Troubleshooting
Connection Issues
- Server hostname: Ensure the hostname is correct and accessible (e.g. `adb-1234567890.azuredatabricks.net`)
- HTTP path: Verify the SQL warehouse HTTP path from the connection details tab
- SQL warehouse: Make sure your warehouse is running -- Monad cannot start a stopped warehouse
Authentication Errors
- 401 Unauthorized: Check that your OAuth credentials are valid and not expired
- OAuth M2M: Ensure the service principal is added to the workspace and has the correct grants
Permission Errors
- USE SCHEMA denied: Grant `USE SCHEMA` on the target schema to your principal
- Missing permissions: Run Test Connection to see which specific permissions are missing
- Volume access denied: Grant `READ VOLUME` and `WRITE VOLUME` on the volume
Data Loading Issues
- COPY INTO failures: Check that the volume exists and is accessible
- Schema mismatch: `mergeSchema` is enabled, so new fields are added automatically. However, incompatible type changes may cause errors
- Large batch failures: If uploads fail with 413 errors, reduce the `data_size` in the batch configuration
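To confirm the staging volume exists and is readable from the warehouse, you can list its contents directly. The path below uses the same placeholders as the grant statements above.

```sql
-- Lists files staged in the Unity Catalog volume; fails if the volume
-- is missing or the principal lacks READ VOLUME.
LIST '/Volumes/<your_catalog>/<your_schema>/<your_volume>/';
```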
Limitations
- Catalog and schema must exist before configuring the output
- Volume is created automatically if it doesn't exist
- Table schema is inferred from the data -- explicit schema definition is not supported, but you can pre-create the table with your desired schema
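As noted above, one way to control the schema rather than rely on inference is to pre-create the Delta table yourself. This is a hedged sketch; the column names are purely illustrative.

```sql
-- Hypothetical columns shown for illustration only; new fields in the data
-- are still added automatically via mergeSchema.
CREATE TABLE IF NOT EXISTS <your_catalog>.<your_schema>.<your_table> (
  event_time TIMESTAMP,
  source STRING,
  payload STRING
);
```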
Best Practices
- Use default batch settings -- they are optimized for `COPY INTO` throughput
- Share volumes across connectors -- multiple tables can safely stage files in the same volume
- Pre-create catalog and schema -- Monad expects these to exist
- Use dedicated service principals with only the required permissions
- Monitor warehouse usage -- each `COPY INTO` consumes SQL warehouse compute
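Following the recommendation to pre-create the catalog and schema, you (or an admin with the relevant privileges) can create them ahead of time. A minimal sketch using the same placeholders:

```sql
-- Requires CREATE CATALOG on the metastore and CREATE SCHEMA on the catalog.
CREATE CATALOG IF NOT EXISTS <your_catalog>;
CREATE SCHEMA IF NOT EXISTS <your_catalog>.<your_schema>;
```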