
Databricks

Stream data from your Monad pipeline into Databricks Delta Lake tables via Unity Catalog, with automatic table creation, schema inference, and gzip-compressed staging.

Overview

The Databricks output enables you to:

  • Load data into Delta Lake tables with automatic schema inference
  • Automatically create tables on first data load if they don't exist
  • Stage compressed JSONL files via Unity Catalog Volumes for efficient bulk loading
  • Authenticate using OAuth M2M (service principal)
  • Handle schema evolution with mergeSchema support
  • Validate permissions during connection testing before any data is written

Requirements

  1. Databricks Workspace with Unity Catalog enabled
  2. SQL Warehouse running and accessible
  3. Catalog and Schema must already exist in your workspace (see the example after this list)
  4. Volume for staging files - Monad will create it if it doesn't exist
  5. Authentication credentials (see Secrets below)
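
Monad does not create the catalog or schema for you. If they don't exist yet, you (or a workspace admin with the appropriate privileges) can create them ahead of time from a SQL editor. The names below are placeholders:

-- Replace my_catalog / my_schema / my_volume with your own names
CREATE CATALOG IF NOT EXISTS my_catalog;
CREATE SCHEMA IF NOT EXISTS my_catalog.my_schema;

-- Optional: pre-create the staging volume; Monad can also create it on first use
CREATE VOLUME IF NOT EXISTS my_catalog.my_schema.my_volume;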

Setting Up Permissions

The Databricks output requires the following permissions on the target schema. Grant them to your user or service principal:

-- Replace placeholders with your actual values
GRANT USE CATALOG ON CATALOG <your_catalog> TO `<principal>`;
GRANT USE SCHEMA ON SCHEMA <your_catalog>.<your_schema> TO `<principal>`;
GRANT CREATE TABLE ON SCHEMA <your_catalog>.<your_schema> TO `<principal>`;
GRANT SELECT ON SCHEMA <your_catalog>.<your_schema> TO `<principal>`;
GRANT MODIFY ON SCHEMA <your_catalog>.<your_schema> TO `<principal>`;
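-- The next two grants cover staging: READ/WRITE on the volume itself, plus CREATE VOLUME so Monad can create the volume if it doesn't exist yet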
GRANT READ VOLUME, WRITE VOLUME ON VOLUME <your_catalog>.<your_schema>.<your_volume> TO `<principal>`;
GRANT CREATE VOLUME ON SCHEMA <your_catalog>.<your_schema> TO `<principal>`;

Where <principal> is your service principal's application ID.

Monad verifies all of these permissions during Test Connection and will report any missing grants.
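
If Test Connection reports missing grants, you can also inspect what a principal currently holds from a SQL editor. A quick check, using the same placeholders as above:

SHOW GRANTS `<principal>` ON SCHEMA <your_catalog>.<your_schema>;
SHOW GRANTS `<principal>` ON VOLUME <your_catalog>.<your_schema>.<your_volume>;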

Configuration

Settings

Setting         | Type   | Required | Default   | Description
----------------|--------|----------|-----------|------------
Server Hostname | string | Yes      | -         | The Databricks workspace hostname (e.g. adb-1234567890.azuredatabricks.net)
HTTP Path       | string | Yes      | -         | The SQL warehouse HTTP path from connection details (e.g. /sql/1.0/warehouses/abc123)
Catalog         | string | Yes      | -         | The Unity Catalog name
Schema          | string | Yes      | -         | The target schema within the catalog
Table           | string | Yes      | -         | The target Delta table name. Created automatically if it doesn't exist
Volume          | string | Yes      | -         | The Unity Catalog Volume used for staging JSONL files before loading
Batch Config    | object | No       | See below | Batching configuration
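
To confirm the Catalog, Schema, Table, and Volume values before configuring the output, you can list them from a SQL editor on the warehouse. The names below are placeholders:

SHOW SCHEMAS IN <your_catalog>;
SHOW TABLES IN <your_catalog>.<your_schema>;
SHOW VOLUMES IN <your_catalog>.<your_schema>;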

Batch Configuration

Defaults are tuned for bulk COPY INTO throughput -- larger batches mean fewer load operations.

Setting      | Default | Min  | Max     | Description
-------------|---------|------|---------|------------
record_count | 50,000  | 500  | 100,000 | Maximum records per batch
data_size    | 10 MB   | 1 MB | 50 MB   | Maximum batch size
publish_rate | 30s     | 5s   | 120s    | Maximum time before sending a batch
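
Each batch is staged to the Volume as compressed JSONL and then bulk-loaded into the target table. The statement below is only an illustrative sketch of what such a COPY INTO load looks like -- the paths and names are placeholders, not the exact statement Monad issues:

COPY INTO <your_catalog>.<your_schema>.<your_table>
FROM '/Volumes/<your_catalog>/<your_schema>/<your_volume>/<staging_path>/'
FILEFORMAT = JSON
FORMAT_OPTIONS ('inferSchema' = 'true')
COPY_OPTIONS ('mergeSchema' = 'true');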

Secrets

Setting       | Type   | Required | Description
--------------|--------|----------|------------
Client ID     | string | Yes*     | OAuth M2M client ID for service principal authentication
Client Secret | string | Yes*     | OAuth M2M client secret for service principal authentication

Generate Client ID and Client Secret (OAuth Machine-to-Machine - Service Principal)

Recommended for production. Uses a service principal with client credentials:

  1. In the Databricks Account Console, go to User management > Service principals
  2. Click Add service principal and create one
  3. Select the service principal, go to Secrets > Generate secret
  4. Copy the Client ID and Client Secret
  5. Add the service principal to your workspace and grant it the required permissions

Use the client ID and client secret as the client_id and client_secret secrets.

Where to Find Connection Details

  1. In your Databricks workspace, go to SQL Warehouses
  2. Select your warehouse and open the Connection details tab
  3. Copy the Server hostname and HTTP path

Troubleshooting

Connection Issues

  • Server hostname: Ensure the hostname is correct and accessible (e.g. adb-1234567890.azuredatabricks.net)
  • HTTP path: Verify the SQL warehouse HTTP path from the connection details tab
  • SQL warehouse: Make sure your warehouse is running -- Monad cannot start a stopped warehouse

Authentication Errors

  • 401 Unauthorized: Check that your OAuth credentials are valid and not expired
  • OAuth M2M: Ensure the service principal is added to the workspace and has the correct grants

Permission Errors

  • USE SCHEMA denied: Grant USE SCHEMA on the target schema to your principal
  • Missing permissions: Run Test Connection to see which specific permissions are missing
  • Volume access denied: Grant READ VOLUME and WRITE VOLUME on the volume

Data Loading Issues

  • COPY INTO failures: Check that the volume exists and is accessible (see the checks after this list)
  • Schema mismatch: mergeSchema is enabled, so new fields are added automatically. However, incompatible type changes may cause errors
  • Large batch failures: If uploads fail with 413 errors, reduce the data_size in batch configuration
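
To check the staging volume manually, you can describe it and list its contents from a SQL editor (placeholder names):

DESCRIBE VOLUME <your_catalog>.<your_schema>.<your_volume>;
LIST '/Volumes/<your_catalog>/<your_schema>/<your_volume>/';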

Limitations

  • Catalog and schema must exist before configuring the output
  • Volume is created automatically if it doesn't exist
  • Table schema is inferred from the data -- explicit schema definition is not supported, but you can pre-create the table with your desired schema (see the sketch after this list)
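
If you want explicit control over column types rather than relying on inference, pre-create the Delta table before the first load. A minimal sketch -- the column names and types here are hypothetical placeholders for your own schema:

CREATE TABLE IF NOT EXISTS <your_catalog>.<your_schema>.<your_table> (
  event_id STRING,
  event_time TIMESTAMP,
  payload STRING
);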

Best Practices

  1. Use default batch settings -- they are optimized for COPY INTO throughput
  2. Share volumes across connectors -- multiple tables can safely stage files in the same volume
  3. Pre-create catalog and schema -- Monad expects these to exist
  4. Use dedicated service principals with only the required permissions
  5. Monitor warehouse usage -- each COPY INTO consumes SQL warehouse compute