Databricks Lakewatch
Stream data from your Monad pipeline into Databricks via the Lakewatch ingestion paths -- either staging files for Databricks Autoloader or pushing records directly through the ZeroBus streaming protocol.
Overview
The Databricks Lakewatch output supports two write modes:
- Autoloader -- Stages compressed JSONL files to a Unity Catalog Volume for Databricks Autoloader (cloudFiles) to ingest. You configure the Autoloader job in Databricks to pick up files from the volume.
- ZeroBus -- Sends records directly to the ZeroBus direct-write data-plane endpoint, bypassing file staging. Records land in the target Delta table with low end-to-end latency.
Both modes use OAuth M2M (service principal) authentication and validate access during connection testing.
Requirements
- Databricks Workspace with Unity Catalog enabled
- Catalog and Schema must already exist in your workspace
- Target Delta table must exist (ZeroBus mode) or an Autoloader job configured (Autoloader mode). See ZeroBus table requirements for storage constraints
- Volume for staging files (Autoloader mode only) -- Monad will create it if it doesn't exist
- Service principal with OAuth M2M credentials and the required permissions
Setting Up Permissions
Autoloader mode
Autoloader only needs volume access, because table writes happen inside your Autoloader job. Grant READ VOLUME and WRITE VOLUME on the staging volume to <principal>.
ZeroBus mode
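ZeroBus writes directly to the target table, so it needs table-level privileges. Grant USE CATALOG on the catalog, USE SCHEMA on the schema, and SELECT and MODIFY on the target table to <principal>.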
Where <principal> is your service principal application ID.
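As a rough illustration, the privileges named on this page can be turned into Unity Catalog GRANT statements with a small helper. This is a sketch reconstructed from the privilege names in the Troubleshooting section, not the page's original statements. The function names and the example catalog, schema, volume, and table names are hypothetical. The USE CATALOG and USE SCHEMA grants in the Autoloader list are an assumption based on how Unity Catalog normally gates access to child objects.

```python
def quote_ident(name: str) -> str:
    """Backtick-quote a SQL identifier, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def autoloader_grants(catalog: str, schema: str, volume: str, principal: str) -> list[str]:
    """GRANT statements for Autoloader mode (volume access only)."""
    fq_schema = f"{quote_ident(catalog)}.{quote_ident(schema)}"
    who = quote_ident(principal)
    return [
        # Assumption: parent-object usage privileges are needed to reach the volume.
        f"GRANT USE CATALOG ON CATALOG {quote_ident(catalog)} TO {who}",
        f"GRANT USE SCHEMA ON SCHEMA {fq_schema} TO {who}",
        f"GRANT READ VOLUME, WRITE VOLUME ON VOLUME {fq_schema}.{quote_ident(volume)} TO {who}",
    ]


def zerobus_grants(catalog: str, schema: str, table: str, principal: str) -> list[str]:
    """GRANT statements for ZeroBus mode (table-level privileges)."""
    fq_schema = f"{quote_ident(catalog)}.{quote_ident(schema)}"
    who = quote_ident(principal)
    return [
        f"GRANT USE CATALOG ON CATALOG {quote_ident(catalog)} TO {who}",
        f"GRANT USE SCHEMA ON SCHEMA {fq_schema} TO {who}",
        f"GRANT SELECT, MODIFY ON TABLE {fq_schema}.{quote_ident(table)} TO {who}",
    ]


if __name__ == "__main__":
    # Hypothetical names; replace with your own objects and application ID.
    for stmt in zerobus_grants("main", "security", "events", "<principal>"):
        print(stmt + ";")
```

If you rely on Monad creating the staging volume, the service principal may need further privileges on the schema. This page does not say which, so check that case separately.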
Configuration
Settings
| Setting | Type | Required | Default | Description |
|---|---|---|---|---|
| Server Hostname | string | Yes | - | Databricks workspace hostname (e.g. adb-1234567890.azuredatabricks.net) |
| Write Mode | object | Yes | autoloader | How data is loaded (see Write Modes) |
| Catalog | string | Yes | - | Unity Catalog name |
| Schema | string | Yes | - | Target schema within the catalog |
| Batch Config | object | No | See below | Batching configuration |
Write Modes
| Mode | Description |
|---|---|
| autoloader | Stages JSONL files to a Volume for Databricks Autoloader (cloudFiles) to ingest |
| zerobus | Streams records directly to the target Delta table via the ZeroBus direct-write API |
Autoloader requires:
| Setting | Type | Required | Description |
|---|---|---|---|
| Volume | string | Yes | Unity Catalog Volume used for staging JSONL files |
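Because Monad only stages files in Autoloader mode, the ingestion job is yours to write. The following is a minimal sketch of such a job, assuming it runs in a Databricks notebook or job where an active SparkSession is passed in as spark. The function names, the checkpoint path, and the target table are hypothetical, and the json format option is an assumption based on the JSONL files described above.

```python
def staging_path(catalog: str, schema: str, volume: str) -> str:
    """Volume path that the Autoloader job should read from."""
    return f"/Volumes/{catalog}/{schema}/{volume}/"


def start_autoloader(spark, source_path: str, target_table: str, checkpoint_path: str):
    """Start an Autoloader (cloudFiles) stream from the staging volume into a table.

    `spark` is an active SparkSession on Databricks; this function is not
    runnable outside a Databricks runtime.
    """
    stream = (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")  # assumption: staged files are JSON lines
        .option("cloudFiles.schemaLocation", checkpoint_path)  # where the inferred schema is tracked
        .load(source_path)
    )
    return (
        stream.writeStream
        .option("checkpointLocation", checkpoint_path)
        .trigger(availableNow=True)  # process what is staged, then stop
        .toTable(target_table)
    )


if __name__ == "__main__":
    # Hypothetical names; prints /Volumes/main/security/monad_staging/
    print(staging_path("main", "security", "monad_staging"))
```

The availableNow trigger suits a scheduled job. A continuously running stream would use a different trigger, and that choice depends on how quickly you need staged files to appear in the table.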
ZeroBus requires:
| Setting | Type | Required | Description |
|---|---|---|---|
| Workspace ID | string | Yes | Numeric Databricks workspace ID -- used to scope the ZeroBus OAuth token and form the data-plane endpoint |
| Region | string | Yes | Workspace region (e.g. us-west-2) -- used to form the ZeroBus data-plane endpoint |
| Table Name | string | Yes | Target table name within the configured catalog and schema (alphanumeric, underscore, and hyphen only) |
The ZeroBus data-plane endpoint is constructed as https://<workspace_id>.zerobus.<region>.cloud.databricks.com.
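To make the endpoint pattern and the table-name rule concrete, here is a small sketch that builds the URL from the two settings and checks a table name against the allowed character set. The function names are hypothetical, and the numeric check on the workspace ID follows the setting description above rather than any documented validation.

```python
import re

# Alphanumeric, underscore, and hyphen only, as stated for the Table Name setting.
_TABLE_NAME_RE = re.compile(r"[A-Za-z0-9_-]+")


def zerobus_endpoint(workspace_id: str, region: str) -> str:
    """Build https://<workspace_id>.zerobus.<region>.cloud.databricks.com."""
    if not (workspace_id.isascii() and workspace_id.isdigit()):
        raise ValueError(f"workspace ID must be numeric, got {workspace_id!r}")
    if not region:
        raise ValueError("region must not be empty")
    return f"https://{workspace_id}.zerobus.{region}.cloud.databricks.com"


def is_valid_table_name(name: str) -> bool:
    """True if the whole name uses only the allowed characters."""
    return _TABLE_NAME_RE.fullmatch(name) is not None


if __name__ == "__main__":
    print(zerobus_endpoint("1234567890", "us-west-2"))
    print(is_valid_table_name("events_raw-v2"), is_valid_table_name("events.raw"))
```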
ZeroBus table requirements
ZeroBus does not support tables stored in the Unity Catalog metastore's default managed storage. Attempting to write to such a table fails with error code 4024.
The target table must be a managed Delta table whose catalog or schema has an explicit managed location backed by your own cloud storage (S3 / ADLS / GCS) via a Unity Catalog external location and storage credential.
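One way to satisfy this requirement is to create the schema with an explicit managed location before creating the table. The helper below is a sketch that only builds the SQL text. It assumes an external location and storage credential already cover the given URL, and the function name and example bucket are hypothetical.

```python
def schema_with_managed_location_sql(catalog: str, schema: str, location_url: str) -> str:
    """Build a CREATE SCHEMA statement with an explicit managed location."""
    for part in (catalog, schema):
        if "`" in part:
            raise ValueError(f"unexpected backtick in identifier {part!r}")
    if "'" in location_url:
        # Reject rather than guess at string-literal escaping rules.
        raise ValueError("location URL must not contain a single quote")
    return (
        f"CREATE SCHEMA IF NOT EXISTS `{catalog}`.`{schema}` "
        f"MANAGED LOCATION '{location_url}'"
    )


if __name__ == "__main__":
    # Hypothetical bucket; use a path under one of your external locations.
    print(schema_with_managed_location_sql("main", "security", "s3://my-bucket/managed/security"))
```

Managed tables created in that schema afterwards are stored under the given location instead of the metastore default.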
Batch Configuration
Batch caps are intentionally tighter than the standard Databricks Delta Table output because ZeroBus enforces strict per-request size limits on the data plane. See ZeroBus limits for the upstream constraints.
| Setting | Default | Min | Max | Description |
|---|---|---|---|---|
| record_count | 50,000 | 5,000 | 50,000 | Maximum records per batch |
| data_size | 10 MB | 5 MB | 10 MB | Maximum batch size |
| publish_rate | 300s | 300s | 600s | Maximum time before sending a batch |
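The three settings act as independent thresholds: a batch is sent as soon as any one of them is reached. The sketch below models that behavior and the allowed ranges from the table. It illustrates the rule and is not Monad's implementation. The class and function names are hypothetical, and treating 1 MB as 1024 * 1024 bytes is an assumption.

```python
from dataclasses import dataclass

MB = 1024 * 1024  # assumption: the page does not say whether MB is binary or decimal


def _check_range(name: str, value: int, low: int, high: int) -> None:
    if not low <= value <= high:
        raise ValueError(f"{name}={value} is outside the allowed range {low}..{high}")


@dataclass(frozen=True)
class BatchConfig:
    record_count: int = 50_000    # maximum records per batch
    data_size: int = 10 * MB      # maximum batch size in bytes
    publish_rate: int = 300       # maximum seconds before sending a batch

    def __post_init__(self) -> None:
        _check_range("record_count", self.record_count, 5_000, 50_000)
        _check_range("data_size", self.data_size, 5 * MB, 10 * MB)
        _check_range("publish_rate", self.publish_rate, 300, 600)


def should_flush(cfg: BatchConfig, records: int, size_bytes: int, age_seconds: float) -> bool:
    """A batch is sent when any single threshold is reached."""
    return (
        records >= cfg.record_count
        or size_bytes >= cfg.data_size
        or age_seconds >= cfg.publish_rate
    )


if __name__ == "__main__":
    cfg = BatchConfig()
    print(should_flush(cfg, records=1_200, size_bytes=2 * MB, age_seconds=301.0))
```

On a low-volume pipeline the time threshold usually fires first, so records can wait up to publish_rate seconds before they are sent.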
Secrets
| Setting | Type | Required | Description |
|---|---|---|---|
| Client ID | string | Yes | OAuth M2M client ID for service principal authentication |
| Client Secret | string | Yes | OAuth M2M client secret for service principal authentication |
Generate Client ID and Client Secret (OAuth Machine-to-Machine - Service Principal)
- In the Databricks Account Console, go to User management > Service principals
- Click Add service principal and create one
- Select the service principal, go to Secrets > Generate secret
- Copy the Client ID and Client Secret
- Add the service principal to your workspace and grant it the required permissions
Use the client ID and client secret as the client_id and client_secret secrets.
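To check the credentials outside Monad, you can request a token yourself with the OAuth client-credentials flow. The sketch below only builds the request, using the workspace-level token endpoint (/oidc/v1/token) and the all-apis scope. The function name is hypothetical. The ZeroBus-specific authorization_details parameter mentioned in Troubleshooting is left out, because its format is not given on this page.

```python
import base64
import urllib.parse
import urllib.request


def build_token_request(
    server_hostname: str,
    client_id: str,
    client_secret: str,
    scope: str = "all-apis",
) -> urllib.request.Request:
    """Build (but do not send) an OAuth M2M client-credentials token request."""
    basic = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    body = urllib.parse.urlencode(
        {"grant_type": "client_credentials", "scope": scope}
    ).encode()
    return urllib.request.Request(
        f"https://{server_hostname}/oidc/v1/token",
        data=body,
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder credentials; nothing is sent over the network here.
    req = build_token_request("adb-1234567890.azuredatabricks.net", "my-client-id", "my-secret")
    print(req.get_method(), req.full_url)
```

Passing the request to urllib.request.urlopen and parsing the JSON response should yield an access_token field when the credentials are valid. A 401 response points to a wrong or expired secret.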
Where to Find Workspace Details
For the authoritative walkthrough -- including how to derive the ZeroBus endpoint from your workspace URL -- see Databricks's guide: Get your workspace URL and ZeroBus ingest endpoint.
- Server Hostname -- The host portion of your Databricks workspace URL (e.g. adb-1234567890.azuredatabricks.net).
- Workspace ID (ZeroBus) -- The numeric ID in the workspace URL (e.g. https://adb-<workspace_id>.<n>.azuredatabricks.net) or under Workspace settings.
- Region (ZeroBus) -- The AWS region of the workspace (e.g. us-west-2).
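For workspace URLs that follow the adb-<workspace_id>.<n> pattern shown above, the hostname and workspace ID can be extracted mechanically. This is a sketch with hypothetical function names. It recognizes only that one hostname pattern and returns None for anything else, in which case the ID has to come from Workspace settings.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# Matches adb-<workspace_id>.<n>.azuredatabricks.net, with the .<n> part optional.
_ADB_HOST_RE = re.compile(r"adb-(\d+)(?:\.\d+)?\.azuredatabricks\.net")


def server_hostname(workspace_url: str) -> str:
    """Return the host portion of a workspace URL (scheme and path are dropped)."""
    if "://" not in workspace_url:
        workspace_url = f"https://{workspace_url}"
    return urlparse(workspace_url).hostname or ""


def workspace_id_from_hostname(hostname: str) -> Optional[str]:
    """Extract the numeric workspace ID, or None if the pattern does not match."""
    match = _ADB_HOST_RE.fullmatch(hostname.lower())
    return match.group(1) if match else None


if __name__ == "__main__":
    host = server_hostname("https://adb-1234567890.12.azuredatabricks.net/?o=1234567890")
    print(host, workspace_id_from_hostname(host))
```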
Troubleshooting
Connection Issues
- Server hostname: Ensure the hostname is correct and reachable (e.g. adb-1234567890.azuredatabricks.net)
- ZeroBus endpoint: Verify the Workspace ID and Region -- a mismatch produces DNS or 404 errors at <workspace_id>.zerobus.<region>.cloud.databricks.com
Authentication Errors
- 401 Unauthorized: Check that your OAuth credentials are valid and not expired
- Token request rejected (ZeroBus): The token endpoint validates the requested authorization_details. Missing USE CATALOG, USE SCHEMA, or SELECT/MODIFY on the target table will cause the token request to fail
- OAuth M2M: Ensure the service principal is added to the workspace
Permission Errors
- USE SCHEMA denied: Grant USE SCHEMA on the target schema to your principal
- Volume access denied (Autoloader): Grant READ VOLUME and WRITE VOLUME on the volume
- Table privileges denied (ZeroBus): Grant SELECT, MODIFY on the target table
Data Loading Issues
- Autoloader not picking up files: Verify your Autoloader job reads from /Volumes/<catalog>/<schema>/<volume>/
- ZeroBus ingest failures: The error message includes the HTTP status and Databricks response -- common causes are schema mismatches against the target Delta table or revoked table privileges
- ZeroBus error 4024: The target table is on Unity Catalog's default/managed storage, which ZeroBus rejects. Recreate the table inside a catalog or schema with an explicit managed location backed by your own S3 / ADLS / GCS storage credentials
- Large batch failures: If uploads fail with 413 errors, reduce data_size in batch configuration
Limitations
- Catalog and schema must exist before configuring the output
- Autoloader mode: Monad only stages files -- you are responsible for configuring the Autoloader job in Databricks
- ZeroBus mode: The target table must already exist with a schema compatible with the incoming records; Monad does not create or evolve the table
- Table names in ZeroBus mode must match ^[A-Za-z0-9_-]+$
ZeroBus service limits
ZeroBus is a Databricks-managed direct-write API with strict per-request, per-stream, and per-workspace quotas. Monad's batch caps (5,000-50,000 records, 5-10 MB per batch) are sized to stay within these limits, but you should still review the upstream constraints before scaling up:
- Maximum request size -- the data plane rejects oversized payloads with a 413 / size-limit error
- Per-stream and per-workspace throughput -- ZeroBus throttles sustained writes beyond the documented QPS / MB-per-second limits
- Schema compatibility -- the target Delta table schema must already match the incoming records; ZeroBus does not auto-evolve schemas
- Storage backing -- target tables must live under a catalog or schema with an explicit managed location, not the metastore's default storage (see ZeroBus table requirements)
For the authoritative list of quotas, supported regions, and table requirements, see the Databricks docs: ZeroBus limits.
Best Practices
- Use default batch settings -- they are optimized for bulk loading throughput
- Pre-create catalog, schema, and (ZeroBus) target table -- Monad expects these to exist
- Use dedicated service principals with only the required permissions
- Pick Autoloader when you want Databricks to own ingestion scheduling and schema evolution
- Pick ZeroBus when you want lower end-to-end latency and direct writes without staging files
Related Articles
Monad
- Databricks output -- the standard Databricks output (Copy Into / Autoloader against a SQL warehouse)
Databricks -- ZeroBus
- ZeroBus ingest overview
- Get your workspace URL and ZeroBus ingest endpoint
- ZeroBus limits and quotas