Snowflake

Stream data from your Monad pipeline directly to Snowflake tables with automatic schema inference, secure authentication, and cost-optimized data loading.

Overview

The Snowflake output enables you to:

  • Automatically create tables with inferred schemas from your data
  • Use cost-optimized data loading that automatically switches between methods
  • Batch process data efficiently with configurable batch sizes
  • Use secure key pair authentication (recommended) or password authentication
  • Handle schema changes gracefully with case-insensitive column matching
  • Minimize Snowflake compute costs through intelligent INSERT operations

Cost Optimization

The Snowflake output uses an intelligent dual-method approach to minimize costs:

  1. Initial Table Creation: When a table doesn't exist, the output uses COPY INTO with schema inference to automatically create the table structure
  2. Ongoing Operations: Once the table exists, the output switches to direct INSERT statements, which are more cost-effective for smaller batch sizes

This approach provides the convenience of automatic table creation while optimizing costs for continuous data streaming.
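As a sketch of the two methods (the table, stage, and file format names below are illustrative placeholders, not the identifiers Monad actually generates), the first run and subsequent runs issue roughly:

```sql
-- First run: infer the schema from staged JSONL and create the table,
-- then bulk-load with case-insensitive column matching.
CREATE TABLE events USING TEMPLATE (
  SELECT ARRAY_AGG(OBJECT_CONSTRUCT(*))
  FROM TABLE(INFER_SCHEMA(LOCATION => '@monad_stage',
                          FILE_FORMAT => 'monad_json_format'))
);
COPY INTO events
  FROM @monad_stage
  FILE_FORMAT = (FORMAT_NAME = 'monad_json_format')
  MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;

-- Subsequent runs: plain INSERT statements, which skip the stage and
-- COPY overhead that is wasteful for small batches.
INSERT INTO events (id, message) VALUES (1, 'example');
```

COPY INTO is billed per load operation regardless of size, so direct INSERTs are cheaper once batches are small and frequent.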

Requirements

  1. Snowflake Account: You need access to create service accounts, roles, and grant permissions

  2. Authentication Setup: Monad supports two authentication methods:

  • Key Pair Authentication (Recommended): Follow Snowflake's guide to generate RSA keys
  • Password Authentication: Use standard username/password (less secure)
  3. Create Service Account (for key pair auth):
CREATE USER monad_service_account
COMMENT = 'Service account for monad snowflake output'
TYPE = 'SERVICE'
RSA_PUBLIC_KEY = 'Your RSA Public Key';
  4. Create Role and Grant Permissions:
-- Create a custom role for the service account
CREATE ROLE monad_service_account_role;

-- Grant the role to the service account
GRANT ROLE monad_service_account_role TO USER monad_service_account;

-- Grant necessary privileges
GRANT USAGE ON WAREHOUSE {your_warehouse} TO ROLE monad_service_account_role;
GRANT USAGE ON DATABASE {your_database} TO ROLE monad_service_account_role;
GRANT USAGE ON SCHEMA {your_database}.{your_schema} TO ROLE monad_service_account_role;
GRANT CREATE TABLE ON SCHEMA {your_database}.{your_schema} TO ROLE monad_service_account_role;
GRANT CREATE STAGE ON SCHEMA {your_database}.{your_schema} TO ROLE monad_service_account_role;
GRANT CREATE FILE FORMAT ON SCHEMA {your_database}.{your_schema} TO ROLE monad_service_account_role;
GRANT MONITOR ON DATABASE {your_database} TO ROLE monad_service_account_role;
GRANT USAGE ON FUTURE SCHEMAS IN DATABASE {your_database} TO ROLE monad_service_account_role;
GRANT MONITOR ON WAREHOUSE {your_warehouse} TO ROLE monad_service_account_role;

How It Works

The Snowflake output intelligently manages data loading based on table existence:

When Table Doesn't Exist (First Run)

  1. Batching: Messages are collected based on your batch configuration
  2. File Creation: Batched data is written to a temporary JSONL file
  3. Stage Management: A Snowflake stage is automatically created or replaced in your schema
  4. Upload: The JSONL file is uploaded to the stage using Snowflake's PUT command
  5. Table Creation: Monad automatically:
  • Creates a JSON file format for parsing
  • Infers the schema from your data (respecting the case_insensitivity setting)
  • Creates the table with the inferred schema
  6. Data Loading: Data is copied from the stage using COPY INTO with case-insensitive matching
  7. Cleanup: Temporary files are deleted after successful processing
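Steps 3 through 5 correspond to standard Snowflake operations; a rough equivalent, with placeholder names, looks like:

```sql
-- Step 3: create (or replace) a stage in the target schema
CREATE OR REPLACE STAGE monad_stage;

-- Step 4: upload the JSONL batch file; PUT is executed from the
-- client connection, not a Snowflake worksheet
PUT file:///tmp/batch.jsonl @monad_stage AUTO_COMPRESS = TRUE;

-- Step 5: define the JSON file format used for parsing and
-- schema inference
CREATE OR REPLACE FILE FORMAT monad_json_format TYPE = 'JSON';
```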

When Table Exists (Subsequent Runs)

  1. Batching: Messages are collected based on your batch configuration
  2. Direct Insert: Data is loaded using cost-efficient INSERT statements with:
  • Automatic JSON parsing for complex data types
  • Case-insensitive column matching
  • Efficient batch processing
  3. No Stage Required: Direct inserts bypass the staging process entirely
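For step 2, complex values can be loaded into VARIANT columns via PARSE_JSON. A minimal illustration (table and column names are assumed, not Monad's generated SQL):

```sql
-- Batch insert with JSON parsing for a VARIANT column
INSERT INTO events (id, attributes)
SELECT column1, PARSE_JSON(column2)
FROM VALUES
  (1, '{"source": "monad", "severity": "low"}'),
  (2, '{"source": "monad", "severity": "high"}');
```

Snowflake does not allow PARSE_JSON directly in a VALUES clause, which is why the INSERT ... SELECT FROM VALUES form is used.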

This dual approach provides automatic table setup while minimizing the ongoing operational cost of small-batch data loads.

Configuration

Settings

| Setting | Type | Required | Default | Description |
|---|---|---|---|---|
| account | string | Yes | - | Your Snowflake account identifier (e.g., 'account_name.cloud_provider') |
| user | string | Yes | - | Username for authentication |
| database | string | Yes | - | Target database name |
| warehouse | string | Yes | - | Virtual warehouse for query execution |
| schema | string | Yes | - | Schema within the database |
| table | string | Yes | - | Target table name (created automatically if needed) |
| case_insensitivity | boolean | Yes | false | Controls table creation behavior. true: column names are converted to uppercase during schema inference. false: column names preserve original casing. Note: data loading always uses case-insensitive matching. |
| stage | string | Yes | - | Stage name for data uploads (only used for initial table creation) |
| role | string | Yes | - | Role name with required permissions |
| auth_type | string | Yes | private key | Authentication method: private key or password |
| batch_config | object | No | See below | Batching configuration |

Batch Configuration

The batch configuration defaults favor smaller, more frequent batches, which keeps loading cost-efficient and makes your data available sooner:

| Setting | Default | Min | Max | Description |
|---|---|---|---|---|
| record_count | 1,000 | 25 | 1,000 | Maximum records per batch |
| data_size | 1MB | 0.5MB | 1MB | Maximum batch size |
| publish_rate | 15s | 1s | 30s | Maximum time before sending batch |

Secrets

| Setting | Type | Required | Description |
|---|---|---|---|
| privateKey | string | Yes* | RSA private key for authentication (required if using key pair auth) |
| password | string | Yes* | Password for authentication (required if using password auth) |

*One authentication method must be configured

Authentication Methods

Key Pair Authentication

Provide your RSA private key in one of these formats:

  • Raw PEM format
  • Base64 encoded

Password Authentication

Standard username/password authentication (less secure than key pair authentication).

Performance and Concurrency

The Snowflake output supports up to 5 concurrent batch operations, allowing for:

  • Higher throughput during peak loads
  • Better utilization of Snowflake warehouse resources
  • Reduced latency for data availability

Troubleshooting

Connection Issues

  • Account identifier: If unsure about your account format, check Snowflake's guide
  • Network access: Ensure your Snowflake instance is accessible from Monad's infrastructure
  • Role permissions: Verify the role has all required grants listed in the requirements
  • Timeout errors: Connection timeouts are set to 30 seconds; ensure your warehouse starts quickly
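To verify the role setup, you can list its grants and confirm it is attached to the service account (user and role names below match the examples in the Requirements section):

```sql
-- Confirm the role carries every grant listed under Requirements
SHOW GRANTS TO ROLE monad_service_account_role;

-- Confirm the role is actually granted to the service account
SHOW GRANTS TO USER monad_service_account;
```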

Authentication Errors

  • Key pair: Ensure your private key matches the public key configured in Snowflake
  • Private key format: Try both raw PEM and base64 encoded formats
  • Password: Verify credentials and that password authentication is enabled
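To confirm that your private key matches the public key registered in Snowflake, compare fingerprints; the openssl pipeline in the comment is the one Snowflake's key pair guide uses:

```sql
-- Shows RSA_PUBLIC_KEY_FP for the user; compare it against the
-- fingerprint of your local public key, computed with:
--   openssl rsa -pubin -in rsa_key.pub -outform DER |
--     openssl dgst -sha256 -binary | openssl enc -base64
DESC USER monad_service_account;
```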

Data Loading Issues

  • Missing data: Check that database, schema, table, and stage names are correct
  • Schema mismatch: The case_insensitivity setting controls column matching:
  • When true: Column names are converted to uppercase (Snowflake's default)
  • When false: Column names preserve original casing
  • Stage errors: The stage is only created for initial table setup; check permissions if table creation fails
  • Insert failures: Ensure your data matches the existing table schema

Performance Optimization

  • Cost optimization: The default batch configuration is optimized for cost efficiency
  • High volume scenarios: If you have very high data volumes, consider:
  • Increasing batch sizes slightly (but staying within limits)
  • Monitoring warehouse utilization
  • Using a larger warehouse for better performance
  • Low latency requirements: Decrease publish_rate for faster data availability

Limitations

  • Only JSON/JSONL data format is supported
  • Maximum batch: 1,000 records or 1MB (optimized for cost efficiency)
  • Tables are created with inferred schemas; explicit schema definition is not supported. However, you can create the table beforehand with your desired schema, and the output will use it.
  • Stages are managed automatically and only used for initial table creation
  • The direct INSERT method requires the existing table schema to match incoming data; inconsistent data will cause INSERT failures

Best Practices

  1. Use key pair authentication for production environments
  2. Monitor costs: The dual-method approach minimizes costs, but monitor your Snowflake usage
  3. Let the output create tables: Allow the initial COPY INTO to create tables with proper schema
  4. Use default batch settings: The defaults are optimized for cost-efficient operations
  5. Set appropriate warehouse size based on your concurrency needs (5 concurrent operations)
  6. Use dedicated service accounts with minimal required permissions
  7. Regularly rotate keys for security
  8. Monitor table schema: Ensure incoming data matches existing table structure to avoid INSERT failures
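For practice 7, Snowflake provides a second public-key slot (RSA_PUBLIC_KEY_2), which allows rotation without downtime. A sketch, using the service account name from the Requirements section:

```sql
-- Install the new public key in the second slot; both keys are
-- accepted while it is set
ALTER USER monad_service_account SET RSA_PUBLIC_KEY_2 = '<new public key>';

-- After updating Monad's privateKey secret to the new private key,
-- remove the old key
ALTER USER monad_service_account UNSET RSA_PUBLIC_KEY;
```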