Snowflake
Stream data from your Monad pipeline directly to Snowflake tables with automatic schema inference, secure authentication, and cost-optimized data loading.
Overview
The Snowflake output enables you to:
- Automatically create tables with inferred schemas from your data
- Use cost-optimized data loading that automatically switches between methods
- Batch process data efficiently with configurable batch sizes
- Use secure key pair authentication (recommended) or password authentication
- Handle schema changes gracefully with case-insensitive column matching
- Minimize Snowflake compute costs through intelligent INSERT operations
Cost Optimization
The Snowflake output uses an intelligent dual-method approach to minimize costs:
- Initial Table Creation: When a table doesn't exist, the output uses `COPY INTO` with schema inference to automatically create the table structure
- Ongoing Operations: Once the table exists, the output switches to direct `INSERT` statements, which are more cost-effective for smaller batch sizes
This approach provides the convenience of automatic table creation while optimizing costs for continuous data streaming.
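The dual-method decision can be sketched as follows. This is illustrative only; the class and method names are hypothetical, not Monad's actual implementation:

```python
class SnowflakeLoader:
    """Sketch of the dual-method cost optimization described above."""

    def __init__(self):
        self.table_exists = False

    def load(self, batch):
        """Return the loading method used for this batch."""
        if not self.table_exists:
            method = "COPY INTO"      # stage + schema inference creates the table
            self.table_exists = True  # later batches take the cheaper path
        else:
            method = "INSERT"         # direct insert, no stage needed
        return method

loader = SnowflakeLoader()
first = loader.load([{"id": 1}])   # "COPY INTO" on the first batch
later = loader.load([{"id": 2}])   # "INSERT" on every batch after that
```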
Requirements
- Snowflake Account: You need access to create service accounts, roles, and grant permissions
- Authentication Setup: Monad supports two authentication methods:
  - Key Pair Authentication (Recommended): Follow Snowflake's guide to generate RSA keys
  - Password Authentication: Use standard username/password (less secure)
- Create Service Account (for key pair auth):

```sql
CREATE USER monad_service_account
  COMMENT = 'Service account for monad snowflake output'
  TYPE = 'SERVICE'
  RSA_PUBLIC_KEY = 'Your RSA Public Key';
```
- Create Role and Grant Permissions:

```sql
-- Create a custom role for the service account
CREATE ROLE monad_service_account_role;

-- Grant the role to the service account
GRANT ROLE monad_service_account_role TO USER monad_service_account;

-- Grant necessary privileges
GRANT USAGE ON WAREHOUSE {your_warehouse} TO ROLE monad_service_account_role;
GRANT USAGE ON DATABASE {your_database} TO ROLE monad_service_account_role;
GRANT USAGE ON SCHEMA {your_database}.{your_schema} TO ROLE monad_service_account_role;
GRANT CREATE TABLE ON SCHEMA {your_database}.{your_schema} TO ROLE monad_service_account_role;
GRANT CREATE STAGE ON SCHEMA {your_database}.{your_schema} TO ROLE monad_service_account_role;
GRANT CREATE FILE FORMAT ON SCHEMA {your_database}.{your_schema} TO ROLE monad_service_account_role;
GRANT MONITOR ON DATABASE {your_database} TO ROLE monad_service_account_role;
GRANT USAGE ON FUTURE SCHEMAS IN DATABASE {your_database} TO ROLE monad_service_account_role;
GRANT MONITOR ON WAREHOUSE {your_warehouse} TO ROLE monad_service_account_role;
```
How It Works
The Snowflake output intelligently manages data loading based on table existence:
When Table Doesn't Exist (First Run)
- Batching: Messages are collected based on your batch configuration
- File Creation: Batched data is written to a temporary JSONL file
- Stage Management: A Snowflake stage is automatically created or replaced in your schema
- Upload: The JSONL file is uploaded to the stage using Snowflake's `PUT` command
- Table Creation: Monad automatically:
  - Creates a JSON file format for parsing
  - Infers the schema from your data (respecting the `case_insensitivity` setting)
  - Creates the table with the inferred schema
- Data Loading: Data is copied from the stage using `COPY INTO` with case-insensitive matching
- Cleanup: Temporary files are deleted after successful processing
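The file-creation and upload steps above can be sketched like this. The stage and table names are placeholders, and the SQL strings only illustrate the general shape of the statements involved:

```python
import json
import tempfile

def write_jsonl(batch):
    """Write a batch of records to a temporary JSONL file (the
    File Creation step above)."""
    f = tempfile.NamedTemporaryFile("w", suffix=".jsonl", delete=False)
    for record in batch:
        f.write(json.dumps(record) + "\n")
    f.close()
    return f.name

batch = [{"host": "web-1", "status": 200}, {"host": "web-2", "status": 500}]
path = write_jsonl(batch)

# The file is then uploaded and loaded with statements of roughly this shape:
put_sql = f"PUT file://{path} @my_stage AUTO_COMPRESS=TRUE"
copy_sql = ("COPY INTO my_table FROM @my_stage "
            "FILE_FORMAT = (TYPE = 'JSON') "
            "MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE")
```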
When Table Exists (Subsequent Runs)
- Batching: Messages are collected based on your batch configuration
- Direct Insert: Data is loaded using cost-efficient `INSERT` statements with:
  - Automatic JSON parsing for complex data types
  - Case-insensitive column matching
  - Efficient batch processing
- No Stage Required: Direct inserts bypass the staging process entirely
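Case-insensitive column matching for the direct-insert path can be sketched as below. The function and its output format are illustrative; Monad's real SQL generation may differ:

```python
def build_insert(table, columns, record):
    """Build a parameterized INSERT, matching record keys to table
    columns case-insensitively."""
    lookup = {key.lower(): value for key, value in record.items()}
    values = [lookup.get(col.lower()) for col in columns]
    placeholders = ", ".join(["%s"] * len(columns))
    sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    return sql, values

# "host" and "Status" match the uppercase table columns despite the casing.
sql, params = build_insert("EVENTS", ["HOST", "STATUS"],
                           {"host": "web-1", "Status": 200})
```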
This dual approach ensures automatic table setup while minimizing the ongoing operational cost of small-batch data loads.
Configuration
Settings
| Setting | Type | Required | Default | Description |
|---|---|---|---|---|
| `account` | string | Yes | - | Your Snowflake account identifier (e.g., `account_name.cloud_provider`) |
| `user` | string | Yes | - | Username for authentication |
| `database` | string | Yes | - | Target database name |
| `warehouse` | string | Yes | - | Virtual warehouse for query execution |
| `schema` | string | Yes | - | Schema within the database |
| `table` | string | Yes | - | Target table name (created automatically if needed) |
| `case_insensitivity` | boolean | Yes | false | Controls table creation behavior. When `true`, column names are converted to uppercase during schema inference; when `false`, column names preserve their original casing. Note: data loading always uses case-insensitive matching. |
| `stage` | string | Yes | - | Stage name for data uploads (only used for initial table creation) |
| `role` | string | Yes | - | Role name with required permissions |
| `auth_type` | string | Yes | private key | Authentication method: `private key` or `password` |
| `batch_config` | object | No | See below | Batching configuration |
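Putting the settings together, a configuration might look like the following. This is a hypothetical example: the exact format depends on how you define outputs in Monad, and all values are placeholders.

```json
{
  "account": "myorg-account1",
  "user": "monad_service_account",
  "database": "ANALYTICS",
  "warehouse": "LOADING_WH",
  "schema": "RAW",
  "table": "EVENTS",
  "case_insensitivity": true,
  "stage": "MONAD_STAGE",
  "role": "monad_service_account_role",
  "auth_type": "private key"
}
```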
Batch Configuration
The batch configuration is optimized for cost-efficient operations, using smaller, more frequent batches so your data arrives sooner:
| Setting | Default | Min | Max | Description |
|---|---|---|---|---|
| `record_count` | 1,000 | 25 | 1,000 | Maximum records per batch |
| `data_size` | 1MB | 0.5MB | 1MB | Maximum batch size |
| `publish_rate` | 15s | 1s | 30s | Maximum time before sending a batch |
Secrets
| Setting | Type | Required | Description |
|---|---|---|---|
| `privateKey` | string | Yes* | RSA private key for authentication (required if using key pair auth) |
| `password` | string | Yes* | Password for authentication (required if using password auth) |
*One authentication method must be configured
Authentication Methods
Key Pair Authentication (Recommended)
Provide your RSA private key in one of these formats:
- Raw PEM format
- Base64 encoded
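Accepting both formats amounts to detecting whether the value is already PEM and base64-decoding it otherwise. A minimal sketch (the key material below is a placeholder, and the helper name is hypothetical):

```python
import base64

PEM_HEADER = "-----BEGIN"

def normalize_private_key(key: str) -> str:
    """Accept a private key as raw PEM or base64-encoded PEM
    and return raw PEM."""
    if key.lstrip().startswith(PEM_HEADER):
        return key  # already raw PEM
    return base64.b64decode(key).decode("utf-8")

pem = "-----BEGIN PRIVATE KEY-----\nMIIEv...\n-----END PRIVATE KEY-----"
encoded = base64.b64encode(pem.encode()).decode()

assert normalize_private_key(pem) == pem      # raw PEM passes through
assert normalize_private_key(encoded) == pem  # base64 is decoded first
```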
Password Authentication
Standard username/password authentication (less secure than key pair authentication).
Performance and Concurrency
The Snowflake output supports up to 5 concurrent batch operations, allowing for:
- Higher throughput during peak loads
- Better utilization of Snowflake warehouse resources
- Reduced latency for data availability
Troubleshooting
Connection Issues
- Account identifier: If unsure about your account format, check Snowflake's guide
- Network access: Ensure your Snowflake instance is accessible from Monad's infrastructure
- Role permissions: Verify the role has all required grants listed in the requirements
- Timeout errors: Connection timeouts are set to 30 seconds; ensure your warehouse starts quickly
Authentication Errors
- Key pair: Ensure your private key matches the public key configured in Snowflake
- Private key format: Try both raw PEM and base64 encoded formats
- Password: Verify credentials and that password authentication is enabled
Data Loading Issues
- Missing data: Check that database, schema, table, and stage names are correct
- Schema mismatch: The `case_insensitivity` setting controls column matching:
  - When `true`: column names are converted to uppercase (Snowflake's default)
  - When `false`: column names preserve original casing
- Stage errors: The stage is only created for initial table setup; check permissions if table creation fails
- Insert failures: Ensure your data matches the existing table schema
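When debugging a schema mismatch, it can help to predict what column name schema inference would have produced. A tiny sketch of the rule described above (illustrative only):

```python
def inferred_column_name(name: str, case_insensitivity: bool) -> str:
    """Column name as created during schema inference, per the
    case_insensitivity setting."""
    return name.upper() if case_insensitivity else name

assert inferred_column_name("userId", True) == "USERID"
assert inferred_column_name("userId", False) == "userId"
```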
Performance Optimization
- Cost optimization: The default batch configuration is optimized for cost efficiency
- High volume scenarios: If you have very high data volumes, consider:
- Increasing batch sizes slightly (but staying within limits)
- Monitoring warehouse utilization
- Using a larger warehouse for better performance
- Low latency requirements: Decrease `publish_rate` for faster data availability
Limitations
- Only JSON/JSONL data format is supported
- Maximum batch: 1,000 records or 1MB (optimized for cost efficiency)
- Tables are created with inferred schemas; explicit schema definition is not supported. However, you can create the table beforehand with your desired schema, and the output will use it.
- Stages are managed automatically and only used for initial table creation
- The direct `INSERT` method requires the existing table schema to match incoming data. If your data is inconsistent, `INSERT` operations will fail.
Best Practices
- Use key pair authentication for production environments
- Monitor costs: The dual-method approach minimizes costs, but monitor your Snowflake usage
- Let the output create tables: Allow the initial COPY INTO to create tables with proper schema
- Use default batch settings: The defaults are optimized for cost-efficient operations
- Set appropriate warehouse size based on your concurrency needs (5 concurrent operations)
- Use dedicated service accounts with minimal required permissions
- Regularly rotate keys for security
- Monitor table schema: Ensure incoming data matches existing table structure to avoid INSERT failures