Snowflake

Stream data from your Monad pipeline directly to Snowflake tables with automatic schema inference, secure authentication, and cost-optimized data loading.

Overview

The Snowflake output enables you to:

  • Automatically create tables with inferred schemas from your data
  • Use cost-optimized data loading that automatically switches between methods
  • Batch process data efficiently with configurable batch sizes
  • Use secure key pair authentication (recommended) or password authentication
  • Handle schema changes gracefully with case-insensitive column matching
  • Minimize Snowflake compute costs through intelligent INSERT operations

Cost Optimization

The Snowflake output uses an intelligent dual-method approach to minimize costs:

  1. Initial Table Creation: When a table doesn't exist, the output uses COPY INTO with schema inference to automatically create the table structure
  2. Ongoing Operations: Once the table exists, the output switches to direct INSERT statements, which are more cost-effective for smaller batch sizes

This approach provides the convenience of automatic table creation while optimizing costs for continuous data streaming.
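As a sketch of the two methods (the table, stage, and file format names below are illustrative placeholders, not the identifiers Monad actually generates), the first run and subsequent runs issue roughly:

```sql
-- First run: infer the schema from staged JSONL and create the table,
-- then bulk-load with case-insensitive column matching.
CREATE TABLE events USING TEMPLATE (
  SELECT ARRAY_AGG(OBJECT_CONSTRUCT(*))
  FROM TABLE(INFER_SCHEMA(LOCATION => '@monad_stage',
                          FILE_FORMAT => 'monad_json_format'))
);
COPY INTO events
  FROM @monad_stage
  FILE_FORMAT = (FORMAT_NAME = 'monad_json_format')
  MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;

-- Subsequent runs: plain INSERT statements, which skip the stage and
-- COPY overhead that is wasteful for small batches.
INSERT INTO events (id, message) VALUES (1, 'example');
```

COPY INTO is billed per load operation regardless of size, so direct INSERTs are cheaper once batches are small and frequent.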

Requirements

  1. Snowflake Account: You need access to create service accounts, roles, and grant permissions

  2. Authentication Setup: Monad supports two authentication methods:

  • Key Pair Authentication (Recommended): Follow Snowflake's guide to generate RSA keys
  • Password Authentication: Use standard username/password (less secure)
  3. Create Service Account (for key pair auth):
CREATE USER monad_service_account
COMMENT = 'Service account for monad snowflake output'
TYPE = 'SERVICE'
RSA_PUBLIC_KEY = 'Your RSA Public Key';
  4. Create Role and Grant Permissions:
-- Create a custom role for the service account
CREATE ROLE monad_service_account_role;

-- Grant the role to the service account
GRANT ROLE monad_service_account_role TO USER monad_service_account;

-- Grant necessary privileges
GRANT USAGE ON WAREHOUSE {your_warehouse} TO ROLE monad_service_account_role;
GRANT USAGE ON DATABASE {your_database} TO ROLE monad_service_account_role;
GRANT USAGE ON SCHEMA {your_database}.{your_schema} TO ROLE monad_service_account_role;
GRANT CREATE TABLE ON SCHEMA {your_database}.{your_schema} TO ROLE monad_service_account_role;
GRANT CREATE STAGE ON SCHEMA {your_database}.{your_schema} TO ROLE monad_service_account_role;
GRANT CREATE FILE FORMAT ON SCHEMA {your_database}.{your_schema} TO ROLE monad_service_account_role;
GRANT MONITOR ON DATABASE {your_database} TO ROLE monad_service_account_role;
GRANT USAGE ON FUTURE SCHEMAS IN DATABASE {your_database} TO ROLE monad_service_account_role;
GRANT MONITOR ON WAREHOUSE {your_warehouse} TO ROLE monad_service_account_role;

How It Works

The Snowflake output intelligently manages data loading based on table existence:

When Table Doesn't Exist (First Run)

  1. Batching: Messages are collected based on your batch configuration
  2. File Creation: Batched data is written to a temporary JSONL file
  3. Stage Management: A Snowflake stage is automatically created or replaced in your schema
  4. Upload: The JSONL file is uploaded to the stage using Snowflake's PUT command
  5. Table Creation: Monad automatically:
  • Creates a JSON file format for parsing
  • Infers the schema from your data (respecting the case_insensitivity setting)
  • Creates the table with the inferred schema
  6. Data Loading: Data is copied from the stage using COPY INTO with case-insensitive matching
  7. Cleanup: Temporary files are deleted after successful processing
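Steps 3 through 5 correspond to standard Snowflake operations; a rough equivalent, with placeholder names, looks like:

```sql
-- Step 3: create (or replace) a stage in the target schema
CREATE OR REPLACE STAGE monad_stage;

-- Step 4: upload the JSONL batch file; PUT is executed from the
-- client connection, not a Snowflake worksheet
PUT file:///tmp/batch.jsonl @monad_stage AUTO_COMPRESS = TRUE;

-- Step 5: define the JSON file format used for parsing and
-- schema inference
CREATE OR REPLACE FILE FORMAT monad_json_format TYPE = 'JSON';
```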

When Table Exists (Subsequent Runs)

  1. Batching: Messages are collected based on your batch configuration
  2. Direct Insert: Data is loaded using cost-efficient INSERT statements with:
  • Automatic JSON parsing for complex data types
  • Case-insensitive column matching
  • Efficient batch processing
  3. No Stage Required: Direct inserts bypass the staging process entirely
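For step 2, complex values can be loaded into VARIANT columns via PARSE_JSON. A minimal illustration (table and column names are assumed, not Monad's generated SQL):

```sql
-- Batch insert with JSON parsing for a VARIANT column
INSERT INTO events (id, attributes)
SELECT column1, PARSE_JSON(column2)
FROM VALUES
  (1, '{"source": "monad", "severity": "low"}'),
  (2, '{"source": "monad", "severity": "high"}');
```

Snowflake does not allow PARSE_JSON directly in a VALUES clause, which is why the INSERT ... SELECT FROM VALUES form is used.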

This dual approach provides automatic table setup while minimizing the ongoing operational cost of small-batch data loads.

Configuration

Settings

| Setting | Type | Required | Default | Description |
|---|---|---|---|---|
| account | string | Yes | - | Your Snowflake account identifier (e.g., 'account_name.cloud_provider') |
| user | string | Yes | - | Username for authentication |
| database | string | Yes | - | Target database name |
| warehouse | string | Yes | - | Virtual warehouse for query execution |
| schema | string | Yes | - | Schema within the database |
| table | string | Yes | - | Target table name (created automatically if needed) |
| case_insensitivity | boolean | Yes | false | Controls table creation behavior. true: column names are converted to uppercase during schema inference. false: column names preserve original casing. Note: data loading always uses case-insensitive matching. |
| stage | string | Yes | - | Stage name for data uploads (only used for initial table creation) |
| role | string | Yes | - | Role name with required permissions |
| auth_type | string | Yes | private key | Authentication method: private key or password |
| batch_config | object | No | See below | Batching configuration |

Batch Configuration

The batch configuration defaults favor smaller, more frequent batches, which keeps loading cost-efficient and makes your data available sooner:

| Setting | Default | Min | Max | Description |
|---|---|---|---|---|
| record_count | 1,000 | 25 | 1,000 | Maximum records per batch |
| data_size | 1MB | 0.5MB | 1MB | Maximum batch size |
| publish_rate | 15s | 1s | 30s | Maximum time before sending batch |

Secrets

| Setting | Type | Required | Description |
|---|---|---|---|
| privateKey | string | Yes* | RSA private key for authentication (required if using key pair auth) |
| password | string | Yes* | Password for authentication (required if using password auth) |

*One authentication method must be configured

Authentication Methods

Key Pair Authentication

Provide your RSA private key in one of these formats:

  • Raw PEM format
  • Base64 encoded

Password Authentication

Standard username/password authentication (less secure than key pair authentication).

Performance and Concurrency

The Snowflake output supports up to 5 concurrent batch operations, allowing for:

  • Higher throughput during peak loads
  • Better utilization of Snowflake warehouse resources
  • Reduced latency for data availability

Troubleshooting

Connection Issues

  • Account identifier: If unsure about your account format, check Snowflake's guide
  • Network access: Ensure your Snowflake instance is accessible from Monad's infrastructure
  • Role permissions: Verify the role has all required grants listed in the requirements
  • Timeout errors: Connection timeouts are set to 30 seconds; ensure your warehouse starts quickly
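To verify the role setup, you can list its grants and confirm it is attached to the service account (user and role names below match the examples in the Requirements section):

```sql
-- Confirm the role carries every grant listed under Requirements
SHOW GRANTS TO ROLE monad_service_account_role;

-- Confirm the role is actually granted to the service account
SHOW GRANTS TO USER monad_service_account;
```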

Authentication Errors

  • Key pair: Ensure your private key matches the public key configured in Snowflake
  • Private key format: Try both raw PEM and base64 encoded formats
  • Password: Verify credentials and that password authentication is enabled
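To confirm that your private key matches the public key registered in Snowflake, compare fingerprints; the openssl pipeline in the comment is the one Snowflake's key pair guide uses:

```sql
-- Shows RSA_PUBLIC_KEY_FP for the user; compare it against the
-- fingerprint of your local public key, computed with:
--   openssl rsa -pubin -in rsa_key.pub -outform DER |
--     openssl dgst -sha256 -binary | openssl enc -base64
DESC USER monad_service_account;
```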

Data Loading Issues

  • Missing data: Check that database, schema, table, and stage names are correct
  • Schema mismatch: The case_insensitivity setting controls column matching:
  • When true: Column names are converted to uppercase (Snowflake's default)
  • When false: Column names preserve original casing
  • Stage errors: The stage is only created for initial table setup; check permissions if table creation fails
  • Insert failures: Ensure your data matches the existing table schema

Performance Optimization

  • Cost optimization: The default batch configuration is optimized for cost efficiency
  • High volume scenarios: If you have very high data volumes, consider:
  • Increasing batch sizes slightly (but staying within limits)
  • Monitoring warehouse utilization
  • Using a larger warehouse for better performance
  • Low latency requirements: Decrease publish_rate for faster data availability

Limitations

  • Only JSON/JSONL data format is supported
  • Maximum batch: 1,000 records or 1MB (optimized for cost efficiency)
  • Tables are created with inferred schemas; explicit schema definition is not supported. However, you can create the table beforehand with your desired schema, and the output will use it.
  • Stages are managed automatically and only used for initial table creation
  • The direct INSERT method requires the existing table schema to match incoming data; inconsistent data will cause INSERT failures

Best Practices

  1. Use key pair authentication for production environments
  2. Monitor costs: The dual-method approach minimizes costs, but monitor your Snowflake usage
  3. Let the output create tables: Allow the initial COPY INTO to create tables with proper schema
  4. Use default batch settings: The defaults are optimized for cost-efficient operations
  5. Set appropriate warehouse size based on your concurrency needs (5 concurrent operations)
  6. Use dedicated service accounts with minimal required permissions
  7. Regularly rotate keys for security
  8. Monitor table schema: Ensure incoming data matches existing table structure to avoid INSERT failures
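For practice 7, Snowflake provides a second public-key slot (RSA_PUBLIC_KEY_2), which allows rotation without downtime. A sketch, using the service account name from the Requirements section:

```sql
-- Install the new public key in the second slot; both keys are
-- accepted while it is set
ALTER USER monad_service_account SET RSA_PUBLIC_KEY_2 = '<new public key>';

-- After updating Monad's privateKey secret to the new private key,
-- remove the old key
ALTER USER monad_service_account UNSET RSA_PUBLIC_KEY;
```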