Object Storage
Writes data to S3-compatible object storage services in various file formats. Supports multiple providers including AWS S3, MinIO, Tigris, and other S3-compatible storage systems with configurable partitioning and batching options.
Requirements
To configure Object Storage as an output destination for Monad, you'll need:
- Access Credentials:
  - Access Key ID
  - Secret Access Key
  - These credentials must have permissions to read, write, and list objects in your target bucket
- Storage Service Details:
  - Endpoint URL of your S3-compatible storage service
  - Bucket name where data will be stored
  - Region (optional for some providers)
- Bucket Permissions: The credentials must have permissions to perform these S3 actions:
  - s3:PutObject: Write objects to the bucket
  - s3:GetObject: Read objects from the bucket
  - s3:ListBucket: List objects in the bucket
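On AWS, the three actions above are typically granted through an IAM policy. The following is a minimal sketch that builds such a policy document; the function name `build_bucket_policy` is hypothetical, and other S3-compatible providers may manage permissions through their own mechanisms instead.

```python
import json


def build_bucket_policy(bucket: str) -> dict:
    """Return an AWS-style IAM policy granting the three actions listed above.

    Illustrative only: the function name is an assumption, not part of Monad.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Object-level actions apply to keys inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:PutObject", "s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
            {
                # ListBucket is a bucket-level action, so it targets the bucket ARN itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_bucket_policy("my-data-bucket"), indent=2))
```

Note that the object-level and bucket-level actions use different resource ARNs; putting `s3:ListBucket` on the `/*` resource is a common reason listing fails.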
Supported Storage Services
This output supports any S3-compatible storage service, including:
- Amazon S3
- MinIO
- Tigris
- DigitalOcean Spaces
- Wasabi
- Backblaze B2
- And other S3-compatible services
Functionality
The output continuously sends data to your specified storage path, formatted as prefix/partition/filename.format.compression, where:
- The partition structure depends on your chosen partition format (simple date or Hive-compliant)
- Files are created based on batching configuration (record count, data size, or time elapsed)
- Data is compressed using your selected compression method before storage
Batching Behavior
Monad batches records before sending to storage based on three configurable limits:
- Record Count: Maximum number of records per file (default: 100,000)
- Data Size: Maximum uncompressed size per file (default: 10 MB)
- Time Interval (the Publish Rate setting): Maximum time before flushing a batch (default: 45 seconds)
Whichever limit is reached first triggers the batch to be written to storage. This ensures timely delivery while optimizing file sizes for downstream processing.
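The "whichever limit is reached first" rule can be sketched as follows. This is not Monad's implementation: the `Batcher` class, its method names, and the use of JSON-encoded length as the uncompressed size are all assumptions made for illustration, and 10 MB is assumed to mean 10 × 1024 × 1024 bytes.

```python
import json
import time


class Batcher:
    """Illustrative buffer that reports which batching limit was hit first."""

    def __init__(self, max_records=100_000, max_bytes=10 * 1024 * 1024,
                 max_seconds=45.0, clock=time.monotonic):
        self.max_records = max_records
        self.max_bytes = max_bytes      # assumed: uncompressed size in bytes
        self.max_seconds = max_seconds
        self._clock = clock             # injectable so the timer can be tested
        self._records = []
        self._size = 0
        self._started = None

    def add(self, record: dict) -> None:
        # The timer starts when the first record of a batch arrives.
        if self._started is None:
            self._started = self._clock()
        self._records.append(record)
        self._size += len(json.dumps(record).encode("utf-8"))

    def flush_reason(self):
        """Return the name of the limit reached, or None to keep filling."""
        if not self._records:
            return None
        if len(self._records) >= self.max_records:
            return "record_count"
        if self._size >= self.max_bytes:
            return "data_size"
        if self._clock() - self._started >= self.max_seconds:
            return "publish_rate"
        return None

    def drain(self) -> list:
        """Hand back the buffered records and reset all counters."""
        batch = self._records
        self._records, self._size, self._started = [], 0, None
        return batch


if __name__ == "__main__":
    batcher = Batcher(max_records=2)
    batcher.add({"event": "login"})
    batcher.add({"event": "logout"})
    print(batcher.flush_reason(), len(batcher.drain()))
```

A caller would check `flush_reason()` after each `add` and also on a periodic tick, so that a quiet stream still flushes when the time limit expires.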
Output Formats
The output format depends on your configuration:
- JSON Array Format: Records are stored as a standard JSON array
- JSON Nested Format: Records are wrapped under your specified key (e.g., {"records": [...]})
- JSON Line Format: Each record is on its own line (JSONL format)
- Delimited Format: Records in CSV or other delimited formats
- Parquet Format: Columnar storage format for efficient analytics
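To make the difference between the three JSON layouts concrete, here is a small sketch that serializes the same two records each way. The helper names are hypothetical, and details such as whitespace or a trailing newline in the files Monad actually writes are not specified here.

```python
import json


def to_json_array(records: list) -> str:
    """JSON Array Format: one standard JSON array holding every record."""
    return json.dumps(records)


def to_json_nested(records: list, key: str = "records") -> str:
    """JSON Nested Format: the array wrapped under a user-chosen key."""
    return json.dumps({key: records})


def to_json_lines(records: list) -> str:
    """JSON Line Format (JSONL): one JSON document per line."""
    return "".join(json.dumps(record) + "\n" for record in records)


if __name__ == "__main__":
    sample = [{"id": 1}, {"id": 2}]
    print(to_json_array(sample))    # [{"id": 1}, {"id": 2}]
    print(to_json_nested(sample))   # {"records": [{"id": 1}, {"id": 2}]}
    print(to_json_lines(sample), end="")
```

JSONL is usually the easiest of the three to stream or split, since each line parses independently.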
Configuration
Settings
| Setting | Type | Required | Default | Description |
|---|---|---|---|---|
| Bucket Name | string | Yes | - | The name of the object storage bucket where data will be stored |
| Endpoint URL | string | Yes | - | The endpoint URL for the object storage service (e.g., https://fly.storage.tigris.dev, https://s3.amazonaws.com) |
| Region | string | No | us-east-1 | The region for the object storage service (optional for some providers) |
| Object Prefix | string | No | - | An optional prefix for object keys to organize data within the bucket |
| Use Path Style URLs | boolean | No | true | Whether to use path-style URLs (endpoint.com/bucket/object) vs virtual-hosted-style (bucket.endpoint.com/object). Most S3-compatible services require this to be true |
| Skip SSL Verification | boolean | No | false | Whether to skip SSL certificate verification (useful for self-signed certificates or development environments) |
| Output Format | object | Yes | - | The format configuration for output data - see Format Options below |
| Compression Method | string | Yes | - | The compression method to be applied to the data before storing (e.g., gzip, snappy, none) |
| Partition Format | string | Yes | simple_date | Specifies the format for organizing data into partitions within your bucket (simple_date or hive_compliant) - see Partition Format Options below |
| Batch Configuration | object | No | See defaults | Controls when batches are written to storage |
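The Use Path Style URLs setting changes where the bucket name appears in the request URL. The sketch below shows the two shapes; `object_url` is a hypothetical helper written for illustration and is not how Monad or any SDK necessarily constructs requests.

```python
from urllib.parse import urlsplit, urlunsplit


def object_url(endpoint: str, bucket: str, key: str, use_path_style: bool = True) -> str:
    """Build the URL for an object under either addressing style."""
    parts = urlsplit(endpoint)
    key = key.lstrip("/")
    if use_path_style:
        # Path style: the bucket is the first path segment.
        return urlunsplit((parts.scheme, parts.netloc, f"/{bucket}/{key}", "", ""))
    # Virtual-hosted style: the bucket becomes a subdomain of the endpoint host.
    return urlunsplit((parts.scheme, f"{bucket}.{parts.netloc}", f"/{key}", "", ""))


if __name__ == "__main__":
    print(object_url("https://minio.example.com", "data-lake", "raw-events/a.json"))
    print(object_url("https://s3.amazonaws.com", "my-data-bucket",
                     "monad-data/a.parquet", use_path_style=False))
```

Virtual-hosted style needs DNS (and a TLS certificate) that covers the bucket subdomain, which is why self-hosted services commonly rely on path style.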
Format Options
The output format determines how your data is structured in the storage files. You must configure exactly one format type. For details on each format, see the Formats documentation.
Partition Format Options
- Simple Date Format (simple_date):
  - Structure: {prefix}/{YYYY}/{MM}/{DD}/{filename}
  - Example: my-data/2024/01/15/20240115T123045Z-uuid.json.gz
  - Use case: Straightforward date-based organization
- Hive-Compliant Format (hive_compliant):
  - Structure: {prefix}/year={YYYY}/month={MM}/day={DD}/{filename}
  - Example: my-data/year=2024/month=01/day=15/20240115T123045Z-uuid.parquet
  - Use case: Compatible with Hive, Athena, and other query engines that expect this partitioning scheme
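The two partition layouts can be reproduced with a short helper, which may be useful when writing downstream code that needs to predict or parse object keys. `build_object_key` is a hypothetical function; it assumes partitions and filenames are derived from a UTC timestamp (suggested by the trailing Z in the examples) and takes the unique ID as a parameter.

```python
import uuid
from datetime import datetime, timezone
from typing import Optional


def build_object_key(prefix: str, partition_format: str, ts: datetime,
                     extension: str, unique_id: Optional[str] = None) -> str:
    """Return {prefix}/{partition}/{filename} for the given partition format."""
    ts = ts.astimezone(timezone.utc)            # assumed: keys use UTC
    unique_id = unique_id or uuid.uuid4().hex   # assumed: any unique suffix
    filename = f"{ts.strftime('%Y%m%dT%H%M%SZ')}-{unique_id}.{extension}"
    if partition_format == "simple_date":
        partition = ts.strftime("%Y/%m/%d")
    elif partition_format == "hive_compliant":
        partition = ts.strftime("year=%Y/month=%m/day=%d")
    else:
        raise ValueError(f"unknown partition format: {partition_format!r}")
    # Skip the prefix segment entirely when no prefix is configured.
    return "/".join(part for part in (prefix.strip("/"), partition, filename) if part)


if __name__ == "__main__":
    when = datetime(2024, 1, 15, 12, 30, 45, tzinfo=timezone.utc)
    print(build_object_key("my-data", "simple_date", when, "json.gz", "uuid"))
    print(build_object_key("my-data", "hive_compliant", when, "parquet", "uuid"))
```

With the inputs shown, the two printed keys match the Example lines above.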
Batch Configuration
| Setting | Type | Default | Min | Max | Description |
|---|---|---|---|---|---|
| Record Count | integer | 100,000 | 500 | 1,000,000 | Maximum number of records per file |
| Data Size | integer | 10 MB | 1 MB | 25 MB | Maximum uncompressed data size per file |
| Publish Rate | integer | 45 seconds | 1 second | 60 seconds | Maximum time before flushing a batch |
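Before submitting a configuration, it can help to check batch settings against the limits in the table. The sketch below is a hypothetical pre-flight check, not part of Monad. The key names follow the MinIO example on this page; it assumes `data_size` is given in bytes with 1 MB equal to 1,048,576 bytes, and `publish_rate` in seconds.

```python
MIB = 1024 * 1024  # assumed meaning of "MB" in the table above

# name: (minimum, maximum, default), taken from the Batch Configuration table
BATCH_LIMITS = {
    "record_count": (500, 1_000_000, 100_000),
    "data_size": (1 * MIB, 25 * MIB, 10 * MIB),
    "publish_rate": (1, 60, 45),
}


def resolve_batch_config(user_config=None) -> dict:
    """Fill in defaults and reject values outside the documented ranges."""
    user_config = user_config or {}
    unknown = set(user_config) - set(BATCH_LIMITS)
    if unknown:
        raise ValueError(f"unknown batch settings: {sorted(unknown)}")
    resolved = {}
    for name, (minimum, maximum, default) in BATCH_LIMITS.items():
        value = user_config.get(name, default)
        if not minimum <= value <= maximum:
            raise ValueError(f"{name}={value} is outside {minimum}..{maximum}")
        resolved[name] = value
    return resolved


if __name__ == "__main__":
    print(resolve_batch_config({"record_count": 50000, "publish_rate": 30}))
```

Omitted settings fall back to the defaults, so a partial `batch_config` is still validated as a complete set.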
Secrets
| Secret | Type | Required | Description |
|---|---|---|---|
| Access Key ID | string | Yes | The access key ID for object storage authentication |
| Secret Access Key | string | Yes | The secret access key for object storage authentication |
Examples
AWS S3 Configuration
{
  "settings": {
    "bucket": "my-data-bucket",
    "endpoint": "https://s3.amazonaws.com",
    "region": "us-west-2",
    "prefix": "monad-data",
    "use_path_style": false,
    "compression": "gzip",
    "partition_format": "hive_compliant",
    "format_config": {
      "format": "parquet",
      "parquet": {
        "schema": "{\"Tag\": \"name=events\", \"Fields\": [{\"Tag\": \"name=timestamp, type=INT64, repetitiontype=REQUIRED\"}, {\"Tag\": \"name=event_type, type=BYTE_ARRAY, convertedtype=UTF8, repetitiontype=REQUIRED\"}, {\"Tag\": \"name=user_id, type=INT64, repetitiontype=REQUIRED\"}, {\"Tag\": \"name=properties, type=BYTE_ARRAY, convertedtype=UTF8, repetitiontype=OPTIONAL\"}]}"
      }
    }
  },
  "secrets": {
    "access_key": "AKIAIOSFODNN7EXAMPLE",
    "secret_key": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
  }
}
MinIO Configuration
{
  "settings": {
    "bucket": "data-lake",
    "endpoint": "https://minio.example.com",
    "region": "us-east-1",
    "prefix": "raw-events",
    "use_path_style": true,
    "skip_ssl_verification": false,
    "compression": "snappy",
    "partition_format": "simple_date",
    "format_config": {
      "format": "json",
      "json": {
        "type": "line"
      }
    },
    "batch_config": {
      "record_count": 50000,
      "data_size": 5242880,
      "publish_rate": 30
    }
  },
  "secrets": {
    "access_key": "minioadmin",
    "secret_key": "minioadmin123"
  }
}
Tigris Configuration
{
  "settings": {
    "bucket": "analytics-data",
    "endpoint": "https://fly.storage.tigris.dev",
    "region": "auto",
    "prefix": "events",
    "use_path_style": true,
    "compression": "none",
    "partition_format": "hive_compliant",
    "format_config": {
      "format": "delimited",
      "delimited": {
        "delimiter": ",",
        "headers": ["timestamp", "user_id", "event_type", "value"]
      }
    }
  },
  "secrets": {
    "access_key": "tid_EXAMPLE",
    "secret_key": "tsec_EXAMPLE"
  }
}