Object Storage

Enables real-time ingestion of objects from any S3-compatible object storage service for continuous data processing.

Sync Type: Incremental

Overview

The Object Storage input connector allows you to stream data from any S3-compatible storage service into Monad. This includes services like:

  • MinIO
  • Wasabi
  • DigitalOcean Spaces
  • Backblaze B2
  • Google Cloud Storage (S3-compatible mode)
  • Any other S3-compatible storage service

Requirements

  • Access credentials (Access Key and Secret Key) for your object storage service
  • Read permissions on the bucket and objects you want to ingest
  • Objects should be organized using date-based partitioning for incremental sync functionality

Details

When the input runs for the first time, it performs a full sync of all objects under the specified bucket and prefix. After each successfully processed page of objects, the processor checkpoints its state by saving:

  • The highest LastModified timestamp encountered
  • The lexicographically greatest Blob key at that timestamp

On subsequent runs, the processor performs an incremental sync starting from the last checkpointed timestamp. If a run fails for any reason, the processor resumes from the day prefix of that checkpoint. Because a checkpoint is taken after every page within a prefix, a failure mid-prefix restarts processing from the last completed page's checkpoint. No records are lost, but records from the in-flight page may be re-processed after a catastrophic failure.

  • To avoid re-processing records downstream, we recommend publishing Blob data to a queue that consumers can read from.

  • Also note that on every sync within a day prefix, the processor rescans the prefix and drops already-seen data via its deduplication logic. For larger containers, scanning the same data many times per day can hit service rate limits. Publishing Blob data to a queue avoids this as well.

  • Prefixes must always be either Hive-compliant or simple-date (see Partition Format below). Any other layout can cause unexpected behavior in the input.

  • Each object's last-modified time must fall on the same date as its prefix: an object that lands in the 2025/08/10 prefix should have a last-modified time on 2025-08-10 (in ISO 8601 terms). Violating this can cause unexpected behavior in the input.

  • If these date boundaries are too restrictive for your pipeline, publishing Blob data to a queue sidesteps them.
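The checkpoint and resume behavior described above can be sketched as follows. This is a simplified illustration, not the connector's actual internals: `dedupe_page` and the in-memory object dicts (shaped like S3 `ListObjectsV2` entries) are assumptions for the example.

```python
from datetime import datetime, timezone

def dedupe_page(objects, checkpoint):
    """Return only objects newer than the checkpoint watermark.

    The checkpoint is a (LastModified, Key) tuple: the highest timestamp
    seen plus the lexicographically greatest key at that timestamp.
    Objects at or below the watermark are dropped, which is why a restart
    may re-process (but never lose) records from the in-flight page.
    """
    return sorted(
        (o for o in objects if (o["LastModified"], o["Key"]) > checkpoint),
        key=lambda o: (o["LastModified"], o["Key"]),
    )

def ts(s):
    return datetime.fromisoformat(s).replace(tzinfo=timezone.utc)

page = [
    {"Key": "logs/2025/08/10/a.json", "LastModified": ts("2025-08-10T01:00:00")},
    {"Key": "logs/2025/08/10/b.json", "LastModified": ts("2025-08-10T02:00:00")},
    {"Key": "logs/2025/08/10/c.json", "LastModified": ts("2025-08-10T02:00:00")},
]
# Watermark from the last completed page: a.json has already been seen.
checkpoint = (ts("2025-08-10T01:00:00"), "logs/2025/08/10/a.json")
fresh = dedupe_page(page, checkpoint)  # a.json dropped, b.json and c.json kept
new_checkpoint = (fresh[-1]["LastModified"], fresh[-1]["Key"])
```

After the page is processed, `new_checkpoint` holds the highest timestamp and the greatest key at that timestamp, matching the two fields the processor saves.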

Configuration

The following settings define the input parameters. Each field's type, requirement status, default value, and description are detailed below.

Settings

| Setting | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| Endpoint | string | Yes | - | Endpoint URL for the object storage service (e.g., https://minio.example.com, https://s3.amazonaws.com) |
| Skip SSL Verification | boolean | No | false | Skip SSL verification for self-signed certificates. Only use this for development/testing environments. |
| Use Path Style | boolean | No | true | Whether to use path-style URLs (endpoint.com/bucket/object) vs virtual-hosted-style (bucket.endpoint.com/object). Most S3-compatible services require this to be true. |
| Bucket | string | Yes | - | Name of the storage bucket |
| Prefix | string | No | - | Prefix that leads to the start of the expected partition. For example, if your objects are at /logs/year=2024/month=01/day=01/, the prefix would be logs. |
| Region | string | No | us-east-1 | Optional region for the object storage service. This is often required for services like AWS S3. |
| Compression | string | Yes | - | Compression format of the objects. Options include: none, gzip, zstd, snappy, lz4 |
| Format | string | Yes | json | File format of the objects. Options include: json, csv, parquet, avro |
| Partition Format | string | Yes | Simple Date | Specifies the partition format of your bucket. See the Partition Format section below. |
| Record Location | string | Yes* | @this | Location of the record in the JSON object. Required only for JSON format. Use @this if records are at the root level or in an array. Use JSONPath notation for nested records (e.g., $.data.records). |

*Required only when Format is set to json
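The Record Location setting can be pictured with a small sketch. The `extract_records` helper below is hypothetical and supports only simple dotted paths; the connector's actual JSONPath handling is likely richer.

```python
import json

def extract_records(blob, record_location):
    """Pick the record list out of a parsed JSON object.

    "@this" means the root is already a record (or an array of records);
    a path such as "$.Records" names a nested field. This is a simplified
    sketch supporting only dotted paths, not full JSONPath.
    """
    doc = json.loads(blob)
    if record_location == "@this":
        return doc if isinstance(doc, list) else [doc]
    node = doc
    for part in record_location.lstrip("$.").split("."):
        node = node[part]
    return node if isinstance(node, list) else [node]

# Records at the root level: use "@this".
root_array = '[{"id": 1}, {"id": 2}]'
a = extract_records(root_array, "@this")

# Records nested under a field: use JSONPath notation.
nested = '{"Records": [{"eventName": "PutObject"}]}'
b = extract_records(nested, "$.Records")
```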

Secrets

| Secret | Type | Required | Description |
| --- | --- | --- | --- |
| Access Key | string | Yes | Access key for object storage authentication |
| Secret Key | string | Yes | Secret key for object storage authentication |

Partition Format

The Partition Format setting specifies the existing organization of data within your object storage bucket. This is crucial for the system to correctly navigate and read your data. Select the option that matches your current bucket structure:

  1. Simple Date Format (simple date):
    • Structure: YYYY/MM/DD
    • Example: 2024/01/01
    • Use case: For buckets using basic chronological organization of data
  2. Hive-compatible Format (hive compliant):
    • Structure: year=YYYY/month=MM/day=DD
    • Example: year=2024/month=01/day=01
    • Use case: For buckets set up in a Hive-compatible format, common in data lake configurations
Selecting the correct Partition Format ensures that the system can efficiently locate and process your existing data by matching your bucket's current structure. This setting does not change your bucket's organization; it tells the system how to navigate it. NOTE: Your data MUST be partitioned in one of the above formats or subsequent syncs (after the initial sync) will not be able to find your data.
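The two layouts can be illustrated by building the day prefix the system would scan for a given date. This is a sketch of the key construction, assuming a configured prefix of `logs`; the connector's exact internal behavior may differ.

```python
from datetime import date

def day_prefix(prefix, d, partition_format):
    """Build the day prefix scanned for a given date under each layout."""
    if partition_format == "simple date":
        part = f"{d:%Y/%m/%d}"                        # YYYY/MM/DD
    elif partition_format == "hive compliant":
        part = f"year={d:%Y}/month={d:%m}/day={d:%d}"  # key=value directories
    else:
        raise ValueError(f"unsupported partition format: {partition_format}")
    return f"{prefix}/{part}/" if prefix else f"{part}/"

d = date(2024, 1, 1)
simple = day_prefix("logs", d, "simple date")     # logs/2024/01/01/
hive = day_prefix("logs", d, "hive compliant")    # logs/year=2024/month=01/day=01/
```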

Configuration Examples

MinIO (Self-hosted)

{
  "settings": {
    "endpoint": "https://minio.company.internal:9000",
    "skip_ssl_verification": true,
    "use_path_style": true,
    "bucket": "security-logs",
    "prefix": "cloudtrail",
    "compression": "none",
    "format": "json",
    "partition_format": "hive compliant",
    "record_location": "$.Records"
  },
  "secrets": {
    "access_key": "minioadmin",
    "secret_key": "minioadmin123"
  }
}

Wasabi

{
  "settings": {
    "endpoint": "https://s3.wasabisys.com",
    "skip_ssl_verification": false,
    "use_path_style": true,
    "bucket": "backup-logs",
    "prefix": "security/events",
    "region": "us-east-1",
    "compression": "zstd",
    "format": "parquet",
    "partition_format": "simple date"
  },
  "secrets": {
    "access_key": "WASABI_ACCESS_KEY",
    "secret_key": "WASABI_SECRET_KEY"
  }
}
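The Use Path Style setting in the examples above determines how the object URL is formed. This sketch shows the difference using the Wasabi endpoint from the example; real S3 clients also handle signing, escaping, and region resolution.

```python
from urllib.parse import urlparse

def object_url(endpoint, bucket, key, use_path_style=True):
    """Form the request URL for an object under each addressing style."""
    u = urlparse(endpoint)
    if use_path_style:
        # Path style: endpoint.com/bucket/object
        return f"{u.scheme}://{u.netloc}/{bucket}/{key}"
    # Virtual-hosted style: bucket.endpoint.com/object
    return f"{u.scheme}://{bucket}.{u.netloc}/{key}"

key = "security/events/2024/01/01/log.parquet"
path_style = object_url("https://s3.wasabisys.com", "backup-logs", key)
virtual = object_url("https://s3.wasabisys.com", "backup-logs", key,
                     use_path_style=False)
```

Most S3-compatible services expect path-style addressing, which is why the setting defaults to true.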

Supported Formats and Compression

File Formats

  • JSON: Supports nested JSON with configurable record location
  • CSV: Comma-separated values
  • Parquet: Columnar storage format
  • Avro: Row-based storage format with schema

Compression Types

  • none: Uncompressed files
  • gzip: GNU zip compression
  • zstd: Zstandard compression
  • snappy: Snappy compression
  • lz4: LZ4 compression

Troubleshooting

Common Issues

  1. Connection Errors
    • Verify the endpoint URL is correct and includes the protocol (https:// or http://)
    • Check if SSL verification needs to be disabled for self-signed certificates
    • Ensure the service is accessible from Monad's infrastructure
  2. Authentication Failures
    • Verify the access key and secret key are correct
    • Check that the credentials have the necessary permissions
  3. Path Style Issues
    • If you get "bucket not found" errors, try toggling the "Use Path Style" setting
  4. Missing Data
    • Verify the partition format matches your bucket structure
    • Your data MUST be partitioned properly by date; see the Partition Format section above for reference
    • Check that the prefix is correctly specified. Leading and trailing slashes are stripped automatically.