
S3

Enables seamless streaming of data posted to an S3 bucket into the Monad solution.

Sync Type: Incremental

Requirements

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "S3BucketLevelListPermissions",
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::{bucket_name}"
    },
    {
      "Sid": "S3ObjectLevelPermissions",
      "Effect": "Allow",
      "Action": ["s3:PutObject", "s3:GetObject"],
      "Resource": "arn:aws:s3:::{bucket_name}/*"
    }
  ]
}
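As a quick sanity check before configuring the input, a short script can confirm that a policy document grants both permission sets above. This is a minimal sketch, not part of the product; the function name is illustrative:

```python
# Required permissions from the policy above, keyed by resource ARN.
REQUIRED = {
    "arn:aws:s3:::{bucket_name}": {"s3:ListBucket"},
    "arn:aws:s3:::{bucket_name}/*": {"s3:PutObject", "s3:GetObject"},
}

def policy_grants_required_actions(policy: dict) -> bool:
    """Return True if every required action is allowed on its resource."""
    granted = {}
    for stmt in policy.get("Statement", []):
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt["Action"]
        if isinstance(actions, str):       # "Action" may be a string or a list
            actions = [actions]
        resources = stmt["Resource"]
        if isinstance(resources, str):
            resources = [resources]
        for res in resources:
            granted.setdefault(res, set()).update(actions)
    return all(actions <= granted.get(res, set())
               for res, actions in REQUIRED.items())
```

Run it against the policy JSON attached to the role the input will assume.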

Details

When the input is run for the first time, it performs a full sync of all files in the specified bucket-prefix.

On subsequent runs, the processor performs an incremental sync starting from the timestamp of the last checkpointed object. After a failure of any kind, the processor resumes from the day prefix of the last checkpointed timestamp. A checkpoint is written for every page within a prefix, so if a failure occurs while processing a prefix, the processor restarts from the last completed page's checkpoint. No records are lost, but after a catastrophic failure you may re-process some data in the S3 objects on the page where the failure occurred.

  • To avoid this re-processing, we recommend publishing S3 data to an SQS queue.

  • Please also note that on every sync within a day prefix, we rescan all data in that prefix and drop duplicates using our deduplication logic. For larger buckets this can lead to hitting rate limits, since the same data is scanned many times per day. Again, publishing S3 data to an SQS queue avoids this.

  • Prefixes must always follow either the hive-compliant or simple-date layout (see Partition Format below). Anything else can cause unexpected behavior in the input.

  • Each object's last-updated time must fall on the same date as its logical prefix: any object that lands in the 2025/08/10 prefix should have a last-updated date of 2025-08-10 (in ISO 8601 format). Violating this can cause unexpected behavior in the input.

  • To avoid these tight date boundaries altogether, we recommend publishing S3 data to an SQS queue.
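The date-alignment rule above can be verified before objects are ingested. A minimal sketch, assuming the simple-date YYYY/MM/DD layout; the key layout and function name are illustrative:

```python
from datetime import datetime, timezone

def matches_day_prefix(key: str, last_modified: datetime) -> bool:
    """Check that an object's last-modified date equals the YYYY/MM/DD
    day prefix embedded in its key, e.g. 'logs/2025/08/10/app.json'."""
    parts = key.split("/")
    for i in range(len(parts) - 3):
        y, m, d = parts[i:i + 3]
        if len(y) == 4 and y.isdigit() and m.isdigit() and d.isdigit():
            return (last_modified.date().isoformat()
                    == f"{y}-{int(m):02d}-{int(d):02d}")
    return False
```

An object whose last-modified timestamp lands on a different date than its prefix would fail this check and should be re-published under the correct prefix.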

The processor polls for new data every 10 seconds, processing objects as they appear in the bucket.
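The page-level checkpoint and resume behavior described above can be sketched as follows, with an in-memory stand-in for the paginated object listing. The helper is illustrative, not the actual implementation:

```python
def sync_day_prefix(pages, checkpoint=None):
    """Process a day prefix page by page, checkpointing after each page.

    `pages` is a list of pages, each page a list of object keys.
    `checkpoint` is the index of the last fully processed page; on
    restart, processing resumes at the page after it, so a mid-page
    failure re-processes that page's objects but never skips any.
    """
    processed = []
    start = 0 if checkpoint is None else checkpoint + 1
    for page_index in range(start, len(pages)):
        for key in pages[page_index]:
            processed.append(key)          # ingest the object
        checkpoint = page_index            # checkpoint the completed page
    return processed, checkpoint
```

A first run passes no checkpoint and processes every page; a restart passes the last persisted checkpoint and picks up at the following page.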

Configuration

The following configuration defines the input parameters. Each field's specifications, such as type, requirements, and descriptions, are detailed below.

Settings

| Setting | Type | Required | Description |
| --- | --- | --- | --- |
| Region | string | No | The region of the S3 bucket. If left blank, the region will be auto-detected. |
| Bucket | string | Yes | The name of the S3 bucket. |
| Prefix | string | No | Prefix of the S3 object keys to read. |
| Compression | string | Yes | Compression format of the S3 objects. |
| Format | string | Yes | File format of the S3 objects. |
| Partition Format | string | Yes | The existing partition format used in your S3 bucket. |
| Role ARN | string | Yes | Role ARN to assume when reading from S3. |
| Record Location | string | No | Location of the record in the JSON object. Applies only if the format is JSON. Leave empty if you want the entire record. |
| Backfill Start Time | string | No | The date to start fetching data from. If not specified, no past records will be fetched. |
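For illustration only, a filled-in configuration might look like the following. The field names mirror the table above, but the surrounding JSON shape and all values are hypothetical:

```json
{
  "region": "us-east-1",
  "bucket": "my-security-logs",
  "prefix": "logs/",
  "compression": "gzip",
  "format": "json",
  "partition_format": "simple date",
  "role_arn": "arn:aws:iam::123456789012:role/monad-s3-reader",
  "record_location": ".Records",
  "backfill_start_time": "2024-01-01T00:00:00Z"
}
```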

Partition Format

The Partition Format setting specifies the existing organization of data within your S3 bucket. This is crucial for the system to correctly navigate and read your data. Select the option that matches your current S3 bucket structure:

  1. Simple Date Format ('simple date'):

    • Structure: YYYY/MM/DD
    • Example: 2024/01/01
    • Use case: For buckets using basic chronological organization of data.
  2. Hive-compatible Format ('hive compliant'):

    • Structure: year=YYYY/month=MM/day=DD
    • Example: year=2024/month=01/day=01
    • Use case: For buckets set up in a Hive-compatible format, common in data lake configurations.

Selecting the correct Partition Format ensures that the system can efficiently locate and process your existing data by matching your S3 bucket's current structure. This setting does not change your bucket's organization; it tells the system how to navigate it.
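Both layouts encode the same date; a small helper shows the day prefixes each format produces for a given date (the function name is illustrative):

```python
from datetime import date

def day_prefix(d: date, partition_format: str) -> str:
    """Build the S3 day prefix for a date in either supported layout."""
    if partition_format == "simple date":
        return f"{d.year:04d}/{d.month:02d}/{d.day:02d}"
    if partition_format == "hive compliant":
        return f"year={d.year:04d}/month={d.month:02d}/day={d.day:02d}"
    raise ValueError(f"unknown partition format: {partition_format}")

# day_prefix(date(2024, 1, 1), "simple date")    -> "2024/01/01"
# day_prefix(date(2024, 1, 1), "hive compliant") -> "year=2024/month=01/day=01"
```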

Secrets (Static Credentials Only)

| Setting | Type | Required | Description |
| --- | --- | --- | --- |
| Access Key | string | Conditional | AWS Access Key ID |
| Secret Key | string | Conditional | AWS Secret Access Key |

⚠️ Authentication: Choose either Role ARN (recommended) or static credentials. See AWS Authentication Guide for setup instructions.

Custom Schema Handling

If the source data doesn't align with any of the OpenSecurityControlFramework (OSCF) schemas, you can create a custom transformation using our JQ transform pipeline. For example:

{
  metadata: {
    schema_version: "1.0.0",
    custom_framework: "my_framework"
  },
  controls: .[]
}

For more information on JQ and how to write your own JQ transformations, see the JQ docs.
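Note that `controls: .[]` fans out: the transform emits one output record per element of the input array, each carrying the same fixed metadata. A rough Python equivalent of that behavior, for illustration only:

```python
def apply_transform(records):
    """Mimic the JQ transform above: wrap each element of the input
    array in an object carrying fixed metadata."""
    metadata = {"schema_version": "1.0.0", "custom_framework": "my_framework"}
    return [{"metadata": metadata, "controls": record} for record in records]
```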

If you believe this data source should be included in the standard OSCF schema set, please reach out to our team at support@monad.com. We're always looking to expand our coverage of security control frameworks based on community needs.