Azure Blob Storage

Writes data to Azure Blob Storage containers in various file formats. Supports configurable partitioning, compression, and batching options for optimal data organization and performance.

Requirements

To configure Azure as an output destination for Monad, complete the following steps:

Step 1: Create Microsoft Entra Application

You can create one in the Azure portal by searching for App Registrations.

Registering the application gives you an Application (Client) ID, a Directory (Tenant) ID, and a Client Secret value, all of which are used when setting up the output connector.

Step 2: Create/Configure a Storage Account

You can create one by going to your Azure portal, and clicking on/searching for Storage Accounts.

After doing so, create a container by opening the Storage browser in the left sidebar and choosing the Blob containers widget. Your Account URL is typically the name of your storage account in the format https://account_name.blob.core.windows.net.
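As a quick sanity check, the Account URL follows directly from the storage account name. A trivial sketch (`account_url` is an illustrative helper, not part of any SDK):

```python
def account_url(account_name: str) -> str:
    # Default endpoint suffix for the Azure public cloud; sovereign clouds
    # use different suffixes (e.g. .blob.core.chinacloudapi.cn for China).
    return f"https://{account_name}.blob.core.windows.net"
```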

Step 3: Set Up Appropriate Permissions

Monad requires specific permissions to write data to your Azure storage account. Follow the steps below to grant them.

Retrieve the exact name of the app you created (or are using) from App Registrations, as in Step 1.

Then go to your storage account and, under Access control (IAM), click Add to create a new role assignment.

Make sure to choose the Storage Blob Data Contributor role and click Next.

Assign access to the application you registered, then add the role assignment.

Details

When the input runs for the first time, it performs a full sync of all files under the specified container and prefix. After each successfully processed page of objects, the processor checkpoints its state by saving:

  • The highest LastModified timestamp encountered
  • The lexicographically greatest Blob key at that timestamp

On subsequent runs, the processor performs an incremental sync starting from the last checkpointed timestamp. After any form of failure, it resumes from the day prefix of the last checkpointed timestamp. A checkpoint occurs at every page within a prefix, so if a failure occurs while processing a prefix, the processor restarts from the last completed page's checkpoint. This means that while you will not lose records, you may re-process some records from the last page after a catastrophic failure.

  • To avoid this, we recommend publishing Blob data to a queue that can be consumed from instead.

  • Please also note that on every sync of a day prefix, we rescan the prefix and drop already-seen data based on our deduplication logic. For larger containers this can lead to hitting rate limits, since the same data is scanned many times in a day. To avoid this, we recommend publishing Blob data to a queue that can be consumed from instead.

  • Prefixes must always be hive-compliant or simple-date. Anything else can cause unexpected behavior in the input.

  • Each log's last-updated time should fall on the same date as its logical prefix: any object that lands in the 2025/08/10 prefix should have a last-updated time of 2025/08/10 (in ISO8601 format). Not doing so can cause unexpected behavior in the input.

  • To avoid such tight boundaries, we recommend publishing Blob data to a queue that can be consumed from instead.
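The checkpointing behavior above can be sketched as follows. This is a simplified model, assuming blobs are listed in pages of (key, last_modified) pairs; the names are illustrative, not the actual implementation:

```python
from datetime import datetime

def persist(checkpoint):
    pass  # stand-in for durable checkpoint storage

def sync(pages, checkpoint=None):
    """Process pages of (key, last_modified) blob listings, checkpointing
    after each page. Sketch of the behavior described above.

    checkpoint is (last_modified, key): the highest LastModified seen and
    the lexicographically greatest key at that timestamp.
    """
    for page in pages:
        for key, last_modified in page:
            if checkpoint is None:
                checkpoint = (last_modified, key)
            elif last_modified > checkpoint[0]:
                checkpoint = (last_modified, key)
            elif last_modified == checkpoint[0] and key > checkpoint[1]:
                checkpoint = (last_modified, key)
        # State is persisted after every successful page; on failure the
        # processor restarts from the last completed page's checkpoint.
        persist(checkpoint)
    return checkpoint
```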

Configuration

Settings

| Setting | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| Container | string | Yes | - | A container organizes a set of blobs, similar to a directory in a file system |
| Account URL | string | Yes | - | Represents your storage account in Azure. Typically of the format https://account_name.blob.core.windows.net. |
| Prefix | string | No | - | An optional prefix for Azure object keys to organize data within the container |
| Format Configuration | object | Yes | - | The format configuration for output data - see Format Options below |
| Compression Method | string | Yes | - | The compression method to be applied to the data before storing (e.g., gzip, snappy, none) |
| Partition Format | string | Yes | simple_date | The format for organizing data into partitions within the Azure container |
| Batch Configuration | object | No | See defaults in one of the sections below | Controls when batches are written to Azure |
| Record Location | string | No | - | Location of the record in the JSON object. Applies only if the format is JSON. Leave empty to use the entire record. |
| Backfill Start Time | string | No | - | The date to start fetching data from. If not specified, no past records will be fetched. |
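The exact path syntax Record Location accepts is not specified here; assuming a simple dot-separated path into the parsed JSON object, extraction might look like this (an illustrative sketch, not the connector's actual implementation):

```python
def extract_record(obj: dict, record_location: str):
    """Walk a dot-separated path into a parsed JSON object.

    The dot-separated syntax is an assumption for illustration; consult
    the connector documentation for the exact path format accepted.
    """
    if not record_location:
        return obj  # empty location: use the entire record
    current = obj
    for part in record_location.split("."):
        current = current[part]
    return current
```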

Format Options

The output format determines how your data is structured in the storage files. You must configure exactly one format type; see the documentation on formats here: Formats.

Partition Format Options

  1. Simple Date Format (simple_date):
  • Structure: {prefix}/{YYYY}/{MM}/{DD}/(unknown)
  • Example: my-data/2024/01/15/20240115T123045Z-uuid.json.gz
  • Use case: Straightforward date-based organization
  2. Hive-Compliant Format (hive_compliant):
  • Structure: {prefix}/year={YYYY}/month={MM}/day={DD}/(unknown)
  • Example: my-data/year=2024/month=01/day=15/20240115T123045Z-uuid.parquet
  • Use case: Compatible with Athena, Hive, and other query engines that expect this partitioning scheme

Both partition formats use UTC time for consistency across different time zones.
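The two partition layouts above can be sketched as a small path builder (an illustrative sketch; the filename argument stands in for the generated file name, whose exact pattern is not specified here):

```python
from datetime import datetime, timezone

def partition_path(prefix: str, fmt: str, now: datetime, filename: str) -> str:
    """Build an object key for the simple_date or hive_compliant layout."""
    d = now.astimezone(timezone.utc)  # both formats use UTC
    if fmt == "simple_date":
        datepart = f"{d:%Y}/{d:%m}/{d:%d}"
    elif fmt == "hive_compliant":
        datepart = f"year={d:%Y}/month={d:%m}/day={d:%d}"
    else:
        raise ValueError(f"unknown partition format: {fmt}")
    return f"{prefix}/{datepart}/{filename}"
```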

Secrets

| Secret | Type | Required | Description |
| --- | --- | --- | --- |
| Tenant ID | string | Yes | The Azure Active Directory tenant (directory) ID. |
| Client ID | string | Yes | The application (client) ID registered in Azure Active Directory. |
| Client Secret | string | Yes | The client secret associated with the registered application in Azure AD. |

Custom Schema Handling

If the source data doesn't align with any of the OpenSecurityControlFramework (OSCF) schemas, you can create a custom transformation using our JQ transform pipeline. For example:

```jq
{
  metadata: {
    schema_version: "1.0.0",
    custom_framework: "my_framework"
  },
  controls: .[]
}
```
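As a cross-check of what the transform produces: in jq, `.[]` inside an object construction yields one output object per element of the input array. A Python sketch of the equivalent logic:

```python
def transform(controls: list) -> list:
    """Python equivalent of the jq program above: emit one object per
    input element, each wrapped with the custom-framework metadata."""
    return [
        {
            "metadata": {
                "schema_version": "1.0.0",
                "custom_framework": "my_framework",
            },
            "controls": control,
        }
        for control in controls
    ]
```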

For more information on JQ and how to write your own JQ transformations, see the JQ docs here.

If you believe this data source should be included in the standard OSCF schema set, please reach out to our team at support@monad.com. We're always looking to expand our coverage of security control frameworks based on community needs.