Delimited Format for Output
This document explains how to configure delimited file output (such as CSV or TSV) for any Monad output component that supports delimited formats.
Overview
Monad provides a flexible delimited format configuration that allows you to convert your data into various delimiter-separated values formats. The most common is CSV (Comma-Separated Values), but you can specify any single character as a delimiter to create other formats such as TSV (Tab-Separated Values) or custom-delimited files.
Configuration Options
When configuring delimited output in Monad, you provide an object with the following fields:
```json
{
  "delimiter": "[character]",
  "headers": ["header1", "header2", "..."]
}
```
Parameters
| Parameter | Description | Required | Default |
|---|---|---|---|
| `delimiter` | A single character to use as the field separator | Yes | - |
| `headers` | An array of column header names, in the desired output order | No | All fields found in the data, sorted alphabetically |
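The rules in this table can be expressed as a small validation check. The sketch below is a hypothetical helper (`validate_delimited_format` is not part of Monad); it assumes that "a single character" means a Python string of length 1 and that `headers`, when present, must be a list of strings.

```python
def validate_delimited_format(config: dict) -> dict:
    """Hypothetical check mirroring the Parameters table (not Monad's own code)."""
    delimiter = config.get("delimiter")
    # `delimiter` is required and must be exactly one character.
    if not isinstance(delimiter, str) or len(delimiter) != 1:
        raise ValueError("delimiter is required and must be a single character")
    # `headers` is optional; when present it must be an array of column names.
    headers = config.get("headers")
    if headers is not None and not (
        isinstance(headers, list) and all(isinstance(h, str) for h in headers)
    ):
        raise ValueError("headers must be an array of strings")
    return config


validate_delimited_format({"delimiter": "\t", "headers": ["name", "age"]})  # accepted
try:
    validate_delimited_format({"delimiter": "||"})
except ValueError as err:
    print(err)  # delimiter is required and must be a single character
```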
Delimiter Examples
Here are some common delimiter characters you can use:
| Format | Delimiter | Description |
|---|---|---|
| CSV | `,` | Comma-separated values (most common) |
| TSV | `\t` | Tab-separated values (a tab character, written as `"\t"` in JSON) |
| Pipe-separated | `\|` | Pipe-separated values |
| Semicolon-separated | `;` | Semicolon-separated values (common in locales that use a comma as the decimal separator) |
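In the JSON configuration, the tab delimiter is written with the JSON escape sequence `\t`, while the other delimiters are written literally. This Python sketch uses the standard `json` module to print a `delimiter` object for each row of the table; the format names are only labels for illustration.

```python
import json

# One delimiter per row of the table above; each is a single character.
formats = {
    "CSV": ",",
    "TSV": "\t",  # a real tab character in Python
    "Pipe-separated": "|",
    "Semicolon-separated": ";",
}

for name, delimiter in formats.items():
    # json.dumps escapes the tab as \t; the other characters appear as-is.
    print(f"{name}: {json.dumps({'delimiter': delimiter})}")
```

The TSV line prints as `{"delimiter": "\t"}`, which is how a tab delimiter would appear in a JSON configuration file.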
Header Configuration
Explicit Headers
When you explicitly define the headers array, Monad will:
- Only include the specified columns in the output
- Order the columns exactly as specified in the array
- Output an empty value for any specified field that a record lacks
Example Configuration:
```json
{
  "delimiter": ",",
  "headers": ["name", "age", "email"]
}
```
This configuration will produce a file with exactly these three columns in the specified order.
Automatic Headers
If you omit the headers array, Monad will:
- Detect every field that appears in at least one record
- Sort the headers alphabetically
- Include all of those fields in the output, leaving a value empty when a record lacks that field
Example Configuration:
```json
{
  "delimiter": ","
}
```
This configuration will include all fields present in your data, ordered alphabetically.
Output Example
Given the following JSON data:
```json
[
  {"name": "John", "age": 30, "email": "john@example.com"},
  {"name": "Jane", "age": 25, "email": "jane@example.com", "department": "Sales"}
]
```
With Explicit Headers
Configuration:
```json
{
  "delimiter": ",",
  "headers": ["name", "email", "age"]
}
```
Output:
```text
name,email,age
John,john@example.com,30
Jane,jane@example.com,25
```
Note that department is omitted because it wasn't in the specified headers.
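The explicit-header behavior can be approximated with Python's standard `csv.DictWriter`, which drops fields outside `fieldnames` when `extrasaction="ignore"` and fills absent fields with `restval`. This is an emulation for checking expectations, not Monad's implementation; the helper name `to_delimited_explicit` is hypothetical, and the `\n` line terminator is an assumption because Monad's line endings are not documented here.

```python
import csv
import io

# Sample records from the "Output Example" section.
records = [
    {"name": "John", "age": 30, "email": "john@example.com"},
    {"name": "Jane", "age": 25, "email": "jane@example.com", "department": "Sales"},
]


def to_delimited_explicit(rows, headers, delimiter=","):
    """Hypothetical emulation: keep only `headers`, in exactly that order."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf,
        fieldnames=headers,
        delimiter=delimiter,
        restval="",             # a record missing a header field gets an empty value
        extrasaction="ignore",  # fields not listed in `headers` (e.g. department) are dropped
        lineterminator="\n",    # assumption: LF line endings
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


print(to_delimited_explicit(records, ["name", "email", "age"]), end="")
```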
With Automatic Headers
Configuration:
```json
{
  "delimiter": ","
}
```
Output:
```text
age,department,email,name
30,,john@example.com,John
25,Sales,jane@example.com,Jane
```
All fields are included, sorted alphabetically, with empty values for missing fields.
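Automatic headers amount to taking the union of field names across all records and sorting it. The Python sketch below reproduces the output above under that reading; the helper names are hypothetical. It sorts with Python's default string ordering, which places uppercase letters before lowercase ones; whether Monad's alphabetical order is case-sensitive is not stated in this document.

```python
import csv
import io

# Sample records from the "Output Example" section.
records = [
    {"name": "John", "age": 30, "email": "john@example.com"},
    {"name": "Jane", "age": 25, "email": "jane@example.com", "department": "Sales"},
]


def automatic_headers(rows):
    """Every field name seen in any record, sorted alphabetically."""
    return sorted({key for row in rows for key in row})


def to_delimited_auto(rows, delimiter=","):
    """Hypothetical emulation of output without an explicit `headers` array."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf,
        fieldnames=automatic_headers(rows),
        delimiter=delimiter,
        restval="",           # records without a field get an empty value
        lineterminator="\n",  # assumption: LF line endings
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


print(to_delimited_auto(records), end="")
```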
Best Practices
- Explicit Headers Recommended: For consistent output across multiple batches, always specify explicit headers. This keeps your file structure stable even if the available fields change between batches.
- Handling Special Characters: When a field value contains the delimiter character, Monad escapes it automatically in the output.
- Unicode Support: Monad supports Unicode characters in both delimiters and data, which makes it suitable for international datasets.
- Field Order: For large datasets, fixing the column order with explicit headers makes the output easier to read and gives downstream tools a predictable column layout.
- Missing Values: Fields that don't exist in a particular record are output as empty values, not as `null` or other placeholders.
Batch Considerations
When processing data in batches without explicit headers:
- Each batch may generate files with different header sets if the fields vary between batches
- The order of headers may change between batches if new fields appear
To prevent this inconsistency, it's strongly recommended to explicitly define headers when your data structure might vary between batches.
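To see why batches can diverge, the Python sketch below computes the header line each batch would get, assuming automatic headers are the sorted union of field names within that batch (as described under Automatic Headers). The batch contents and helper names are hypothetical.

```python
def automatic_headers(batch):
    """Emulated automatic headers: sorted union of field names in one batch."""
    return sorted({key for record in batch for key in record})


def header_line(batch, headers=None, delimiter=","):
    # Explicit headers, when given, take precedence over detected fields.
    fields = headers if headers is not None else automatic_headers(batch)
    return delimiter.join(fields)


batch_1 = [{"id": 1, "name": "John"}]
batch_2 = [{"id": 2, "name": "Jane", "email": "jane@example.com"}]

# Without explicit headers, the new "email" field shifts every column.
print(header_line(batch_1))  # id,name
print(header_line(batch_2))  # email,id,name

# With explicit headers, both batches share one layout.
fixed = ["id", "name", "email"]
print(header_line(batch_1, fixed))  # id,name,email
print(header_line(batch_2, fixed))  # id,name,email
```

In the second batch, `id` moves from the first column to the second, which is the kind of drift that breaks consumers expecting a fixed layout.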
File Extension
Delimited output files use the .csv extension by default, regardless of the delimiter used.
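Because the extension stays `.csv`, a consumer cannot infer the delimiter from the file name and should take it from the output configuration. The sketch below uses Python's standard `csv` module on hypothetical content from a file written with `"delimiter": "\t"`.

```python
import csv
import io

# Hypothetical contents of an exported ".csv" file that was written with a tab delimiter.
tsv_text = "name\temail\tage\nJohn\tjohn@example.com\t30\n"

# Relying on the default comma delimiter leaves each row as a single field.
wrong = list(csv.reader(io.StringIO(tsv_text)))

# Passing the delimiter from the output configuration splits the fields correctly.
right = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))

print(len(wrong[0]), len(right[0]))  # 1 3
```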
Complete Example
Here's a complete configuration example for a Monad output component using CSV format:
```json
{
  "component": "file_output",
  "config": {
    "path": "/data/exports/",
    "format": "delimited",
    "delimited_format": {
      "delimiter": ",",
      "headers": ["id", "name", "email", "created_at"]
    }
  }
}
```
This configuration will output your data as a CSV file with the specified column order, including only the fields listed in the headers array.