Output Formats
Monad supports multiple output formats to meet diverse data integration needs. This guide helps you choose the right format for your use case and links to detailed configuration documentation for each.
Supported Formats Overview
Monad currently supports three primary output format families:
| Format | Description | File Extensions | Best For |
|---|---|---|---|
| JSON | Flexible text-based format with three variants | .json, .jsonl | APIs, web services, streaming, human-readable data |
| Delimited | Tabular formats like CSV and TSV | .csv, .tsv | Spreadsheets, traditional analytics tools, data exchange |
| Parquet | Columnar binary format optimized for analytics | .parquet | Data warehouses, big data analytics, long-term storage |
Format Comparison
Performance Characteristics
| Format | Write Speed | Read Speed | Compression | File Size |
|---|---|---|---|---|
| JSON Array | Medium | Medium | Good with gzip | Medium |
| JSON Line | Fast | Fast | Good with gzip | Medium |
| JSON Nested | Medium | Medium | Good with gzip | Medium |
| Delimited (CSV) | Fast | Fast | Excellent with gzip | Small |
| Parquet | Slow | Very Fast | Built-in (excellent) | Very Small |
Feature Support
| Feature | JSON | Delimited | Parquet |
|---|---|---|---|
| Human Readable | ✅ Yes | ✅ Yes | ❌ No (binary) |
| Schema Evolution | ✅ Flexible | ⚠️ Limited | ✅ Yes |
| Nested Data | ✅ Full support | ❌ No | ✅ Full support |
| Data Types | ⚠️ Basic | ⚠️ Strings only | ✅ Rich type system |
| Query Performance | ⚠️ Scan entire file | ⚠️ Scan entire file | ✅ Columnar access |
| Streaming Support | ✅ Yes (line format) | ✅ Yes | ❌ No |
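The streaming difference between the JSON variants is worth seeing concretely. A JSON array is one document, so adding a record means rewriting the whole file; JSON Line output puts one record per line, so new records can simply be appended. A minimal illustration using only the Python standard library (not Monad's API):

```python
import json

records = [{"id": 1, "name": "alpha"}, {"id": 2, "name": "beta"}]

# JSON array: a single document; appending a record requires
# rewriting the enclosing brackets.
array_doc = json.dumps(records)

# JSON Line: each record is a self-contained line, so new records
# can be appended to the end of a file or stream as they arrive.
jsonl_doc = "\n".join(json.dumps(r) for r in records)

print(array_doc)
print(jsonl_doc)
```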
Choosing the Right Format
Use JSON When:
- Integrating with REST APIs or web services
- Human readability is important
- Your data has nested structures or varying schemas
- You need streaming capabilities (use line format)
- Working with document-oriented databases
Use Delimited (CSV/TSV) When:
- Importing data into spreadsheet applications
- Working with legacy systems or traditional BI tools
- Your data is purely tabular with consistent columns
- You need the smallest file sizes when compressed
- Simplicity is more important than features
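To show why delimited output suits purely tabular data, here is a generic sketch with Python's standard `csv` module (illustrative only, not Monad's writer). Note that every value is a string on output, and values containing the delimiter are quoted automatically:

```python
import csv
import io

rows = [
    {"id": "1", "name": "alpha", "value": "3.14"},
    {"id": "2", "name": "beta, with comma", "value": "2.72"},
]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["id", "name", "value"])
writer.writeheader()    # emits the header row: id,name,value
writer.writerows(rows)  # values containing "," are quoted automatically

print(buf.getvalue())
```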
Use Parquet When:
- Building a data warehouse or data lake
- Performing analytical queries on large datasets
- You need optimal query performance
- Long-term storage with efficient compression is required
- Working with Apache Spark, Athena, or similar big data tools
Quick Configuration Examples
JSON Array Format
```json
{
  "format_config": {
    "format": "json",
    "json": {
      "type": "array"
    }
  }
}
```
CSV Format
```json
{
  "format_config": {
    "format": "delimited",
    "delimited": {
      "delimiter": ",",
      "headers": ["id", "name", "value"]
    }
  }
}
```
Parquet Format
```json
{
  "format_config": {
    "format": "parquet",
    "parquet": {
      "schema": "{\"Tag\": \"name=data\", \"Fields\": [{\"Tag\": \"name=id, type=INT64, repetitiontype=REQUIRED\"}]}"
    }
  }
}
```
Format-Specific Documentation
For detailed configuration options and advanced usage:
- JSON Format - Array, nested, and line-delimited JSON configurations
- Delimited Format - CSV, TSV, and custom delimiter configurations
- Parquet Format - Schema definition and columnar storage optimization
Best Practices
- Consider Your Downstream Systems: Choose formats that your consuming applications can efficiently process.
- Balance Readability and Performance: JSON offers readability, Parquet offers performance, and CSV offers compatibility.
- Think About Schema Evolution: If your data structure changes frequently, JSON provides the most flexibility.
- Compression Matters: All formats support compression, but Parquet has it built-in and typically achieves the best ratios.
- Test with Real Data: Performance characteristics can vary based on your specific data patterns and volumes.
Common Patterns
ETL Pipelines
- Extract: Use JSON or CSV for initial data extraction
- Transform: Process in memory or with streaming tools
- Load: Use Parquet for final storage in data warehouses
Real-time Streaming
- Use JSON Line format for append-friendly streaming
- Enable minimal batching for near real-time delivery
- Consider compression trade-offs for latency
Analytics Workloads
- Use Parquet with Hive-compatible partitioning
- Define schemas that optimize for your query patterns
- Leverage columnar benefits for aggregation queries
Need Help?
If you're unsure which format to choose, consider these questions:
- What system will consume this data?
- How important is human readability?
- What's the expected data volume?
- Do you need to support complex nested structures?
- Will the schema change frequently?
The answers to these questions will guide you toward the optimal format for your use case.