Inputs

Input connectors ingest security data from upstream sources such as cloud audit logs, vulnerability scanners, identity providers, and EDR platforms.

Key Features

  • Seamless integration with multiple data sources
  • Automated authentication and data retrieval
  • Initial formatting for smooth pipeline integration

Each input connector is designed to interface with a specific data source. These connectors handle authentication, data retrieval, and initial formatting to ensure smooth integration with our data processing pipeline.

What You'll Find

In the following pages, you'll find:

  • Configuration instructions for each input connector
  • Supported data types and formats
  • Authentication methods and requirements

Concept: Backfilling data from Inputs

The Backfill Start Time setting allows you to control the historical starting point for data collection. This determines how far back in time the system will fetch data during the initial sync.

How It Works

Initial Sync Behavior

When you first configure a data source:

  • With Backfill Start Time: The system fetches all data from the specified date forward to the present
  • Without Backfill Start Time: The system performs a full sync, collecting all available historical data up to now
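The initial-sync decision above can be sketched as a small helper. This is an illustrative sketch only; the function name `resolve_sync_start` and its return convention (where `None` means "full historical sync") are assumptions, not part of the product's API.

```python
from datetime import datetime, timezone
from typing import Optional

def resolve_sync_start(backfill_start: Optional[str]) -> Optional[datetime]:
    """Return the timestamp the initial sync should start from.

    None means no Backfill Start Time was set, i.e. a full historical sync.
    """
    if backfill_start is None:
        return None  # full sync: collect all available history
    # Backfill Start Time is expected in ISO 8601 UTC, e.g. 2024-01-01T00:00:00Z
    return datetime.strptime(backfill_start, "%Y-%m-%dT%H:%M:%SZ").replace(
        tzinfo=timezone.utc
    )
```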

Subsequent Syncs

After the initial sync completes:

  • All future syncs are incremental, collecting only new data since the last successful sync
  • The Backfill Start Time setting is no longer used
  • The system automatically tracks its position and continues from where it left off
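One way to picture the first-sync-versus-incremental behavior is a cursor object that remembers where the last successful sync ended. The class below is a hypothetical sketch; the names `SyncCursor`, `next_window_start`, and `commit` are illustrative and do not reflect the actual implementation.

```python
from datetime import datetime, timezone
from typing import Optional

class SyncCursor:
    """Illustrative tracker for the last successful sync position."""

    def __init__(self) -> None:
        self.last_synced_at: Optional[datetime] = None

    def next_window_start(self, backfill_start: Optional[datetime]) -> Optional[datetime]:
        if self.last_synced_at is None:
            # First sync: honor Backfill Start Time (None means full sync).
            return backfill_start
        # Every later sync is incremental; Backfill Start Time is ignored.
        return self.last_synced_at

    def commit(self, synced_up_to: datetime) -> None:
        # Record where this sync finished so the next one can resume there.
        self.last_synced_at = synced_up_to
```

Once `commit` has run at least once, `next_window_start` never consults the backfill setting again, mirroring the one-time nature of Backfill Start Time.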

Configuration

Note: Not every input connector supports a Backfill Start Time. Check the documentation for the specific input to confirm.

Format

The Backfill Start Time must be provided in ISO 8601 format:

YYYY-MM-DDTHH:MM:SSZ

Examples:

  • 2024-01-01T00:00:00Z - Start of January 1st, 2024 (UTC)
  • 2024-06-15T14:30:00Z - June 15th, 2024 at 2:30 PM (UTC)
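You can check a value against this format before saving it. The snippet below is a minimal sketch using Python's standard library; the helper name `parse_backfill_start` is illustrative, not part of the product.

```python
from datetime import datetime, timezone

def parse_backfill_start(value: str) -> datetime:
    """Parse a Backfill Start Time string, rejecting anything that is not
    exactly YYYY-MM-DDTHH:MM:SSZ (ISO 8601, UTC)."""
    dt = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    return dt.replace(tzinfo=timezone.utc)
```

A malformed string (wrong separator, missing `Z`, local-time offset) raises `ValueError`, which is how you would catch a bad setting before the sync starts.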

When to Use

Use Backfill Start Time when:

  • You only need recent data and want to avoid a lengthy initial sync
  • You're testing the integration and want to limit data volume
  • Compliance requirements only mandate retention from a specific date

Skip Backfill Start Time when:

  • You need complete historical data
  • The data source doesn't have extensive history
  • You're unsure of the appropriate starting point

Examples

Scenario 1: Recent Data Only

You only need the last 90 days of audit logs:

  • Set Backfill Start Time to 90 days ago
  • Initial sync fetches 90 days of history
  • Future syncs continue incrementally from the present
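For a relative target like "90 days ago," you can compute the setting rather than hand-writing a date. This is a convenience sketch, assuming you generate the value yourself before entering it in the connector configuration; `backfill_start_days_ago` is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone

def backfill_start_days_ago(days: int) -> str:
    """Format a Backfill Start Time N days in the past, in ISO 8601 UTC."""
    start = datetime.now(timezone.utc) - timedelta(days=days)
    return start.strftime("%Y-%m-%DT%H:%M:%SZ".replace("%D", "%d"))
```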

Scenario 2: Compliance Requirement

Your organization requires 1 year of audit data:

  • Set Backfill Start Time to exactly 1 year ago
  • Initial sync fetches 1 year of history
  • Future syncs maintain rolling coverage going forward

Scenario 3: Full Historical Sync

You need all available data:

  • Leave Backfill Start Time empty
  • Initial sync fetches all available historical data
  • This may take longer depending on data volume

Important Notes

  • One-Time Setting: Backfill Start Time only affects the first sync; changing it later has no effect
  • Time Zone: Always use UTC timezone in the ISO 8601 format
  • Validation: The system validates the format before starting the sync
  • No Data Loss: Incremental syncs ensure continuous coverage after the initial backfill completes

Concept: Data Duplication and Loss

Our data ingestion system is designed to guarantee no data loss while minimizing duplication. State is saved after each page of results is processed, enabling safe recovery from any failure scenario. See each connector's documentation for caveats specific to that data source.

Guarantees

  • Zero Data Loss: Records are never skipped, even during system failures or interruptions
  • Minimal Duplication: In worst-case scenarios, only the current page being processed may be duplicated
  • Automatic Recovery: The system automatically resumes from the last successful checkpoint

How It Works

Checkpoint-Based Processing

Data is processed in pages with automatic checkpointing:

  1. A page of records is fetched from the source API
  2. Each record in the page is processed and sent downstream
  3. After the entire page completes successfully, a checkpoint is saved
  4. Processing continues with the next page

If anything goes wrong, the system restarts from the last saved checkpoint.

Failure Scenarios

Normal Operation

When everything works correctly:

  • Each page is processed once
  • State is saved after each page
  • No duplication occurs

Recovery from Failure

When a failure occurs during processing:

  • The system resumes from the last saved checkpoint
  • Records from the incomplete page are reprocessed
  • Previously completed pages are never reprocessed

Result: At most one page of data may be duplicated.

Why Data Is Never Lost

Because state is saved after each page completes:

  • Completed pages are always checkpointed before moving forward
  • Failed pages are automatically retried on restart
  • The system never advances past data it hasn't successfully processed

Note: This behavior may differ for specific inputs; review the input's own documentation to confirm its exact behavior.

Design Principles

The system is built on these core principles:

  1. Conservative Processing: When in doubt, the system prefers to duplicate rather than risk losing data
  2. Page-Level Granularity: Checkpoints happen per-page to balance performance with recovery precision
  3. Idempotent Design: Downstream systems should be prepared to handle duplicate records gracefully
  4. Automatic Recovery: No manual intervention is required after failures
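Since downstream systems should handle duplicates gracefully (principle 3), a simple consumer-side pattern is to deduplicate by a stable record identifier. This is a generic sketch, assuming each record carries a unique `id` field; your records may use a different key.

```python
def deduplicate(records, seen_ids):
    """Drop records whose IDs have already been processed (idempotent consumer).

    seen_ids is a mutable set that persists across batches.
    """
    fresh = []
    for record in records:
        if record["id"] not in seen_ids:
            seen_ids.add(record["id"])
            fresh.append(record)
    return fresh
```

In practice `seen_ids` would live in durable storage (or be replaced by an upsert keyed on the record ID), but the effect is the same: a replayed page after a restart becomes a no-op.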

Expected Behavior

Under normal conditions, you should see:

  • Zero duplicates during steady-state operation
  • Minimal duplicates (one page maximum) after system restarts or failures
  • Complete data coverage with no gaps in the timeline

This makes the system suitable for compliance, audit logging, and other use cases where data completeness is critical.