Inputs
Ingests security data from upstream sources like cloud audit logs, vulnerability scanners, identity providers, and EDR platforms.
Key Features
- Seamless integration with multiple data sources
- Automated authentication and data retrieval
- Initial formatting for smooth pipeline integration
Each input connector is designed to interface with a specific data source. These connectors handle authentication, data retrieval, and initial formatting to ensure smooth integration with our data processing pipeline.
What You'll Find
In the following pages, you'll find:
- Configuration instructions for each input connector
- Supported data types and formats
- Authentication methods and requirements
Concept: Backfilling data from Inputs
The Backfill Start Time setting allows you to control the historical starting point for data collection. This determines how far back in time the system will fetch data during the initial sync.
How It Works
Initial Sync Behavior
When you first configure a data source:
- With Backfill Start Time: The system fetches all data from the specified date forward to the present
- Without Backfill Start Time: The system performs a full sync, collecting all available historical data up to now
Subsequent Syncs
After the initial sync completes:
- All future syncs are incremental, collecting only new data since the last successful sync
- The Backfill Start Time setting is no longer used
- The system automatically tracks its position and continues from where it left off
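Incremental syncing can be modeled as a cursor that only ever moves forward. The sketch below is a toy model under assumed names (`fetch_since`, `state`, the record fields); the real connector internals may differ.

```python
state = {"cursor": "2024-01-01T00:00:00Z"}  # position of the last successful sync

def fetch_since(cursor):
    # Stand-in for the source API: return only records newer than the cursor.
    data = [
        {"id": 1, "timestamp": "2024-01-01T00:00:00Z"},
        {"id": 2, "timestamp": "2024-01-02T00:00:00Z"},
        {"id": 3, "timestamp": "2024-01-03T00:00:00Z"},
    ]
    return [r for r in data if r["timestamp"] > cursor]

def incremental_sync():
    new_records = fetch_since(state["cursor"])
    if new_records:
        # Advance the cursor so the next sync starts where this one ended.
        state["cursor"] = max(r["timestamp"] for r in new_records)
    return new_records

print(len(incremental_sync()))  # 2 — records newer than the stored cursor
print(len(incremental_sync()))  # 0 — nothing new since the last sync
```

Because all timestamps share the same fixed-width UTC format, lexicographic comparison matches chronological order here.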
Configuration
Note: Not every input connector supports Backfill Start Time. Check the specific input's documentation to confirm availability.
Format
The Backfill Start Time must be provided in ISO 8601 format:
YYYY-MM-DDTHH:MM:SSZ
Examples:
- 2024-01-01T00:00:00Z - start of January 1, 2024 (UTC)
- 2024-06-15T14:30:00Z - June 15, 2024 at 2:30 PM (UTC)
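A strict parse of this format can be written with `strptime`, which rejects any string that deviates from the documented pattern. The function name is a hypothetical stand-in for whatever validation the system performs.

```python
from datetime import datetime, timezone

def validate_backfill_start_time(value: str) -> datetime:
    """Parse YYYY-MM-DDTHH:MM:SSZ, raising ValueError for anything else."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)  # the trailing Z means UTC

print(validate_backfill_start_time("2024-06-15T14:30:00Z"))
# 2024-06-15 14:30:00+00:00
```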
When to Use
Use Backfill Start Time when:
- You only need recent data and want to avoid a lengthy initial sync
- You're testing the integration and want to limit data volume
- Compliance requirements only mandate retention from a specific date
Skip Backfill Start Time when:
- You need complete historical data
- The data source doesn't have extensive history
- You're unsure of the appropriate starting point
Examples
Scenario 1: Recent Data Only
You only need the last 90 days of audit logs:
- Set Backfill Start Time to 90 days ago
- Initial sync fetches 90 days of history
- Future syncs continue incrementally from the present
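Computing "90 days ago" in the required format is straightforward; this is just a convenience snippet, not part of the product itself.

```python
from datetime import datetime, timedelta, timezone

# Compute a Backfill Start Time 90 days in the past, formatted as
# YYYY-MM-DDTHH:MM:SSZ (UTC) as the setting requires.
ninety_days_ago = datetime.now(timezone.utc) - timedelta(days=90)
backfill_start_time = ninety_days_ago.strftime("%Y-%m-%dT%H:%M:%SZ")
print(backfill_start_time)
```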
Scenario 2: Compliance Requirement
Your organization requires 1 year of audit data:
- Set Backfill Start Time to exactly 1 year ago
- Initial sync fetches 1 year of history
- Future syncs continue incrementally, extending coverage forward
Scenario 3: Full Historical Sync
You need all available data:
- Leave Backfill Start Time empty
- Initial sync fetches all available historical data
- This may take longer depending on data volume
Important Notes
- One-Time Setting: Backfill Start Time only affects the first sync; changing it later has no effect
- Time Zone: Always use UTC timezone in the ISO 8601 format
- Validation: The system validates the format before starting the sync
- No Data Loss: Incremental syncs ensure continuous coverage after the initial backfill completes
Concept: Data Duplication and Loss
Our data ingestion system is designed to guarantee no data loss while minimizing duplication. State is saved after processing each page of results, enabling safe recovery from any failure scenario. See the specific connector's documentation for any caveats that apply to that data source.
Guarantees
- Zero Data Loss: Records are never skipped, even during system failures or interruptions
- Minimal Duplication: In worst-case scenarios, only the current page being processed may be duplicated
- Automatic Recovery: The system automatically resumes from the last successful checkpoint
How It Works
Checkpoint-Based Processing
Data is processed in pages with automatic checkpointing:
- A page of records is fetched from the source API
- Each record in the page is processed and sent downstream
- After the entire page completes successfully, a checkpoint is saved
- Processing continues with the next page
If anything goes wrong, the system restarts from the last saved checkpoint.
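The four steps above can be sketched as a loop that advances a checkpoint only after a full page succeeds. The callables (`fetch_page`, `process`, `save_checkpoint`) are hypothetical stand-ins for connector internals, not a real API.

```python
def run_sync(fetch_page, process, save_checkpoint, start_page=0):
    """Process pages in order, checkpointing after each completed page.

    On restart, pass the last saved checkpoint as start_page: the
    incomplete page is reprocessed, while completed pages are never
    refetched.
    """
    page_number = start_page
    while True:
        records = fetch_page(page_number)
        if not records:
            break  # no more data from the source
        for record in records:
            process(record)  # send each record downstream
        save_checkpoint(page_number + 1)  # page done: advance the checkpoint
        page_number += 1
```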
Failure Scenarios
Normal Operation
When everything works correctly:
- Each page is processed once
- State is saved after each page
- No duplication occurs
Recovery from Failure
When a failure occurs during processing:
- The system resumes from the last saved checkpoint
- Records from the incomplete page are reprocessed
- Previously completed pages are never reprocessed
Result: At most one page of data may be duplicated
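The at-most-one-page property can be demonstrated with a toy model of a crash and restart. Everything here (the page contents, the crash hook) is invented for illustration; only the checkpointing pattern reflects the documented behavior.

```python
delivered = []
checkpoint = 0
pages = {0: ["a", "b"], 1: ["c", "d"]}

def sync(crash_on=None):
    """Process pages from the last checkpoint; optionally crash mid-page
    to simulate a failure."""
    global checkpoint
    for n in range(checkpoint, len(pages)):
        for i, record in enumerate(pages[n]):
            if crash_on == (n, i):
                raise RuntimeError("simulated failure")
            delivered.append(record)
        checkpoint = n + 1  # checkpoint only after the full page succeeds

try:
    sync(crash_on=(1, 1))  # crash after delivering "c", before page 1 completes
except RuntimeError:
    pass
sync()  # restart resumes from the last checkpoint (page 1)
print(delivered)  # ['a', 'b', 'c', 'c', 'd'] — only the incomplete page replayed
```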
Why Data Is Never Lost
Because state is saved after each page completes:
- Completed pages are always checkpointed before moving forward
- Failed pages are automatically retried on restart
- The system never advances past data it hasn't successfully processed
Note: This behavior may differ for specific inputs; review the input's documentation to confirm its exact guarantees.
Design Principles
The system is built on these core principles:
- Conservative Processing: When in doubt, the system prefers to duplicate rather than risk losing data
- Page-Level Granularity: Checkpoints happen per-page to balance performance with recovery precision
- Idempotent Design: Downstream systems should be prepared to handle duplicate records gracefully
- Automatic Recovery: No manual intervention is required after failures
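The idempotent-design principle means downstream consumers should tolerate a replayed page. A minimal sketch, assuming records carry a stable `id` field (an assumption, not a documented schema): duplicates are detected by ID and dropped rather than double-counted.

```python
seen_ids = set()

def ingest(record):
    """Process a record exactly once, however many times it is delivered."""
    if record["id"] in seen_ids:
        return False  # duplicate from a replayed page: ignore it
    seen_ids.add(record["id"])
    # ... real processing would happen here ...
    return True

print(ingest({"id": "evt-1"}))  # True  — first delivery is processed
print(ingest({"id": "evt-1"}))  # False — the replay is ignored
```

In production, a bounded store (e.g. keyed by ID with a TTL) would replace the in-memory set.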
Expected Behavior
Under normal conditions, you should see:
- Zero duplicates during steady-state operation
- Minimal duplicates (one page maximum) after system restarts or failures
- Complete data coverage with no gaps in the timeline
This makes the system suitable for compliance, audit logging, and other use cases where data completeness is critical.