Pipeline Error Rate Alert

Description

This alert monitors the error rate of pipelines by calculating the ratio of errors to ingested records over a specified time window. It triggers when the error rate exceeds a configured percentage threshold, helping detect data quality issues and pipeline degradation early.

The error rate is calculated as: (errors / ingested_records) * 100

The alert evaluates pipelines at regular intervals and generates a separate alert for each pipeline whose error rate exceeds the threshold. A configurable minimum records threshold (min_records) prevents false positives on low-volume pipelines.
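The evaluation described above can be sketched in Python. This is a minimal illustration and not Monad's implementation: the function name `evaluate_error_rate` and its return convention are assumptions, and the order of checks (skip low-volume pipelines first, then compare strictly against the threshold) is inferred from the description.

```python
from typing import Optional


def evaluate_error_rate(
    error_count: int,
    ingested_count: int,
    threshold: float,
    min_records: int = 100,
) -> Optional[float]:
    """Return the error rate in percent if it should alert, else None.

    Hypothetical helper: the name and return convention are assumptions.
    """
    # Skip pipelines with no traffic or with fewer records than min_records.
    if ingested_count <= 0 or ingested_count < min_records:
        return None
    error_rate = (error_count / ingested_count) * 100
    # The alert fires only when the rate exceeds the threshold.
    return error_rate if error_rate > threshold else None
```

For example, 75 errors out of 1000 ingested records against a threshold of 5.0 yields a rate of 7.5 and would alert, while 5 errors out of 50 records is skipped under the default min_records of 100.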

Compatible with all Monad tiers

Prerequisites

Active pipelines generating metrics in your Monad organization
Pipelines processing data (ingesting records)
Pipelines generating error events (error tracking is automatic; no manual configuration is required)

Setup Instructions

Set the Error Rate Threshold as a percentage (e.g., 5.0 for 5%)
Specify the Time Window for metric aggregation (5m, 1h, 6h, or 24h)
Configure the Minimum Records threshold to prevent alerts on low-volume pipelines (defaults to 100)
Select the pipelines to monitor (leave empty to monitor all organization pipelines)

Configuration Options

Settings

Setting	Type	Required	Default	Description
threshold	float	Yes	-	Error rate percentage threshold (e.g., 5.0 for 5%). Alert triggers when error rate exceeds this value. Must be between 0 and 100.
time_window	string	Yes	-	Time window for metric aggregation. Must be one of: `5m`, `1h`, `6h`, `24h`.
min_records	integer	No	100	Minimum number of ingested records required to evaluate the error rate. Pipelines below this threshold are skipped to prevent false positives on low-volume data.

Time Window Format

The time window must be one of the following supported values:

5m - 5 minutes
1h - 1 hour
6h - 6 hours
24h - 24 hours
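The constraints on the three settings can be checked before a rule is saved. The sketch below is one way to do that client-side. The class name `ErrorRateSettings` and the `TIME_WINDOW_SECONDS` mapping are hypothetical, and two details are assumptions not stated in this document: that the 0 to 100 bounds are inclusive, and that min_records may not be negative.

```python
from dataclasses import dataclass

# Supported windows from the list above, mapped to seconds for convenience.
TIME_WINDOW_SECONDS = {"5m": 300, "1h": 3600, "6h": 21600, "24h": 86400}


@dataclass(frozen=True)
class ErrorRateSettings:
    """Hypothetical container for the alert settings."""

    threshold: float
    time_window: str
    min_records: int = 100  # documented default

    def __post_init__(self) -> None:
        # Assumption: the bounds 0 and 100 are both allowed.
        if not 0 <= self.threshold <= 100:
            raise ValueError("threshold must be between 0 and 100")
        if self.time_window not in TIME_WINDOW_SECONDS:
            allowed = ", ".join(TIME_WINDOW_SECONDS)
            raise ValueError(f"time_window must be one of: {allowed}")
        # Assumption: a negative minimum makes no sense and is rejected.
        if self.min_records < 0:
            raise ValueError("min_records must not be negative")
```

Constructing `ErrorRateSettings(threshold=5.0, time_window="1h")` succeeds and picks up the default min_records of 100, while a value such as `time_window="15m"` raises `ValueError`.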

Alert JSON Format

When the error rate exceeds the threshold, the alert generates the following JSON structure:

{
  "rule_id": "550e8400-e29b-41d4-a716-446655440000",
  "name": "High Error Rate Alert",
  "organization_id": "org-123",
  "severity": "critical",
  "description": "Pipeline pipeline-abc-123 error rate 7.50% exceeds threshold of 5.00%",
  "metadata": {
    "pipeline_id": "pipeline-abc-123",
    "error_rate": 7.5,
    "error_count": 75,
    "ingested_count": 1000,
    "threshold": 5.0,
    "time_window": "5m",
    "min_records": 100
  },
  "resource": {
    "resource_type": "pipeline",
    "resource_id": "pipeline-abc-123"
  }
}

Alert Metadata Fields

pipeline_id: The ID of the pipeline that triggered the alert
error_rate: The calculated error rate as a percentage
error_count: The total number of errors in the time window
ingested_count: The total number of ingested records in the time window
threshold: The configured error rate threshold percentage
time_window: The time window used for metric aggregation
min_records: The minimum records threshold configured
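A consumer of these alerts might read the metadata fields as follows. This is a sketch under assumptions: `summarize_alert` is a hypothetical helper, the payload is a trimmed copy of the sample above, and the 0.01 tolerance exists only because this document does not say whether error_rate is rounded before it is emitted.

```python
import json
import math

# Trimmed copy of the sample alert shown above.
ALERT_JSON = """
{
  "severity": "critical",
  "metadata": {
    "pipeline_id": "pipeline-abc-123",
    "error_rate": 7.5,
    "error_count": 75,
    "ingested_count": 1000,
    "threshold": 5.0,
    "time_window": "5m",
    "min_records": 100
  }
}
"""


def summarize_alert(raw: str) -> str:
    """Build a one-line summary from an alert payload (hypothetical helper)."""
    meta = json.loads(raw)["metadata"]
    # Recompute the rate from the raw counts as a sanity check.
    recomputed = meta["error_count"] / meta["ingested_count"] * 100
    if not math.isclose(recomputed, meta["error_rate"], abs_tol=0.01):
        raise ValueError("error_rate does not match error_count / ingested_count")
    return (
        f"{meta['pipeline_id']}: {meta['error_rate']:.2f}% "
        f"({meta['error_count']}/{meta['ingested_count']} in {meta['time_window']}), "
        f"threshold {meta['threshold']:.2f}%"
    )
```

Calling `summarize_alert(ALERT_JSON)` returns `pipeline-abc-123: 7.50% (75/1000 in 5m), threshold 5.00%`.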

Use Cases

Data Quality Monitoring: Detect degradation in data quality when error rates spike
Pipeline Health: Monitor pipeline reliability and catch processing issues early
SLA Compliance: Ensure error rates stay within acceptable service level agreements
Anomaly Detection: Identify unusual patterns that might indicate upstream data issues or configuration problems
Production Monitoring: Alert on-call teams when pipelines are experiencing elevated error rates
Low-Volume Protection: Use min_records to avoid noisy alerts on pipelines with sporadic or minimal traffic

Limitations

Threshold must be between 0 and 100 (percentage)
Time window must be one of the supported values: 5m, 1h, 6h, 24h
Requires both error and ingestion metrics to be available
Pipelines with ingested record count below min_records are skipped (no alert generated)
Error rate calculation requires at least one ingested record (ingested_count > 0), because the ratio is undefined when nothing was ingested

Example Configurations

High Sensitivity for Critical Pipelines

{
  "threshold": 1.0,
  "time_window": "5m",
  "min_records": 50
}

Alerts when error rate exceeds 1% over 5 minutes, with a low minimum records threshold for quick detection.

Standard Production Monitoring

{
  "threshold": 5.0,
  "time_window": "1h",
  "min_records": 100
}

Alerts when error rate exceeds 5% over 1 hour, filtering out low-volume pipelines.

Low-Volume Pipeline Monitoring

{
  "threshold": 10.0,
  "time_window": "1h",
  "min_records": 10
}

Alerts when error rate exceeds 10% over 1 hour, suitable for pipelines with lower traffic volumes.
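To see how the three example configurations differ, it can help to run the same counts through each of them. The sketch below does that. The preset names and the `presets_that_fire` helper are hypothetical, and it simplifies by treating the given counts as if they were observed within each preset's own time window.

```python
from typing import List

# The three example configurations above, keyed by illustrative names.
PRESETS = {
    "high_sensitivity": {"threshold": 1.0, "time_window": "5m", "min_records": 50},
    "standard": {"threshold": 5.0, "time_window": "1h", "min_records": 100},
    "low_volume": {"threshold": 10.0, "time_window": "1h", "min_records": 10},
}


def presets_that_fire(error_count: int, ingested_count: int) -> List[str]:
    """Names of the presets that would alert for the given counts."""
    fired = []
    for name, cfg in PRESETS.items():
        # Pipelines with no records or below min_records are skipped.
        if ingested_count == 0 or ingested_count < cfg["min_records"]:
            continue
        rate = error_count / ingested_count * 100
        if rate > cfg["threshold"]:
            fired.append(name)
    return fired
```

With 6 errors in 40 records (15%), only the low-volume preset fires, since the other two skip pipelines with fewer than 50 and 100 records. With 30 errors in 1000 records (3%), only the high-sensitivity preset fires.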
