Pipeline Status Alert

Compatible with all Monad tiers

Description

This alert monitors pipeline status and triggers when a pipeline has remained in a specified status (Erroring or Throttled) for a sustained minimum duration. It helps detect pipelines with prolonged operational issues that require attention.

The alert evaluates pipelines at regular intervals and generates an alert for each pipeline in which the specified status has been sustained for the entire configured time window.

Prerequisites

Active pipelines in your Monad organization
Understanding of the pipeline status values you want to monitor (Erroring or Throttled)

Setup Instructions

Choose the Pipeline Status to monitor (Erroring or Throttled)
Set the Time Window for the minimum sustained duration (5m, 1h, 6h, or 24h)
Select the pipelines to monitor (leave empty to monitor all organization pipelines)

Configuration Options

Settings

Setting	Type	Required	Default	Description
status	string	Yes	-	Pipeline status to monitor: `Erroring` or `Throttled`. The alert triggers when the pipeline sustains this status for the entire time window.
time_window	string	Yes	5m	Minimum duration the status must be sustained. Must be one of: `5m`, `1h`, `6h`, `24h`. The alert only triggers if the status remains constant throughout this period.

Status Options

Erroring: Pipeline is experiencing errors.
Throttled: Pipeline has been throttled due to backpressure, indicating stream capacity issues.

Time Window Format

The time window must be one of the following supported values:

5m - 5 minutes
1h - 1 hour
6h - 6 hours
24h - 24 hours
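
The allowed values above lend themselves to a small client-side check before a configuration is submitted. The following is a minimal sketch in Python, not part of any Monad SDK: the names `validate_settings`, `ALLOWED_STATUSES`, and `WINDOW_DURATIONS` are hypothetical helpers, and the fallback to `5m` when `time_window` is missing is an assumption based on the default listed in the Settings table.

```python
from datetime import timedelta

# Supported values copied from the Settings and Time Window Format sections.
# These names are illustrative helpers, not part of a Monad API.
ALLOWED_STATUSES = {"Erroring", "Throttled"}
WINDOW_DURATIONS = {
    "5m": timedelta(minutes=5),
    "1h": timedelta(hours=1),
    "6h": timedelta(hours=6),
    "24h": timedelta(hours=24),
}


def validate_settings(settings: dict) -> timedelta:
    """Check a settings dict and return its time window as a timedelta.

    Raises ValueError if a value is not one of the documented options.
    """
    status = settings.get("status")
    if status not in ALLOWED_STATUSES:
        raise ValueError(
            f"status must be one of {sorted(ALLOWED_STATUSES)}, got {status!r}"
        )

    # Assumption: a missing time_window falls back to the documented default.
    window = settings.get("time_window", "5m")
    if window not in WINDOW_DURATIONS:
        raise ValueError(
            f"time_window must be one of {list(WINDOW_DURATIONS)}, got {window!r}"
        )

    return WINDOW_DURATIONS[window]
```

Calling `validate_settings({"status": "Throttled", "time_window": "1h"})` returns a one-hour `timedelta`, while an unsupported value such as `"30m"` raises `ValueError`.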

Alert JSON Format

When a pipeline sustains the specified status for the configured duration, the alert generates the following JSON structure:

{
  "rule_id": "550e8400-e29b-41d4-a716-446655440000",
  "name": "Pipeline Status Alert",
  "organization_id": "org-123",
  "severity": "critical",
  "description": "Pipeline pipeline-abc-123 has been in Erroring status for at least 5m",
  "metadata": {
    "pipeline_id": "pipeline-abc-123",
    "status": "Erroring",
    "time_window": "5m"
  },
  "resource": {
    "resource_type": "pipeline",
    "resource_id": "pipeline-abc-123"
  }
}

Alert Metadata Fields

pipeline_id: The ID of the pipeline that triggered the alert
status: The pipeline status that was sustained (Erroring or Throttled)
time_window: The time window used to determine sustained status
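
A consumer of this alert might read the fields above into a typed structure before routing or logging them. The sketch below assumes the payload arrives as a JSON string shaped like the example in Alert JSON Format; the `PipelineStatusAlert` class and `parse_alert` function are hypothetical names introduced for illustration, and how alerts are actually delivered is not covered here.

```python
import json
from dataclasses import dataclass


@dataclass
class PipelineStatusAlert:
    """Illustrative container for the fields a handler is likely to need."""
    pipeline_id: str
    status: str
    time_window: str
    severity: str
    description: str


def parse_alert(raw: str) -> PipelineStatusAlert:
    """Parse an alert payload shaped like the documented JSON example."""
    payload = json.loads(raw)
    metadata = payload["metadata"]
    return PipelineStatusAlert(
        pipeline_id=metadata["pipeline_id"],
        status=metadata["status"],
        time_window=metadata["time_window"],
        severity=payload["severity"],
        description=payload["description"],
    )


# Trimmed copy of the documented example payload.
EXAMPLE_ALERT = """
{
  "severity": "critical",
  "description": "Pipeline pipeline-abc-123 has been in Erroring status for at least 5m",
  "metadata": {
    "pipeline_id": "pipeline-abc-123",
    "status": "Erroring",
    "time_window": "5m"
  }
}
"""

alert = parse_alert(EXAMPLE_ALERT)
```

Reading from `metadata` rather than parsing the `description` string keeps the handler independent of the wording of the human-readable message.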

Use Cases

Error Recovery Monitoring: Alert when pipelines are erroring for extended periods, indicating a problem that needs manual intervention
Throttle Detection: Detect pipelines being throttled for sustained periods that may indicate capacity or resource issues
Operational Awareness: Get notified when critical pipelines are experiencing status issues continuously
Incident Response: Enable faster response to operational problems by alerting when status conditions persist
SLA Compliance: Ensure pipelines stay within acceptable operational status requirements

Limitations

Status must be one of the valid options: Erroring or Throttled
Time window must be one of the supported values: 5m, 1h, 6h, 24h
The alert only fires when the status has been sustained for the entire time window — brief flickers in and out of the target status will not trigger the alert
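
To make the "sustained for the entire window" rule concrete, here is a rough sketch of one way such a check could be expressed. It is not Monad's implementation: the `sustained` function, the representation of history as `(timestamp, status)` transitions, and the `Healthy` status name used in the example are all assumptions made for illustration.

```python
from datetime import datetime, timedelta


def sustained(transitions, target, window, now):
    """Return True if `target` was the status for all of [now - window, now].

    `transitions` is a time-sorted list of (timestamp, status) pairs, each
    recording the moment the pipeline entered that status (assumed model).
    """
    window_start = now - window
    status_at_start = None
    for ts, status in transitions:
        if ts <= window_start:
            # Last transition at or before the window start sets the baseline.
            status_at_start = status
        elif ts <= now and status != target:
            # Leaving the target status inside the window breaks the streak.
            return False
    # The pipeline must already have been in the target status when the window began.
    return status_at_start == target


now = datetime(2024, 1, 1, 12, 0)
five_minutes = timedelta(minutes=5)

# In Erroring for the last ten minutes without interruption.
steady = [(now - timedelta(minutes=10), "Erroring")]

# Briefly left Erroring three minutes ago ("Healthy" is a hypothetical status name).
flicker = [
    (now - timedelta(minutes=10), "Erroring"),
    (now - timedelta(minutes=3), "Healthy"),
    (now - timedelta(minutes=2), "Erroring"),
]

steady_fires = sustained(steady, "Erroring", five_minutes, now)
flicker_fires = sustained(flicker, "Erroring", five_minutes, now)
```

Under this model the steady history would satisfy a `5m` window, while the flickering one would not, because the pipeline left the target status partway through the window.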

Example Configurations

Alert on Sustained Errors (5 minutes)

{
  "status": "Erroring",
  "time_window": "5m"
}

Alerts when a pipeline has been in Erroring status continuously for 5 minutes.

Alert on Extended Throttling (1 hour)

{
  "status": "Throttled",
  "time_window": "1h"
}

Alerts when a pipeline has been throttled for at least 1 hour, indicating a sustained capacity issue.

Alert on Long-Running Errors (1 hour)

{
  "status": "Erroring",
  "time_window": "1h"
}

Alerts only on persistent errors lasting a full hour, reducing noise from transient issues.
