Pipeline Threshold Alert

Description

This alert monitors pipeline metrics and triggers when a metric value, aggregated over a specified time window, is greater than or less than a configured threshold. It covers key operational metrics: ingress/egress data volume, ingress/egress record counts, and error counts.

The pipeline threshold alert evaluates metrics at regular intervals and generates alerts for each pipeline that violates the configured threshold condition.

Compatible with all Monad tiers

Prerequisites

  1. Active pipelines generating metrics in your Monad organization
  2. Understanding of the metric type you want to monitor (bytes, records, or errors)

Setup Instructions

  1. Select the Metric Type to specify what kind of metric to monitor (bytes, records, or errors)
  2. Configure the type-specific options: direction (ingress/egress) for bytes and records, and unit (KB/MB/GB) for bytes
  3. Set the Threshold value that will trigger the alert
  4. Choose the Operator to determine the comparison type (greater_than or less_than)
  5. Specify the Time Window for metric aggregation
  6. Select the pipelines to monitor (leave empty to monitor all organization pipelines)
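
The selection behavior in step 6 can be sketched as follows. This is a minimal illustration, not Monad's implementation: the function name `violating_pipelines` and the `metrics` mapping are hypothetical, and skipping pipelines that have no metric value is an assumption.

```python
def violating_pipelines(metrics, selected, threshold, operator):
    """Return the IDs of monitored pipelines whose value breaches the threshold.

    metrics:  mapping of pipeline ID -> aggregated metric value (hypothetical input)
    selected: list of pipeline IDs; an empty list means "all pipelines"
    """
    # An empty selection monitors every pipeline that reported a metric.
    monitored = selected if selected else list(metrics)
    violators = []
    for pipeline_id in monitored:
        value = metrics.get(pipeline_id)
        if value is None:
            # Assumption: a pipeline without data is skipped, not treated as zero.
            continue
        if operator == "greater_than":
            breached = value > threshold
        elif operator == "less_than":
            breached = value < threshold
        else:
            raise ValueError(f"unsupported operator: {operator!r}")
        if breached:
            violators.append(pipeline_id)
    return violators


metrics = {"pipeline-a": 12, "pipeline-b": 3}
all_over = violating_pipelines(metrics, [], 10, "greater_than")
only_b_under = violating_pipelines(metrics, ["pipeline-b"], 10, "less_than")
```

One alert is produced per entry in the returned list, matching the per-pipeline behavior described in the Description section.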

Configuration Options

Metric Type

The alert uses a metric type selector to determine which metric to monitor and what supporting fields are required.

Bytes

Monitors ingress or egress data volume. Threshold is specified in human-readable units.

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| direction | string | Yes | ingress or egress |
| threshold | integer | Yes | Threshold value in the specified unit. Must be >= 0. |
| unit | string | Yes | KB, MB, or GB |

Records

Monitors ingress or egress record count.

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| direction | string | Yes | ingress or egress |
| threshold | integer | Yes | Threshold record count. Must be >= 0. |

Errors

Monitors total error count.

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| threshold | integer | Yes | Threshold error count. Must be >= 0. |

Operator

  • greater_than: Alert when metric value > threshold
  • less_than: Alert when metric value < threshold
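
Both comparisons are strict, so a metric value exactly equal to the threshold does not trigger either operator. A minimal sketch of that rule, using a hypothetical helper named `violates`:

```python
import operator

# Map the documented operator names to strict comparisons.
COMPARATORS = {
    "greater_than": operator.gt,
    "less_than": operator.lt,
}


def violates(value, threshold, op):
    """Return True when value breaches threshold under the named operator."""
    try:
        compare = COMPARATORS[op]
    except KeyError:
        raise ValueError(f"unsupported operator: {op!r}") from None
    return compare(value, threshold)


at_threshold = violates(10, 10, "greater_than")   # equal values do not alert
above_threshold = violates(11, 10, "greater_than")
```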

Time Window

| Value | Description |
| --- | --- |
| 5m | Last 5 minutes |
| 1h | Last 1 hour |
| 6h | Last 6 hours |
| 24h | Last 24 hours |
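
To reason about which data points fall inside a window, the values above can be mapped to durations. The sketch below assumes a sliding window that ends at evaluation time, which is how "Last 5 minutes" reads; the helper `window_bounds` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# The four supported time window values and their durations.
WINDOWS = {
    "5m": timedelta(minutes=5),
    "1h": timedelta(hours=1),
    "6h": timedelta(hours=6),
    "24h": timedelta(hours=24),
}


def window_bounds(window, now):
    """Return (start, end) of the window, assuming it ends at `now`."""
    if window not in WINDOWS:
        raise ValueError(f"unsupported time window: {window!r}")
    return now - WINDOWS[window], now


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
start, end = window_bounds("5m", now)
```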

Configuration Examples

Alert when ingress bytes exceed 500 MB in the last 5 minutes:

{
  "type": "threshold-alert",
  "settings": {
    "metric_config": {
      "type": "bytes",
      "bytes": {
        "direction": "ingress",
        "threshold": 500,
        "unit": "MB"
      }
    },
    "operator": "greater_than",
    "time_window": "5m"
  }
}

Alert when egress bytes drop below 1 MB in the last hour:

{
  "type": "threshold-alert",
  "settings": {
    "metric_config": {
      "type": "bytes",
      "bytes": {
        "direction": "egress",
        "threshold": 1,
        "unit": "MB"
      }
    },
    "operator": "less_than",
    "time_window": "1h"
  }
}

Alert when error count exceeds 10 in the last 5 minutes:

{
  "type": "threshold-alert",
  "settings": {
    "metric_config": {
      "type": "errors",
      "errors": {
        "threshold": 10
      }
    },
    "operator": "greater_than",
    "time_window": "5m"
  }
}
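
The examples above cover the bytes and errors metric types. A records configuration can be generated programmatically along the same lines. This sketch assumes the `records` block follows the same nesting pattern as `bytes` and `errors`, with the fields listed in the Records table; the helper `records_alert` is hypothetical.

```python
import json


def records_alert(direction, threshold, operator, time_window):
    """Build a records threshold-alert config mirroring the documented examples."""
    return {
        "type": "threshold-alert",
        "settings": {
            "metric_config": {
                "type": "records",
                # Assumed shape: same nesting as the "bytes" and "errors" blocks.
                "records": {"direction": direction, "threshold": threshold},
            },
            "operator": operator,
            "time_window": time_window,
        },
    }


# Alert when egress records drop below 1000 in the last 6 hours.
config = records_alert("egress", 1000, "less_than", "6h")
print(json.dumps(config, indent=2))
```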

Alert JSON Format

When the threshold condition is met, the alert generates the following JSON structure:

{
  "rule_id": "550e8400-e29b-41d4-a716-446655440000",
  "name": "High Ingress Rate Alert",
  "organization_id": "org-123",
  "severity": "critical",
  "description": "Pipeline pipeline-abc-123 greater than threshold of 524288000 for metric 'ingress_bytes' (current value: 549755813.00)",
  "metadata": {
    "pipeline_id": "pipeline-abc-123",
    "value": 549755813.0,
    "threshold": 524288000,
    "operator": "greater_than",
    "metric_name": "ingress_bytes",
    "time_window": "5m"
  },
  "resource": {
    "resource_type": "pipeline",
    "resource_id": "pipeline-abc-123"
  }
}

Note: The threshold in alert metadata is always in base units (bytes for bytes metric, count for records/errors). For example, a configured threshold of 500 MB is stored and reported as 524288000 bytes.
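
The conversion in the note can be reproduced as follows. The 500 MB example confirms a factor of 1024 × 1024 for MB; the KB and GB factors in this sketch are assumed to follow the same binary convention, and the helper `threshold_in_bytes` is hypothetical.

```python
# Binary multipliers. Only the MB factor is confirmed by the 500 MB example;
# KB and GB are assumed to follow the same 1024-based convention.
UNIT_BYTES = {"KB": 1024, "MB": 1024**2, "GB": 1024**3}


def threshold_in_bytes(threshold, unit):
    """Convert a configured bytes threshold to base units (bytes)."""
    if threshold < 0:
        raise ValueError("threshold must be >= 0")
    if unit not in UNIT_BYTES:
        raise ValueError(f"unsupported unit: {unit!r}")
    return threshold * UNIT_BYTES[unit]


reported = threshold_in_bytes(500, "MB")
print(reported)  # 524288000
```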

Alert Metadata Fields

  • pipeline_id: The ID of the pipeline that triggered the alert
  • value: The current metric value that triggered the alert (in base units)
  • threshold: The configured threshold value (in base units)
  • operator: The operator used for comparison
  • metric_name: The underlying metric queried (e.g., ingress_bytes, egress_records, errors)
  • time_window: The time window used for metric aggregation
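
A consumer of these alerts may want to show byte values in a friendlier unit. The sketch below parses an alert payload and converts byte metrics back to MB for display. The helper `summarize_alert` is hypothetical, and detecting byte metrics by a `_bytes` suffix is an assumption based on the `ingress_bytes` example.

```python
import json

MB = 1024**2


def summarize_alert(payload):
    """Return a one-line summary of an alert, showing byte metrics in MB."""
    meta = json.loads(payload)["metadata"]
    value, threshold = meta["value"], meta["threshold"]
    if meta["metric_name"].endswith("_bytes"):
        # Metadata is in base units (bytes); convert for readability.
        value, threshold, unit = value / MB, threshold / MB, "MB"
    else:
        unit = "count"
    return (
        f"{meta['pipeline_id']}: {meta['metric_name']} = {value:.1f} {unit} "
        f"({meta['operator']} {threshold:.1f} {unit}, window {meta['time_window']})"
    )


payload = json.dumps({
    "metadata": {
        "pipeline_id": "pipeline-abc-123",
        "value": 549755813.0,
        "threshold": 524288000,
        "operator": "greater_than",
        "metric_name": "ingress_bytes",
        "time_window": "5m",
    }
})
summary = summarize_alert(payload)
print(summary)
```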

Use Cases

  • Capacity Monitoring: Alert when data ingress exceeds processing capacity
  • Silent Stall Detection: Detect drops in egress that indicate a pipeline has stopped delivering data
  • Error Spike Detection: Trigger alerts when error counts exceed acceptable levels
  • SLA Compliance: Ensure pipeline metrics stay within defined service level agreements
  • Cost Management: Monitor data volumes to prevent unexpected billing spikes

Limitations

  • Threshold values must be non-negative integers
  • Time window must be one of the supported values: 5m, 1h, 6h, 24h
  • Metrics availability depends on pipeline activity and data retention policies
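
The first two limitations can be checked before a configuration is submitted. This is a client-side sketch under the constraints listed above, not the validation Monad itself performs; the helper `validate_settings` is hypothetical.

```python
SUPPORTED_WINDOWS = {"5m", "1h", "6h", "24h"}


def validate_settings(threshold, time_window):
    """Return a list of problems; an empty list means the settings look valid."""
    problems = []
    # bool is a subclass of int in Python, so it is rejected explicitly.
    is_int = isinstance(threshold, int) and not isinstance(threshold, bool)
    if not is_int or threshold < 0:
        problems.append("threshold must be a non-negative integer")
    if time_window not in SUPPORTED_WINDOWS:
        problems.append(
            f"time_window must be one of {sorted(SUPPORTED_WINDOWS)}"
        )
    return problems


ok = validate_settings(10, "5m")
bad = validate_settings(-1, "2h")
```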