Pipeline Threshold Alert
Description
This alert monitors pipeline metrics and triggers when a metric value exceeds or falls below a configured threshold over a specified time window. It provides real-time monitoring of key operational metrics such as data ingress/egress rates, record counts, and error counts.
The pipeline threshold alert evaluates metrics at regular intervals and generates alerts for each pipeline that violates the configured threshold condition.
Compatible with all Monad tiers
Prerequisites
- Active pipelines generating metrics in your Monad organization
- Understanding of the metric type you want to monitor (bytes, records, or errors)
Setup Instructions
- Select the Metric Type to specify what kind of metric to monitor (bytes, records, or errors)
- Configure the type-specific options: direction (ingress/egress) for bytes and records, and unit (KB/MB/GB) for bytes
- Set the Threshold value that will trigger the alert
- Choose the Operator to determine the comparison type (greater_than or less_than)
- Specify the Time Window for metric aggregation
- Select the pipelines to monitor (leave empty to monitor all organization pipelines)
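The steps above can be sketched as a small helper that assembles the configuration object. This is a minimal illustration, not an official client: `build_threshold_alert` is a hypothetical name, and the JSON shape is taken from the configuration examples on this page. Pipeline selection is left out because the field name for it is not shown here.

```python
import json


def build_threshold_alert(metric_type, threshold, operator, time_window,
                          direction=None, unit=None):
    """Assemble a threshold-alert config dict from the setup choices.

    Hypothetical helper; the layout mirrors the examples on this page.
    """
    metric_options = {"threshold": threshold}
    if metric_type in ("bytes", "records"):
        metric_options["direction"] = direction  # ingress or egress
    if metric_type == "bytes":
        metric_options["unit"] = unit  # KB, MB, or GB
    return {
        "type": "threshold-alert",
        "settings": {
            # The options block is keyed by the metric type itself.
            "metric_config": {"type": metric_type, metric_type: metric_options},
            "operator": operator,
            "time_window": time_window,
        },
    }


config = build_threshold_alert("bytes", 500, "greater_than", "5m",
                               direction="ingress", unit="MB")
print(json.dumps(config, indent=2))
```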
Configuration Options
Metric Type
The alert uses a metric type selector to determine which metric to monitor and what supporting fields are required.
Bytes
Monitors ingress or egress data volume. Threshold is specified in human-readable units.
| Field | Type | Required | Description |
|---|---|---|---|
| direction | string | Yes | ingress or egress |
| threshold | integer | Yes | Threshold value in the specified unit. Must be >= 0. |
| unit | string | Yes | KB, MB, or GB |
Records
Monitors ingress or egress record count.
| Field | Type | Required | Description |
|---|---|---|---|
| direction | string | Yes | ingress or egress |
| threshold | integer | Yes | Threshold record count. Must be >= 0. |
Errors
Monitors total error count.
| Field | Type | Required | Description |
|---|---|---|---|
| threshold | integer | Yes | Threshold error count. Must be >= 0. |
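The field requirements in the three tables above can be checked before a configuration is submitted. The following is a rough client-side sketch; `validate_metric_config` is a hypothetical helper, and the service's own validation may differ in wording or strictness.

```python
def validate_metric_config(metric_config):
    """Return a list of problems found in a metric_config dict (empty if valid)."""
    metric_type = metric_config.get("type")
    if metric_type not in ("bytes", "records", "errors"):
        return [f"unknown metric type: {metric_type!r}"]

    options = metric_config.get(metric_type)
    if not isinstance(options, dict):
        return [f"missing '{metric_type}' options block"]

    errors = []
    threshold = options.get("threshold")
    # bool is a subclass of int in Python, so it is excluded explicitly.
    if (not isinstance(threshold, int) or isinstance(threshold, bool)
            or threshold < 0):
        errors.append("threshold must be an integer >= 0")
    # direction applies to bytes and records only.
    if (metric_type in ("bytes", "records")
            and options.get("direction") not in ("ingress", "egress")):
        errors.append("direction must be ingress or egress")
    # unit applies to bytes only.
    if metric_type == "bytes" and options.get("unit") not in ("KB", "MB", "GB"):
        errors.append("unit must be KB, MB, or GB")
    return errors


print(validate_metric_config(
    {"type": "bytes",
     "bytes": {"direction": "ingress", "threshold": 500, "unit": "MB"}}
))
```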
Operator
- greater_than: Alert when metric value > threshold
- less_than: Alert when metric value < threshold
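To illustrate how the two operators behave, here is a minimal sketch that applies a threshold to a set of per-pipeline values and returns the pipelines that would alert. The function name and the input mapping are assumptions for illustration; the actual evaluation happens inside the service.

```python
import operator as op

# Both comparisons are strict, so a value equal to the threshold
# does not trigger under either operator.
COMPARATORS = {"greater_than": op.gt, "less_than": op.lt}


def violating_pipelines(values_by_pipeline, threshold, operator_name):
    """Return the pipeline IDs whose value violates the threshold."""
    compare = COMPARATORS[operator_name]
    return [
        pipeline_id
        for pipeline_id, value in values_by_pipeline.items()
        if compare(value, threshold)
    ]


# Hypothetical aggregated values, in base units (bytes).
values = {"pipeline-a": 549755813.0, "pipeline-b": 1024.0}
print(violating_pipelines(values, 524288000, "greater_than"))
print(violating_pipelines(values, 524288000, "less_than"))
```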
Time Window
| Value | Description |
|---|---|
| 5m | Last 5 minutes |
| 1h | Last 1 hour |
| 6h | Last 6 hours |
| 24h | Last 24 hours |
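For scripts that need to reason about these windows, the supported values can be mapped to durations. The sketch below assumes a trailing window that ends at evaluation time; how the service aligns window boundaries is not specified here, and `window_bounds` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

# The four supported time_window values and their durations.
TIME_WINDOWS = {
    "5m": timedelta(minutes=5),
    "1h": timedelta(hours=1),
    "6h": timedelta(hours=6),
    "24h": timedelta(hours=24),
}


def window_bounds(time_window, now=None):
    """Return (start, end) datetimes for a trailing window ending at `now`."""
    if time_window not in TIME_WINDOWS:
        raise ValueError(f"unsupported time window: {time_window!r}")
    end = now or datetime.now(timezone.utc)
    return end - TIME_WINDOWS[time_window], end


start, end = window_bounds("5m")
print(start.isoformat(), "to", end.isoformat())
```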
Configuration Examples
Alert when ingress bytes exceed 500 MB in the last 5 minutes:
{
  "type": "threshold-alert",
  "settings": {
    "metric_config": {
      "type": "bytes",
      "bytes": {
        "direction": "ingress",
        "threshold": 500,
        "unit": "MB"
      }
    },
    "operator": "greater_than",
    "time_window": "5m"
  }
}
Alert when egress bytes drop below 1 MB in the last hour:
{
  "type": "threshold-alert",
  "settings": {
    "metric_config": {
      "type": "bytes",
      "bytes": {
        "direction": "egress",
        "threshold": 1,
        "unit": "MB"
      }
    },
    "operator": "less_than",
    "time_window": "1h"
  }
}
Alert when error count exceeds 10 in the last 5 minutes:
{
  "type": "threshold-alert",
  "settings": {
    "metric_config": {
      "type": "errors",
      "errors": {
        "threshold": 10
      }
    },
    "operator": "greater_than",
    "time_window": "5m"
  }
}
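The examples above cover bytes and errors. A records alert presumably follows the same pattern, with a `records` block holding the direction and threshold listed in the Records table. That shape is an assumption by analogy, not something confirmed on this page, and the threshold of 100 is an arbitrary illustrative value. The sketch builds the object as a Python dict and prints it as JSON.

```python
import json

# Assumed shape: the "records" key mirrors the "bytes" and "errors"
# blocks shown in the examples above. There is no "unit" field,
# since the Records table does not list one.
records_alert = {
    "type": "threshold-alert",
    "settings": {
        "metric_config": {
            "type": "records",
            "records": {"direction": "egress", "threshold": 100},
        },
        "operator": "less_than",
        "time_window": "6h",
    },
}

print(json.dumps(records_alert, indent=2))
```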
Alert JSON Format
When the threshold condition is met, the alert generates the following JSON structure:
{
  "rule_id": "550e8400-e29b-41d4-a716-446655440000",
  "name": "High Ingress Rate Alert",
  "organization_id": "org-123",
  "severity": "critical",
  "description": "Pipeline pipeline-abc-123 greater than threshold of 524288000 for metric 'ingress_bytes' (current value: 549755813.00)",
  "metadata": {
    "pipeline_id": "pipeline-abc-123",
    "value": 549755813.0,
    "threshold": 524288000,
    "operator": "greater_than",
    "metric_name": "ingress_bytes",
    "time_window": "5m"
  },
  "resource": {
    "resource_type": "pipeline",
    "resource_id": "pipeline-abc-123"
  }
}
Note: The threshold in alert metadata is always in base units (bytes for the bytes metric, count for records/errors). For example, a configured threshold of 500 MB is stored and reported as 524288000 bytes.
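The conversion can be reproduced with a short sketch. The 500 MB example implies a 1024-based factor for MB (500 × 1024 × 1024 = 524288000); the KB and GB factors below are assumed to follow the same convention, and `to_base_units` is a hypothetical helper name.

```python
# 1024-based factors. MB matches the 500 MB example above;
# KB and GB are assumed to follow the same convention.
UNIT_FACTORS = {"KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}


def to_base_units(threshold, unit=None):
    """Convert a configured threshold to base units.

    Records and errors thresholds have no unit and are returned unchanged.
    """
    if unit is None:
        return threshold
    return threshold * UNIT_FACTORS[unit]


print(to_base_units(500, "MB"))  # 524288000
print(to_base_units(10))         # 10
```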
Alert Metadata Fields
- pipeline_id: The ID of the pipeline that triggered the alert
- value: The current metric value that triggered the alert (in base units)
- threshold: The configured threshold value (in base units)
- operator: The operator used for comparison
- metric_name: The underlying metric queried (e.g., ingress_bytes, egress_records, errors)
- time_window: The time window used for metric aggregation
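A consumer of these alerts might read the metadata fields as follows. This is a minimal sketch: `summarize_alert` is a hypothetical function, the payload is a trimmed copy of the sample alert above, and how alerts are actually delivered is outside the scope of this example.

```python
import json


def summarize_alert(alert_json):
    """Build a one-line summary from an alert payload's metadata fields."""
    alert = json.loads(alert_json)
    meta = alert["metadata"]
    symbol = {"greater_than": ">", "less_than": "<"}[meta["operator"]]
    # value and threshold are both in base units, so they compare directly.
    return (
        f"[{alert['severity']}] {meta['pipeline_id']}: {meta['metric_name']} "
        f"= {meta['value']:.0f} {symbol} {meta['threshold']} "
        f"over {meta['time_window']}"
    )


sample = json.dumps({
    "severity": "critical",
    "metadata": {
        "pipeline_id": "pipeline-abc-123",
        "value": 549755813.0,
        "threshold": 524288000,
        "operator": "greater_than",
        "metric_name": "ingress_bytes",
        "time_window": "5m",
    },
})
print(summarize_alert(sample))
# [critical] pipeline-abc-123: ingress_bytes = 549755813 > 524288000 over 5m
```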
Use Cases
- Capacity Monitoring: Alert when data ingress exceeds processing capacity
- Silent Stall Detection: Detect drops in egress that indicate a pipeline has stopped delivering data
- Error Spike Detection: Trigger alerts when error counts exceed acceptable levels
- SLA Compliance: Ensure pipeline metrics stay within defined service level agreements
- Cost Management: Monitor data volumes to prevent unexpected billing spikes
Limitations
- Threshold values must be non-negative integers
- Time window must be one of the supported values: 5m, 1h, 6h, 24h
- Metrics availability depends on pipeline activity and data retention policies