Pipeline Threshold Alert

Description

This alert monitors pipeline metrics and triggers when a metric value, aggregated over a specified time window, exceeds or falls below a configured threshold. It covers key operational metrics: ingress and egress data volume, ingress and egress record counts, and error counts.

The pipeline threshold alert evaluates metrics at regular intervals and generates alerts for each pipeline that violates the configured threshold condition.

Compatible with all Monad tiers

Prerequisites

Active pipelines generating metrics in your Monad organization
Understanding of the metric type you want to monitor (bytes, records, or errors)

Setup Instructions

Select the Metric Type to specify what kind of metric to monitor (bytes, records, or errors)
Configure the type-specific options: direction (ingress/egress) for bytes and records, and unit (KB/MB/GB) for bytes
Set the Threshold value that will trigger the alert
Choose the Operator to determine the comparison type (greater_than or less_than)
Specify the Time Window for metric aggregation
Select the pipelines to monitor (leave empty to monitor all organization pipelines)

Configuration Options

Metric Type

The alert uses a metric type selector to determine which metric to monitor and what supporting fields are required.

Bytes

Monitors ingress or egress data volume. The threshold is specified in human-readable units (KB, MB, or GB).

Field	Type	Required	Description
direction	string	Yes	`ingress` or `egress`
threshold	integer	Yes	Threshold value in the specified unit. Must be >= 0.
unit	string	Yes	`KB`, `MB`, or `GB`

Records

Monitors ingress or egress record count.

Field	Type	Required	Description
direction	string	Yes	`ingress` or `egress`
threshold	integer	Yes	Threshold record count. Must be >= 0.

Errors

Monitors total error count.

Field	Type	Required	Description
threshold	integer	Yes	Threshold error count. Must be >= 0.

Operator

greater_than: Alert when metric value > threshold
less_than: Alert when metric value < threshold
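
The comparison semantics above can be sketched in a few lines. This is a minimal illustration, not Monad's implementation: the function name `violates` and the `OPERATORS` table are hypothetical, and only the strict `>` and `<` comparisons come from the definitions above.

```python
import operator

# Maps the documented operator names to strict comparisons.
# The table and function names are hypothetical, for illustration only.
OPERATORS = {
    "greater_than": operator.gt,
    "less_than": operator.lt,
}

def violates(value: float, threshold: int, op: str) -> bool:
    """Return True when `value` breaches `threshold` under operator `op`."""
    try:
        compare = OPERATORS[op]
    except KeyError:
        raise ValueError(f"unsupported operator: {op!r}") from None
    return compare(value, threshold)
```

Because both comparisons are strict, a metric value exactly equal to the threshold does not trigger an alert under either operator.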

Time Window

Value	Description
`5m`	Last 5 minutes
`1h`	Last 1 hour
`6h`	Last 6 hours
`24h`	Last 24 hours
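
For readers who post-process alerts, the supported window values translate directly into durations. The sketch below assumes the window simply ends at the evaluation time; `TIME_WINDOWS` and `window_bounds` are hypothetical names, not part of any Monad API.

```python
from datetime import datetime, timedelta, timezone

# The four supported time_window values and their durations.
TIME_WINDOWS = {
    "5m": timedelta(minutes=5),
    "1h": timedelta(hours=1),
    "6h": timedelta(hours=6),
    "24h": timedelta(hours=24),
}

def window_bounds(time_window: str, now: datetime) -> tuple[datetime, datetime]:
    """Return (start, end) of the aggregation window ending at `now`."""
    if time_window not in TIME_WINDOWS:
        raise ValueError(f"unsupported time window: {time_window!r}")
    return now - TIME_WINDOWS[time_window], now

# Example: the window that a "6h" alert evaluated right now would cover.
start, end = window_bounds("6h", datetime.now(timezone.utc))
```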

Configuration Examples

Alert when ingress bytes exceed 500 MB in the last 5 minutes:

{
  "metric_config": {
    "type": "bytes",
    "bytes": {
      "direction": "ingress",
      "threshold": 500,
      "unit": "MB"
    }
  },
  "operator": "greater_than",
  "time_window": "5m"
}

Alert when egress bytes drop below 1 MB in the last hour:

{
  "metric_config": {
    "type": "bytes",
    "bytes": {
      "direction": "egress",
      "threshold": 1,
      "unit": "MB"
    }
  },
  "operator": "less_than",
  "time_window": "1h"
}

Alert when error count exceeds 10 in the last 5 minutes:

{
  "metric_config": {
    "type": "errors",
    "errors": {
      "threshold": 10
    }
  },
  "operator": "greater_than",
  "time_window": "5m"
}
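
Before submitting a configuration like the ones above, it can help to check it against the field rules from Configuration Options. The following is a hypothetical client-side sketch: `validate_config` is not a Monad function, and the service may apply additional checks not described here.

```python
METRIC_TYPES = ("bytes", "records", "errors")

def validate_config(config: dict) -> list[str]:
    """Return a list of problems; an empty list means no rule was violated."""
    problems = []
    metric = config.get("metric_config")
    metric_type = metric.get("type") if isinstance(metric, dict) else None
    fields = metric.get(metric_type) if metric_type in METRIC_TYPES else None

    if metric_type not in METRIC_TYPES:
        problems.append("metric_config.type must be bytes, records, or errors")
    elif not isinstance(fields, dict):
        problems.append(f"metric_config.{metric_type} block is missing")
    else:
        threshold = fields.get("threshold")
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(threshold, bool) or not isinstance(threshold, int) or threshold < 0:
            problems.append("threshold must be an integer >= 0")
        # direction applies to bytes and records only.
        if metric_type != "errors" and fields.get("direction") not in ("ingress", "egress"):
            problems.append("direction must be ingress or egress")
        # unit applies to bytes only.
        if metric_type == "bytes" and fields.get("unit") not in ("KB", "MB", "GB"):
            problems.append("unit must be KB, MB, or GB")

    if config.get("operator") not in ("greater_than", "less_than"):
        problems.append("operator must be greater_than or less_than")
    if config.get("time_window") not in ("5m", "1h", "6h", "24h"):
        problems.append("time_window must be 5m, 1h, 6h, or 24h")
    return problems
```

Returning a list of messages rather than raising on the first failure lets a caller report every problem in one pass.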

Alert JSON Format

When the threshold condition is met, the alert generates the following JSON structure:

{
  "rule_id": "550e8400-e29b-41d4-a716-446655440000",
  "name": "High Ingress Rate Alert",
  "organization_id": "org-123",
  "severity": "critical",
  "description": "Pipeline pipeline-abc-123 greater than threshold of 524288000 for metric 'ingress_bytes' (current value: 549755813.00)",
  "metadata": {
    "pipeline_id": "pipeline-abc-123",
    "value": 549755813.0,
    "threshold": 524288000,
    "operator": "greater_than",
    "metric_name": "ingress_bytes",
    "time_window": "5m"
  },
  "resource": {
    "resource_type": "pipeline",
    "resource_id": "pipeline-abc-123"
  }
}

Note: The threshold in alert metadata is always in base units (bytes for the bytes metric, count for the records and errors metrics). For example, a configured threshold of 500 MB is stored and reported as 524288000 bytes (500 × 1024 × 1024).
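
The conversion to base units can be reproduced as follows. The 500 MB example implies a binary factor for MB (1 MB = 1,048,576 bytes); applying the same 1024-based factors to KB and GB is an assumption, and `to_bytes` is a hypothetical helper name.

```python
# Binary unit factors. The MB factor is confirmed by the 500 MB example;
# the KB and GB factors are assumed to follow the same 1024-based convention.
UNIT_FACTORS = {
    "KB": 1024,
    "MB": 1024 ** 2,
    "GB": 1024 ** 3,
}

def to_bytes(threshold: int, unit: str) -> int:
    """Convert a configured threshold in `unit` to bytes."""
    if unit not in UNIT_FACTORS:
        raise ValueError(f"unsupported unit: {unit!r}")
    return threshold * UNIT_FACTORS[unit]

print(to_bytes(500, "MB"))  # 524288000
```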

Alert Metadata Fields

pipeline_id: The ID of the pipeline that triggered the alert
value: The current metric value that triggered the alert (in base units)
threshold: The configured threshold value (in base units)
operator: The operator used for comparison
metric_name: The underlying metric queried (e.g., ingress_bytes, egress_records, errors)
time_window: The time window used for metric aggregation
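
A consumer of these alerts might turn the metadata fields into a one-line summary. The sketch below is illustrative only: `summarize_alert` is a hypothetical name, the `_bytes` suffix check is inferred from the example metric names, and the binary MB conversion follows the 500 MB example in the note above.

```python
import json

def summarize_alert(raw: str) -> str:
    """Build a short human-readable summary from an alert JSON string."""
    alert = json.loads(raw)
    meta = alert["metadata"]
    value, threshold = meta["value"], meta["threshold"]

    if meta["metric_name"].endswith("_bytes"):
        # Byte metrics arrive in base units; show them in MB for readability.
        value_text = f"{value / 1024 ** 2:.1f} MB"
        threshold_text = f"{threshold / 1024 ** 2:.1f} MB"
    else:
        # Record and error metrics are plain counts.
        value_text = f"{value:.0f}"
        threshold_text = str(threshold)

    return (
        f"[{alert['severity']}] {meta['pipeline_id']}: {meta['metric_name']} "
        f"{meta['operator']} {threshold_text} over {meta['time_window']} "
        f"(value: {value_text})"
    )
```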

Use Cases

Capacity Monitoring: Alert when data ingress exceeds processing capacity
Silent Stall Detection: Detect drops in egress that indicate a pipeline has stopped delivering data
Error Spike Detection: Trigger alerts when error counts exceed acceptable levels
SLA Compliance: Ensure pipeline metrics stay within defined service level agreements
Cost Management: Monitor data volumes to prevent unexpected billing spikes

Limitations

Threshold values must be non-negative integers
Time window must be one of the supported values: 5m, 1h, 6h, 24h
Metric availability depends on pipeline activity and data retention policies
