Skip to main content

Volume Anomaly Detection Alert

Description

This alert detects volume anomalies in pipeline metrics using IQR-based (Interquartile Range) statistical methods on historical patterns. It uses the Tukey fence method to identify outliers by comparing current metric values against historically established bounds.

The anomaly detection calculates bounds using: [Q1 - k×IQR, Q3 + k×IQR] where IQR = Q3 - Q1. Values outside these bounds are flagged as anomalies.

The alert includes a bootstrap period where pipelines must accumulate at least 4 weeks of historical data before anomaly detection is enabled. This ensures statistical reliability by preventing false positives on insufficient data.

Compatible with all Monad tiers

Prerequisites

  1. Active pipelines generating metrics in your Monad organization
  2. At least 4 weeks of historical metric data for each pipeline (bootstrap period)
  3. Understanding of IQR-based anomaly detection concepts

Setup Instructions

  1. Select the Metric to Monitor (ingress_bytes or egress_bytes)
  2. Set the IQR K-Factor to control sensitivity (common values: 1.5 for outliers, 2.5 for far outliers, 3.0 for extreme outliers)
  3. Select the pipelines to monitor (leave empty to monitor all organization pipelines)

Configuration Options

Settings

SettingTypeRequiredDefaultDescription
metricstringYes-The metric to monitor for anomalies. Must be ingress_bytes or egress_bytes.
iqr_k_factorfloatYes-Sensitivity multiplier for anomaly detection. Values outside the range [Q1 - k×IQR, Q3 + k×IQR] trigger alerts. Use 1.5 for outliers, 2.5 for far outliers, or 3.0 for extreme outliers only.

IQR K-Factor Guidelines

The k-factor determines how sensitive the anomaly detection is:

  • 1.5 - Standard outlier detection (more alerts, catches subtle anomalies)
  • 2.5 - Far outlier detection (balanced sensitivity)
  • 3.0 - Extreme outlier detection (fewer alerts, only major deviations)

Confidence Levels

The alert assigns confidence levels based on the amount of historical data available:

Data AvailableConfidence LevelBehavior
< 4 weeksBootstrapNo alerting (insufficient data)
4-8 weeksLowAlerting enabled with lower confidence
8-20 weeksMediumAlerting with moderate confidence
20+ weeksHighAlerting with high confidence

Alert JSON Format

When an anomaly is detected, the alert generates the following JSON structure:

{
"rule_id": "550e8400-e29b-41d4-a716-446655440000",
"name": "Volume Anomaly Alert",
"organization_id": "org-123",
"severity": "warning",
"description": "Pipeline pipeline-abc-123: ingress_bytes is above normal range (current: 5000000, above bound: 3500000)",
"metadata": {
"pipeline_id": "pipeline-abc-123",
"metric_name": "ingress_bytes",
"detection_method": "iqr",
"current_value": 5000000,
"lower_bound": 500000,
"upper_bound": 3500000,
"threshold_value": 2.5,
"sample_size": 12,
"confidence_level": "medium"
},
"resource": {
"resource_type": "pipeline",
"resource_id": "pipeline-abc-123"
}
}

Alert Metadata Fields

  • pipeline_id: The ID of the pipeline that triggered the alert
  • metric_name: The metric being monitored (ingress_bytes or egress_bytes)
  • detection_method: The algorithm used for detection (iqr for IQR-based detection)
  • current_value: The current metric value that triggered the anomaly
  • lower_bound: The lower threshold (Q1 - k×IQR)
  • upper_bound: The upper threshold (Q3 + k×IQR)
  • threshold_value: The k-factor used for the Tukey fence calculation
  • sample_size: Number of weekly samples used for statistical calculation
  • confidence_level: Reliability of the detection based on historical data (low, medium, or high)

Use Cases

  • Capacity Planning: Detect unusual spikes in data volume that may indicate capacity issues
  • Data Pipeline Health: Identify unexpected drops in data flow that might signal upstream problems
  • Anomaly Investigation: Catch volume changes that deviate significantly from historical patterns
  • Cost Management: Alert on unusual data volumes that could impact billing
  • Security Monitoring: Detect abnormal data exfiltration patterns through volume anomalies
  • SLA Compliance: Ensure data volumes stay within expected operational ranges

Limitations

  • Requires a minimum of 4 weeks of historical data before alerting begins (bootstrap period)
  • Only supports ingress_bytes and egress_bytes metrics
  • IQR k-factor must be greater than 0
  • Confidence level is determined by data availability and cannot be manually configured
  • For metrics with very consistent historical values (near-zero IQR), a small absolute tolerance is used instead
  • Lower bounds are automatically clamped to 0 for byte metrics (cannot be negative)

Example Configurations

Standard Volume Monitoring

{
"metric": "ingress_bytes",
"iqr_k_factor": 2.5
}

Detects far outliers in ingress data volume, suitable for most production use cases.

High Sensitivity Monitoring

{
"metric": "egress_bytes",
"iqr_k_factor": 1.5
}

Catches subtle anomalies in egress data volume. Use for critical pipelines where early detection is important.

Low Sensitivity Monitoring

{
"metric": "ingress_bytes",
"iqr_k_factor": 3.0
}

Only alerts on extreme deviations. Use for pipelines with naturally variable volumes to reduce alert noise.