AWS CloudWatch + PagerDuty
Integrate AWS CloudWatch with PagerDuty to Automate Incident Response
Turn CloudWatch alarms into PagerDuty incidents instantly, so your on-call team is always the first to know.


Why integrate AWS CloudWatch and PagerDuty?
AWS CloudWatch and PagerDuty do two complementary jobs that deliver their full value only when connected. CloudWatch watches your AWS infrastructure continuously, tracking metrics, logs, and alarms across EC2, Lambda, RDS, and dozens of other services. PagerDuty makes sure the right engineers are notified and moving the moment something breaks. Without a connection between them, there's a gap between detection and response where incidents go unnoticed and resolution time climbs. Connecting these platforms through tray.ai closes that gap — an automated pipeline from anomaly to alert to fix.
Automate & integrate AWS CloudWatch & PagerDuty
Use case
CloudWatch Alarm to PagerDuty Incident Creation
When a CloudWatch alarm transitions to ALARM state — whether from high CPU utilization, memory pressure, or network anomalies — tray.ai opens a new PagerDuty incident and routes it to the appropriate service and escalation policy. The incident is populated with alarm metadata including the affected resource, metric value, and breached threshold, giving on-call engineers immediate context without needing to log into the AWS console.
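As a rough sketch, the mapping from alarm metadata to an incident might look like the following. The field names follow the standard CloudWatch-to-SNS alarm payload; the routing key, fixed severity, and example alarm are placeholders, not part of any tray.ai schema.

```python
import json

# Placeholder for the PagerDuty Events API v2 integration (routing) key.
ROUTING_KEY = "YOUR_PAGERDUTY_ROUTING_KEY"

def build_trigger_event(alarm: dict) -> dict:
    """Map a CloudWatch alarm state-change payload to a PagerDuty
    Events API v2 'trigger' event."""
    trigger = alarm["Trigger"]
    return {
        "routing_key": ROUTING_KEY,
        "event_action": "trigger",
        # Reusing the alarm ARN as dedup_key ties repeat firings of the
        # same alarm to a single incident.
        "dedup_key": alarm["AlarmArn"],
        "payload": {
            "summary": f"{alarm['AlarmName']}: {alarm['NewStateReason']}",
            "source": trigger["Namespace"],
            "severity": "critical",  # fixed here; real workflows map severity
            "custom_details": {
                "metric": trigger["MetricName"],
                "threshold": trigger["Threshold"],
                "region": alarm["Region"],
            },
        },
    }

# Abridged example of an alarm payload as SNS would deliver it:
alarm = {
    "AlarmName": "HighCPU-web-01",
    "AlarmArn": "arn:aws:cloudwatch:us-east-1:123456789012:alarm:HighCPU-web-01",
    "NewStateReason": "Threshold Crossed: datapoint above 90.0",
    "Region": "US East (N. Virginia)",
    "Trigger": {"Namespace": "AWS/EC2", "MetricName": "CPUUtilization",
                "Threshold": 90.0},
}
event = build_trigger_event(alarm)
print(json.dumps(event, indent=2))
```

In a real workflow the resulting event would be POSTed to the PagerDuty Events API endpoint; the sketch stops at payload construction.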
Use case
Auto-Resolve PagerDuty Incidents When CloudWatch Returns to OK
When a CloudWatch alarm recovers and transitions back to OK state, tray.ai automatically resolves the corresponding PagerDuty incident. Stale alerts stop cluttering dashboards and keeping engineers unnecessarily on edge. PagerDuty always reflects the true health of your AWS infrastructure in real time.
Use case
Severity-Based Incident Routing from CloudWatch Metrics
A Lambda timeout deserves a different response than a full RDS database failure. With tray.ai, you can define conditional logic that maps CloudWatch alarm severity, namespace, or resource type to specific PagerDuty services, urgency levels, and escalation policies. Critical P1 incidents immediately page senior engineers while informational warnings are quietly logged for review.
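A minimal sketch of such a severity matrix, assuming routing on metric namespace; the namespaces shown are real AWS values, but the service IDs and the mapping itself are illustrative.

```python
# Hypothetical severity matrix keyed by CloudWatch metric namespace.
# Service IDs are placeholders for real PagerDuty service identifiers.
SEVERITY_MATRIX = {
    "AWS/RDS":    {"service": "PROD_DB_SERVICE_ID", "urgency": "high"},
    "AWS/Lambda": {"service": "APP_SERVICE_ID",     "urgency": "low"},
}
DEFAULT_ROUTE = {"service": "CATCH_ALL_SERVICE_ID", "urgency": "low"}

def route_alarm(namespace: str) -> dict:
    """Map an alarm's metric namespace to a PagerDuty service and urgency."""
    return SEVERITY_MATRIX.get(namespace, DEFAULT_ROUTE)

print(route_alarm("AWS/RDS"))     # -> {'service': 'PROD_DB_SERVICE_ID', 'urgency': 'high'}
print(route_alarm("AWS/Backup"))  # unmapped namespace falls through to the default
```

A production matrix would typically also key on resource tags or alarm naming conventions, not namespace alone.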
Use case
CloudWatch Log Insights Anomaly Alerting via PagerDuty
CloudWatch Log Insights can surface error spikes, unusual patterns, and application-level failures buried in log streams. By connecting with PagerDuty through tray.ai, teams can trigger incidents when log-based metric filters breach thresholds — a sudden surge in 5xx errors or repeated authentication failures, for example — so application-layer issues get the same incident management treatment as infrastructure alarms.
Use case
Scheduled AWS Health and Budget Alarm Summaries to PagerDuty
Beyond real-time alerting, tray.ai can run scheduled workflows that query CloudWatch for metric trends, billing anomalies, or AWS Health events and push summarized reports as low-urgency PagerDuty incidents or status updates. Operations teams get visibility into slow-burning issues — gradually increasing error rates, cost overruns — before they hit critical thresholds.
Use case
Multi-Region CloudWatch Alarm Aggregation into Unified PagerDuty Incidents
Organizations running workloads across multiple AWS regions often end up with the same underlying issue triggering dozens of redundant alarms. tray.ai can aggregate correlated CloudWatch alarms from multiple regions into a single, deduplicated PagerDuty incident, cutting the noise and helping on-call engineers find the root cause without sifting through hundreds of duplicate notifications.
Use case
Post-Incident CloudWatch Metric Snapshots Attached to PagerDuty
Once a PagerDuty incident is resolved, tray.ai can automatically retrieve historical CloudWatch metric data from the incident window and attach it as notes or links within the PagerDuty incident timeline. Every incident becomes a self-documenting record with the exact metric graphs and log data from the failure period, so post-mortems move faster without engineers manually pulling reports after the fact.
Get started with AWS CloudWatch & PagerDuty integration today
AWS CloudWatch & PagerDuty Challenges
What challenges come up when working with AWS CloudWatch & PagerDuty, and how does Tray.ai help?
Challenge
Alarm State Transitions Generating Duplicate or Redundant Incidents
CloudWatch alarms frequently flap between ALARM and OK states during intermittent issues, flooding PagerDuty with duplicate incident create and resolve events that exhaust on-call engineers and erode trust in the alerting system.
How Tray.ai Can Help:
tray.ai workflows implement deduplication logic using PagerDuty's dedup_key field and state-tracking within the workflow itself, so a flapping alarm maps to a single incident lifecycle rather than generating a flood of redundant notifications.
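One way to derive a stable deduplication key, assuming the alarm ARN uniquely identifies the alarm: hash it, so every firing of the same alarm carries the same `dedup_key` and PagerDuty folds repeats into one incident.

```python
import hashlib

def dedup_key_for(alarm_arn: str) -> str:
    """Derive a stable dedup_key from a CloudWatch alarm ARN.

    A flapping alarm then re-triggers and resolves one incident instead of
    opening a new one each cycle. Hashing keeps the key well under
    PagerDuty's 255-character dedup_key limit regardless of ARN length.
    """
    return hashlib.sha256(alarm_arn.encode()).hexdigest()

key = dedup_key_for(
    "arn:aws:cloudwatch:us-east-1:123456789012:alarm:HighCPU-web-01"
)
print(key)  # same ARN always yields the same key
```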
Challenge
Mapping AWS Resource Context to Actionable PagerDuty Incidents
Raw CloudWatch alarm payloads contain AWS-specific identifiers like ARNs, metric namespaces, and dimension keys that mean something to AWS engineers but leave on-call responders without the plain-language context they need to act quickly.
How Tray.ai Can Help:
tray.ai's data transformation tools let teams parse and enrich CloudWatch payloads — translating resource ARNs into human-readable names, appending runbook links, and formatting metric data into clear incident summaries — before anything reaches PagerDuty.
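A simplified example of that enrichment step. The runbook mapping is hypothetical, and the parser handles only colon-delimited ARNs; resources with slash-delimited paths would need extra handling.

```python
# Hypothetical runbook registry keyed by resource name.
RUNBOOKS = {"orders-api": "https://wiki.example.com/runbooks/orders-api"}

def friendly_name(arn: str) -> str:
    """'arn:aws:lambda:us-east-1:123456789012:function:orders-api'
    -> 'lambda/orders-api'"""
    parts = arn.split(":")
    service, resource = parts[2], parts[-1]
    return f"{service}/{resource}"

name = friendly_name("arn:aws:lambda:us-east-1:123456789012:function:orders-api")
print(name)                                  # -> lambda/orders-api
print(RUNBOOKS.get(name.split("/")[1], ""))  # runbook link, if one is mapped
```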
Challenge
Routing Alarms from Multiple AWS Accounts and Regions
Enterprises running across multiple AWS accounts and regions face a real headache consolidating CloudWatch alarms from fragmented infrastructure into a coherent PagerDuty incident structure without building and maintaining custom routing logic in every account.
How Tray.ai Can Help:
tray.ai acts as a centralized integration layer that receives alarm events from all AWS accounts and regions via a shared SNS endpoint, applies unified routing logic, and maps alarms to the correct PagerDuty services and teams — no per-account Lambda functions required.
Challenge
Keeping PagerDuty Incident State in Sync with CloudWatch Alarm Lifecycle
Without automated lifecycle management, PagerDuty incidents stay open long after a CloudWatch alarm has recovered, creating stale incident backlogs that mislead teams about actual infrastructure health and make on-call handoffs messy.
How Tray.ai Can Help:
tray.ai workflows handle the full alarm lifecycle — creating incidents on ALARM transitions, acknowledging them when engineers accept the alert, and resolving them automatically when CloudWatch returns to OK — so PagerDuty stays synchronized with live AWS infrastructure state.
Challenge
Alert Fatigue from High-Volume CloudWatch Metric Alarms
Large AWS environments with hundreds of monitored resources can generate enormous volumes of CloudWatch alarms, many representing expected transient conditions. Routing all of them directly to PagerDuty overwhelms on-call teams and causes critical alerts to get buried.
How Tray.ai Can Help:
tray.ai filters and aggregates before incidents reach PagerDuty — suppressing known transient alarms, grouping related metric failures into single incidents, and applying time-based rules that reduce off-hours notifications for non-critical services — so engineers only see alerts that actually need a human response.
Start using our pre-built AWS CloudWatch & PagerDuty templates today
Start from scratch or use one of our pre-built AWS CloudWatch & PagerDuty templates to quickly solve your most common use cases.
AWS CloudWatch & PagerDuty Templates
Find pre-built AWS CloudWatch & PagerDuty solutions for common use cases
Template
CloudWatch Alarm → PagerDuty Incident (Real-Time)
Automatically creates a PagerDuty incident whenever a CloudWatch alarm transitions to ALARM state, populating it with the alarm name, affected AWS resource ARN, breached metric value, and a direct link to the CloudWatch console. Resolves the incident automatically when the alarm returns to OK.
Steps:
- Listen for CloudWatch alarm state change events via EventBridge or SNS webhook trigger
- Evaluate alarm state — branch for ALARM, OK, and INSUFFICIENT_DATA transitions
- Create or resolve the corresponding PagerDuty incident with enriched alarm metadata
Connectors Used: AWS CloudWatch, PagerDuty
Template
Severity-Tiered CloudWatch to PagerDuty Routing
Evaluates incoming CloudWatch alarms against a configurable severity matrix and routes them to the appropriate PagerDuty service and urgency level. Critical production alarms trigger high-urgency incidents with immediate escalation, while non-critical alarms create low-urgency incidents without waking on-call staff.
Steps:
- Receive CloudWatch alarm payload and extract namespace, metric name, and alarm description
- Apply conditional logic to classify severity and map to target PagerDuty service and urgency
- Create the PagerDuty incident with severity-appropriate settings and resource context
Connectors Used: AWS CloudWatch, PagerDuty
Template
CloudWatch Log Metric Filter Breach to PagerDuty Alert
Monitors CloudWatch log metric filters for application-level anomalies such as error rate spikes or failed authentication events. When a log-based metric breaches a defined threshold, tray.ai triggers a PagerDuty incident with log query context, so application teams can investigate faster.
Steps:
- Trigger workflow from CloudWatch alarm backed by a log metric filter
- Run a CloudWatch Log Insights query to fetch recent matching log entries
- Create PagerDuty incident with alarm details and attach sample log output as incident notes
Connectors Used: AWS CloudWatch, PagerDuty
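Step two's Logs Insights query could be built along these lines, assuming access logs with a parsed `status` field; adjust the field names to your log schema.

```python
def insights_query(status_prefix: str = "5", limit: int = 20) -> str:
    """Build a CloudWatch Logs Insights query fetching recent log entries
    whose status code starts with the given prefix (e.g. '5' for 5xx)."""
    return (
        "fields @timestamp, @message "
        f"| filter status like /^{status_prefix}/ "
        "| sort @timestamp desc "
        f"| limit {limit}"
    )

query = insights_query()
print(query)
```

In a live workflow this string would be passed to the Logs Insights query API along with the log group and the incident time window.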
Template
Multi-Region Alarm Deduplication and PagerDuty Consolidation
Aggregates CloudWatch alarm events from multiple AWS regions, detects correlated alarms representing the same underlying issue, and creates a single consolidated PagerDuty incident rather than flooding on-call engineers with duplicate notifications. Subsequent correlated alarms are appended as notes on the existing incident.
Steps:
- Collect CloudWatch alarm events from multiple regions via a unified SNS fan-in topic
- Check for existing open PagerDuty incidents with matching deduplication key
- Create a new consolidated incident if none exists, or append alarm details to the existing incident as a note
Connectors Used: AWS CloudWatch, PagerDuty
Template
Post-Incident CloudWatch Metric Report Attachment
When a PagerDuty incident is resolved, automatically queries CloudWatch for metric statistics covering the incident window and attaches a formatted summary — including peak values, anomaly timestamps, and affected resource identifiers — directly to the PagerDuty incident as a post-mortem data artifact.
Steps:
- Trigger on PagerDuty incident resolved webhook event
- Parse incident timestamps and extract the affected CloudWatch resource and metric from incident details
- Query CloudWatch GetMetricStatistics for the incident window and post a formatted summary as a PagerDuty incident note
Connectors Used: AWS CloudWatch, PagerDuty
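The GetMetricStatistics call in step three might be parameterized like this; the ten-minute padding and the choice of statistics are assumptions, and in a real workflow the dict would be passed to boto3's `get_metric_statistics`.

```python
from datetime import datetime, timedelta, timezone

def metric_window_params(namespace, metric, dimensions, started, resolved,
                         pad_minutes=10):
    """Build GetMetricStatistics parameters covering the incident window,
    padded on both sides for context."""
    return {
        "Namespace": namespace,
        "MetricName": metric,
        "Dimensions": dimensions,
        "StartTime": started - timedelta(minutes=pad_minutes),
        "EndTime": resolved + timedelta(minutes=pad_minutes),
        "Period": 60,                     # one datapoint per minute
        "Statistics": ["Average", "Maximum"],
    }

params = metric_window_params(
    "AWS/EC2", "CPUUtilization",
    [{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],
    datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc),   # incident opened
    datetime(2024, 5, 1, 12, 45, tzinfo=timezone.utc),  # incident resolved
)
print(params["StartTime"], params["EndTime"])
```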
Template
Daily CloudWatch Anomaly Digest to PagerDuty Open Incidents
Runs on a daily schedule to query CloudWatch Anomaly Detector findings and AWS Health events, then creates low-urgency PagerDuty incidents for any new anomalies found. Teams get visibility into gradual degradation without relying solely on threshold-based alarms.
Steps:
- Schedule workflow to run daily and call CloudWatch DescribeAnomalyDetectors and list recent anomalies
- Filter out anomalies that were already reported or are currently covered by an active alarm
- Create low-urgency PagerDuty incidents for each new anomaly with trend context and recommended investigation steps
Connectors Used: AWS CloudWatch, PagerDuty