Splunk HTTP Event Collector + PagerDuty

Connect Splunk HTTP Event Collector with PagerDuty to Automate Incident Response

Stream Splunk events directly into PagerDuty to trigger alerts, manage on-call rotations, and resolve incidents faster.

Why integrate Splunk HTTP Event Collector and PagerDuty?

Splunk HTTP Event Collector (HEC) is a high-throughput data ingestion layer that captures logs, metrics, and machine-generated events from across your infrastructure. PagerDuty is the incident management platform that routes alerts to the right people at the right time. Together, they form a closed-loop system where anomalies detected in Splunk automatically become actionable incidents in PagerDuty. Connecting the two cuts the lag between detecting a problem and getting your team moving on it.

Automate & integrate Splunk HTTP Event Collector & PagerDuty

Use case

Automated Incident Creation from Splunk Alerts

When Splunk detects a critical threshold breach — a spike in error rates, CPU usage, or failed login attempts — tray.ai forwards the event payload from Splunk HEC directly to PagerDuty to open a new incident. The incident gets automatically enriched with Splunk search results, severity metadata, and relevant log snippets so on-call engineers have full context from the moment they're paged. No manual triage step, no alerts falling through the cracks.

Use case

Alert Deduplication and Noise Reduction

High-volume Splunk environments can generate hundreds of correlated events for a single underlying issue, flooding PagerDuty with duplicate alerts and burning out on-call engineers fast. tray.ai workflows apply deduplication logic — grouping events by common fields like host, service, or error code — before routing only unique, actionable incidents to PagerDuty. Your on-call queue stays clean and engineers stay focused on real problems.

Use case

Automatic Incident Resolution on Recovery Events

When Splunk detects that a previously alarming condition has returned to normal — error rates dropping below threshold, a service recovering — tray.ai automatically sends a resolve signal to the corresponding PagerDuty incident. This closes the feedback loop between your observability layer and your incident management platform without manual intervention. Teams get accurate MTTR metrics and engineers aren't left managing stale open incidents.

Use case

Security Event Escalation and Threat Response

Security operations teams using Splunk for SIEM can route high-fidelity threat detections — brute force attempts, lateral movement indicators, data exfiltration patterns — directly to PagerDuty as high-urgency incidents. tray.ai enriches the PagerDuty incident with MITRE ATT&CK classifications, affected asset details, and raw log evidence from Splunk, creating an immediate, auditable response chain from detection to acknowledgment and remediation.

Use case

Infrastructure Capacity and Performance Alerting

Operations teams can configure tray.ai to listen for Splunk HEC events tied to infrastructure metrics — disk utilization, memory pressure, network saturation, pod crash loops — and translate them into appropriately prioritized PagerDuty incidents. Severity levels in PagerDuty get set automatically based on Splunk event severity fields, so P1 incidents get immediate pages while P3 issues go into a low-urgency queue. It's a consistent, automated approach to capacity incident management.

Use case

Post-Incident Enrichment and Retrospective Data Logging

Once a PagerDuty incident is resolved, tray.ai can send a structured summary event back to Splunk HEC — including time to acknowledge, time to resolve, responder names, and incident notes — building an operational dataset for retrospectives and SLA reporting. With this bidirectional flow, Splunk becomes the single source of truth for both detection events and incident lifecycle data. Teams can build Splunk dashboards that show incident trends, response performance, and recurring failure patterns.

Use case

On-Call Schedule-Aware Alert Routing

tray.ai can query PagerDuty's on-call schedule API when a Splunk event fires and dynamically route alerts to the correct service or escalation policy based on team ownership tags embedded in the Splunk event payload. This prevents critical alerts from landing in the wrong team's queue when services share a common Splunk index. Every alert gets owned and actioned by the team best equipped to resolve it.

Get started with Splunk HTTP Event Collector & PagerDuty integration today

Splunk HTTP Event Collector & PagerDuty Challenges

What challenges come up when working with Splunk HTTP Event Collector & PagerDuty, and how does tray.ai help?

Challenge

Reliable Event Delivery at High Ingestion Volumes

Splunk HEC environments often handle thousands of events per second. Making sure every critical event reliably triggers the correct PagerDuty action — without dropped messages or duplicate incidents — is a real engineering problem when building custom integrations.

How Tray.ai Can Help:

tray.ai's workflow engine has built-in retry logic, error handling branches, and idempotent event processing using PagerDuty's deduplication key system. Workflows can queue and retry failed PagerDuty API calls, so no critical alert gets silently lost even during high-volume bursts.
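The idempotency half of this pattern can be sketched in plain Python. This is an illustrative sketch, not tray.ai's connector API: the field names (`host`, `source`, `error_code`) and the `send` callable are assumptions standing in for a real PagerDuty Events API v2 call. The key idea is that the `dedup_key` is derived deterministically from stable event fields, so retries update the same PagerDuty alert rather than opening duplicates.

```python
import hashlib
import time

def build_trigger_event(splunk_event: dict, routing_key: str) -> dict:
    """Build a PagerDuty Events API v2 'trigger' payload from a Splunk HEC event.

    The dedup_key is derived from stable event fields, so a retried or
    duplicated delivery maps to the same PagerDuty alert. Field names here
    (host, source, error_code, message) are illustrative.
    """
    stable = f"{splunk_event['host']}:{splunk_event['source']}:{splunk_event['event']['error_code']}"
    dedup_key = hashlib.sha256(stable.encode()).hexdigest()[:32]
    return {
        "routing_key": routing_key,
        "event_action": "trigger",
        "dedup_key": dedup_key,
        "payload": {
            "summary": splunk_event["event"]["message"],
            "source": splunk_event["host"],
            "severity": "critical",
        },
    }

def send_with_retry(send, payload, max_attempts=4, base_delay=1.0):
    """Call send(payload) with exponential backoff.

    Retrying is safe because the payload carries a deterministic dedup_key:
    PagerDuty treats repeated deliveries as updates to one alert.
    """
    for attempt in range(max_attempts):
        try:
            return send(payload)
        except Exception:
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)
```

Because the key is content-derived rather than random, a burst of retries during an outage converges on a single incident instead of a pile of duplicates.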

Challenge

Mapping Heterogeneous Splunk Event Schemas to PagerDuty's Payload Format

Splunk indexes aggregate events from dozens of different source types — firewalls, application servers, cloud platforms, containers — each with its own field naming conventions and severity scales. Consistently mapping that data to PagerDuty's standardized incident fields is harder than it sounds.

How Tray.ai Can Help:

tray.ai's visual data mapper and JavaScript transform steps let teams define flexible, source-specific field mapping logic within a single workflow. Conditional branches handle different source types, normalizing severity, title, and body fields into a consistent PagerDuty payload regardless of where the Splunk event originated.
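The normalization logic behind that mapping can be sketched as a per-source-type lookup table. The source types and severity scales below are hypothetical examples, not a definitive mapping; the point is that each source keeps its own map while downstream code sees one consistent shape.

```python
# Per-source-type severity maps (illustrative): each source uses its own
# scale, so everything is normalized onto a common set of levels.
SEVERITY_MAPS = {
    "cisco:asa": {"1": "critical", "2": "critical", "3": "error", "4": "warning"},
    "aws:cloudwatch": {"ALARM": "critical", "INSUFFICIENT_DATA": "warning", "OK": "info"},
    "app:json": {"fatal": "critical", "error": "error", "warn": "warning", "info": "info"},
}

def normalize_event(splunk_event: dict) -> dict:
    """Map a raw Splunk HEC event to a consistent incident payload.

    Unknown source types or severities fall back to 'warning' rather than
    being dropped silently.
    """
    sourcetype = splunk_event.get("sourcetype", "unknown")
    raw_sev = str(splunk_event["event"].get("severity", ""))
    severity = SEVERITY_MAPS.get(sourcetype, {}).get(raw_sev, "warning")
    return {
        "summary": f"[{sourcetype}] {splunk_event['event'].get('message', 'no message')}",
        "source": splunk_event.get("host", "unknown"),
        "severity": severity,
    }
```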

Challenge

Avoiding Alert Fatigue from Correlated or Flapping Events

When an underlying infrastructure issue causes dozens of dependent services to log errors simultaneously, a naive Splunk-to-PagerDuty integration creates an avalanche of separate incidents that overwhelms on-call engineers and buries the root cause rather than surfacing it.

How Tray.ai Can Help:

tray.ai workflows support time-window buffering, event aggregation, and composite deduplication key logic that groups correlated Splunk events before any PagerDuty incident gets created. Engineers receive a single, contextualized incident describing the blast radius rather than hundreds of isolated alerts.
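A minimal sketch of the windowed grouping step, assuming events have already been buffered for the window and carry `host` and `error_code` fields (both illustrative names):

```python
from collections import defaultdict

def dedup_window(events: list) -> list:
    """Collapse one buffered window of Splunk events into a single
    representative event per composite (host, error_code) key.

    Each representative is annotated with the group size and a stable
    dedup_key, so PagerDuty sees one incident per root cause with the
    blast radius expressed as a count.
    """
    groups = defaultdict(list)
    for evt in events:
        groups[(evt["host"], evt["error_code"])].append(evt)
    out = []
    for (host, code), grp in groups.items():
        first = dict(grp[0])          # keep the earliest event's context
        first["event_count"] = len(grp)
        first["dedup_key"] = f"{host}:{code}"
        out.append(first)
    return out
```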

Challenge

Closing the Incident Feedback Loop

Most point-to-point Splunk-PagerDuty integrations only send data one way — from Splunk to PagerDuty — leaving incident resolution data siloed in PagerDuty and unavailable for Splunk-based operational analytics, capacity planning dashboards, or SLA reporting.

How Tray.ai Can Help:

tray.ai supports bidirectional workflows, so teams can push events from Splunk HEC to PagerDuty and pull resolved incident data back from PagerDuty webhooks into Splunk HEC. You get a complete operational data loop without custom middleware or additional infrastructure.

Challenge

Keeping Service and Escalation Policy Routing Accurate as Teams Evolve

As organizations grow, PagerDuty service ownership changes, new escalation policies get added, and Splunk alert rules can fall out of sync with the current team structure — causing critical alerts to route to the wrong team or land in unowned queues.

How Tray.ai Can Help:

tray.ai workflows can dynamically query PagerDuty's Services API at runtime to resolve the correct service ID based on metadata in the Splunk event — like team name or application tag — rather than hardcoding service IDs in alert configurations. Routing stays accurate as org structures change, with no manual workflow updates needed.
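The runtime resolution step might look like the sketch below. The `query_services` callable stands in for a call to PagerDuty's `GET /services?query=<name>` endpoint, and the `fields.team` tag in the event payload is an assumed convention:

```python
def resolve_service_id(splunk_event: dict, query_services) -> str:
    """Resolve a PagerDuty service ID at runtime from the event's team tag.

    query_services(name) is a stand-in for PagerDuty's service search
    (GET /services?query=<name>) and should return a list of service
    dicts like {"id": ..., "name": ...}.
    """
    team_tag = splunk_event["fields"]["team"]
    matches = query_services(team_tag)
    if not matches:
        raise LookupError(f"no PagerDuty service matches team tag {team_tag!r}")
    # Prefer an exact name match; otherwise fall back to the first result.
    for svc in matches:
        if svc["name"] == team_tag:
            return svc["id"]
    return matches[0]["id"]
```

Because the lookup happens per event, a service rename or ownership change in PagerDuty takes effect on the very next alert with no workflow redeploy.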

Start using our pre-built Splunk HTTP Event Collector & PagerDuty templates today

Start from scratch or use one of our pre-built Splunk HTTP Event Collector & PagerDuty templates to quickly solve your most common use cases.

Splunk HTTP Event Collector & PagerDuty Templates

Find pre-built Splunk HTTP Event Collector & PagerDuty solutions for common use cases

Browse all templates

Template

Splunk HEC Critical Alert to PagerDuty Incident

Listens for incoming Splunk HEC events tagged with a critical or high severity field and automatically creates a new PagerDuty incident with enriched context, assigning it to the appropriate service based on the source field in the Splunk payload.

Steps:

  • Receive inbound event from Splunk HTTP Event Collector via tray.ai webhook trigger
  • Parse severity, source, host, and message fields from the Splunk event JSON payload
  • Filter to only process events where severity equals 'critical' or 'high'
  • Map Splunk fields to PagerDuty incident payload including title, body, and dedup key
  • Create a new incident in PagerDuty via the Events API with appropriate service routing

Connectors Used: Splunk HTTP Event Collector, PagerDuty
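The filter-and-map core of these steps can be sketched as one function. This is a plain-Python illustration, not the template itself; the severity values and field names follow the steps above, and the returned dict follows the Events API v2 trigger shape:

```python
from typing import Optional

def handle_hec_event(splunk_event: dict) -> Optional[dict]:
    """Filter and map one Splunk HEC event.

    Returns a PagerDuty Events API v2 'trigger' payload for critical/high
    events, or None when the event should be dropped (steps 2-4 above).
    """
    sev = str(splunk_event["event"].get("severity", "")).lower()
    if sev not in ("critical", "high"):
        return None  # only critical/high events page anyone
    return {
        "event_action": "trigger",
        "dedup_key": f"{splunk_event['host']}:{splunk_event['source']}",
        "payload": {
            "summary": splunk_event["event"].get("message", "Splunk critical event"),
            "source": splunk_event["host"],
            "severity": "critical",
            # Carry the full Splunk event so on-call engineers get context.
            "custom_details": splunk_event["event"],
        },
    }
```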

Template

Auto-Resolve PagerDuty Incident on Splunk Recovery Event

Monitors Splunk HEC for recovery or clear events that match a previously fired alert and automatically sends a resolve action to PagerDuty using the original deduplication key, closing the incident without manual intervention.

Steps:

  • Receive Splunk HEC event with event type field equal to 'recovery' or 'clear'
  • Extract the deduplication key or correlation ID from the Splunk event payload
  • Look up the matching open incident in PagerDuty using the dedup key
  • Send a resolve event to PagerDuty Events API to automatically close the incident
  • Log the resolution timestamp back to Splunk HEC for MTTR tracking

Connectors Used: Splunk HTTP Event Collector, PagerDuty
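The resolve step hinges on reusing the original dedup key. A minimal sketch, assuming the recovery event carries `event_type` and `dedup_key` fields as described in the steps above:

```python
from typing import Optional

def build_resolve_event(splunk_event: dict, routing_key: str) -> Optional[dict]:
    """Turn a Splunk recovery/clear event into a PagerDuty 'resolve' event.

    Returns None for anything that is not a recovery, so non-matching
    events pass through untouched. The dedup_key must be the same key
    used when the incident was originally triggered.
    """
    if splunk_event["event"].get("event_type") not in ("recovery", "clear"):
        return None
    return {
        "routing_key": routing_key,
        "event_action": "resolve",
        "dedup_key": splunk_event["event"]["dedup_key"],
    }
```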

Template

Splunk Security Alert to High-Urgency PagerDuty Incident

Built for security operations teams, this template routes Splunk SIEM detections — including threat classification, affected assets, and raw log evidence — to a dedicated PagerDuty security service as a high-urgency incident with a full context note.

Steps:

  • Trigger on Splunk HEC events where index equals 'security' and severity is 'critical'
  • Extract threat category, affected host, source IP, and raw log fields from payload
  • Construct a formatted PagerDuty incident body including all threat indicators
  • Create a high-urgency PagerDuty incident on the security escalation policy
  • Add a PagerDuty incident note with a direct link to the Splunk search for full log context

Connectors Used: Splunk HTTP Event Collector, PagerDuty

Template

Deduplicated Splunk Event Batching to PagerDuty

Collects a rolling window of Splunk HEC events, applies deduplication logic based on host and error code, and fires only unique incidents to PagerDuty — preventing alert storms from flooding on-call queues during correlated failures.

Steps:

  • Buffer incoming Splunk HEC events over a configurable time window (e.g., 60 seconds)
  • Group events by composite key of host name and error classification field
  • Suppress duplicate events within the window, retaining only the first occurrence per key
  • Create one PagerDuty incident per unique event group with aggregated event count in the body
  • Set the PagerDuty dedup key to the composite key to prevent re-creation on subsequent batches

Connectors Used: Splunk HTTP Event Collector, PagerDuty

Template

PagerDuty Incident Resolved — Log Lifecycle Data to Splunk HEC

Triggers when a PagerDuty incident transitions to resolved status and sends a structured incident lifecycle event — including MTTA, MTTR, responder, and resolution notes — back to Splunk HEC for operational analytics and SLA dashboards.

Steps:

  • Trigger on PagerDuty webhook event for incident status change to 'resolved'
  • Fetch full incident detail from PagerDuty API including timeline and responder information
  • Calculate MTTA and MTTR from the incident's created_at, acknowledged_at, and resolved_at timestamps
  • Format a structured JSON event containing all lifecycle fields and resolution notes
  • Send the event to Splunk HEC index designated for incident analytics

Connectors Used: PagerDuty, Splunk HTTP Event Collector
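The metric calculation in step 3 can be sketched directly. This assumes ISO-8601 timestamps like those PagerDuty's API returns; MTTA is created-to-acknowledged and MTTR is created-to-resolved:

```python
from datetime import datetime

def lifecycle_metrics(incident: dict) -> dict:
    """Compute MTTA and MTTR, in seconds, for a resolved PagerDuty incident.

    MTTA = acknowledged_at - created_at
    MTTR = resolved_at - created_at
    Timestamps are ISO-8601 strings (trailing 'Z' normalized to an offset
    so datetime.fromisoformat can parse them).
    """
    def parse(ts: str) -> datetime:
        return datetime.fromisoformat(ts.replace("Z", "+00:00"))

    created = parse(incident["created_at"])
    return {
        "incident_id": incident["id"],
        "mtta_seconds": (parse(incident["acknowledged_at"]) - created).total_seconds(),
        "mttr_seconds": (parse(incident["resolved_at"]) - created).total_seconds(),
    }
```

The resulting dict is what step 4 would wrap into a structured JSON event for the Splunk HEC analytics index.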

Template

Splunk Infrastructure Metric Breach to Tiered PagerDuty Alert

Routes Splunk HEC infrastructure metric events to PagerDuty with automatic urgency tiering — critical thresholds trigger high-urgency pages while warning thresholds create low-urgency incidents — so responders are engaged at the right level for every alert.

Steps:

  • Receive Splunk HEC event containing metric name, current value, threshold, and severity
  • Apply conditional logic to classify severity as 'critical', 'warning', or 'informational'
  • Set PagerDuty incident urgency to 'high' for critical and 'low' for warning classifications
  • Create a PagerDuty incident with metric value, threshold breach details, and affected host
  • Skip PagerDuty creation for informational events and route them to a Splunk audit index only

Connectors Used: Splunk HTTP Event Collector, PagerDuty
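The tiering decision in steps 2-5 reduces to a small classifier. A sketch, with the severity labels taken from the steps above and the destination names purely illustrative:

```python
def tier_event(splunk_event: dict):
    """Classify one metric event and decide where it goes.

    Returns an (urgency, destination) pair:
      critical      -> ("high", "pagerduty")  high-urgency page
      warning       -> ("low", "pagerduty")   low-urgency incident
      anything else -> (None, "splunk_audit_index")  no incident at all
    """
    sev = str(splunk_event["event"].get("severity", "")).lower()
    if sev == "critical":
        return ("high", "pagerduty")
    if sev == "warning":
        return ("low", "pagerduty")
    return (None, "splunk_audit_index")
```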