AWS Kinesis + AWS S3
Connect AWS Kinesis to AWS S3: Real-Time Streaming to Scalable Storage
Automate the flow of streaming data from AWS Kinesis directly into AWS S3 for analytics, archival, and downstream processing — no custom pipelines required.


Why integrate AWS Kinesis and AWS S3?
AWS Kinesis and AWS S3 are two of the most widely used services in a modern data infrastructure stack. Kinesis handles high-throughput real-time data streams — application events, IoT telemetry, you name it — while S3 gives you virtually unlimited, cost-effective object storage for raw, processed, and enriched data. Together they're the foundation of most data lake and analytics architectures on AWS, and connecting them is one of the most common integrations teams need to get right.
Automate & integrate AWS Kinesis & AWS S3
Use case
Stream Real-Time Event Data to S3 Data Lake
Capture application events, clickstream data, or user activity from Kinesis Data Streams and land them automatically in partitioned S3 prefixes organized by date, hour, or event type. The result is a continuously updated, query-ready data lake that analytics teams can hit immediately via Athena, Redshift Spectrum, or Spark.
Use case
Archive IoT Sensor Telemetry for Long-Term Storage
Ingest high-volume IoT device telemetry through Kinesis Data Streams or Kinesis Firehose and route it into dedicated S3 buckets with configurable compression and file formatting. This supports long-term retention of sensor readings for predictive maintenance modeling, regulatory compliance, and historical trend analysis.
Use case
Trigger ETL Workflows When New Data Lands in S3
After Kinesis delivers data to S3, automatically trigger downstream ETL or data transformation workflows using S3 event notifications. Streaming data flows from ingestion through transformation to a clean, analytics-ready layer without any manual intervention.
Use case
Centralize Multi-Source Log Data for Security and Compliance
Route application logs, VPC flow logs, and CloudTrail events from multiple Kinesis streams into a centralized S3 bucket structure organized by log type and source. Security and compliance teams get a single, tamper-evident repository for incident investigation, SIEM ingestion, and regulatory auditing.
Use case
Build Machine Learning Training Datasets from Streaming Data
Collect raw inference requests, user interaction signals, or model feedback events via Kinesis and accumulate them in S3 in ML-ready formats such as JSON Lines or CSV. Data science teams can then use S3 as the source for periodic model retraining jobs in SageMaker or other ML platforms.
Use case
Monitor and Alert on Kinesis Stream Health via S3 Snapshots
Periodically snapshot Kinesis stream metrics and shard-level consumer lag data to S3 as structured JSON files. Operations teams can use these snapshots alongside CloudWatch data to build historical visibility into stream throughput, backpressure events, and consumer performance trends.
Use case
Fan Out Kinesis Events to Multiple S3 Destinations by Event Type
Use Tray.ai to consume events from a single Kinesis stream and route records to different S3 bucket prefixes or separate buckets based on event type, tenant ID, or data classification. Multiple teams get access to exactly the data they need without exposing cross-team records in a single flat bucket.
Get started with AWS Kinesis & AWS S3 integration today
AWS Kinesis & AWS S3 Challenges
What challenges are there when working with AWS Kinesis & AWS S3 and how will using Tray.ai help?
Challenge
Handling High-Throughput Kinesis Streams Without Data Loss
Kinesis streams can produce tens of thousands of records per second across multiple shards, making it easy for a naive consumer to fall behind, miss records, or exceed the per-shard read limits (five GetRecords calls and 2 MB of reads per second). Any gap in consumption means data that never reaches S3 and can't be recovered after the Kinesis retention window (24 hours by default) expires.
How Tray.ai Can Help:
Tray.ai's workflow engine handles parallel shard consumption natively, with configurable batch sizes and retry logic that respects Kinesis's per-shard read limits. Sequence number checkpointing means a workflow interruption won't cause duplicate or missing records when processing resumes, so teams can trust that every record lands in S3.
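The checkpoint-and-resume pattern described above can be sketched in a few lines. This is an illustrative sketch of the general technique, not Tray.ai's internals: the `iterator_request` helper is hypothetical, and a real consumer would pass its result to the Kinesis `GetShardIterator` API.

```python
# Sketch of per-shard checkpoint logic: resume with AFTER_SEQUENCE_NUMBER
# when a checkpoint exists, otherwise start from TRIM_HORIZON. A real
# consumer would pass these parameters to kinesis.get_shard_iterator().

def iterator_request(stream_name: str, shard_id: str, checkpoints: dict) -> dict:
    """Build GetShardIterator parameters for one shard."""
    params = {"StreamName": stream_name, "ShardId": shard_id}
    seq = checkpoints.get(shard_id)
    if seq is not None:
        # Resume just past the last record that was safely written to S3.
        params["ShardIteratorType"] = "AFTER_SEQUENCE_NUMBER"
        params["StartingSequenceNumber"] = seq
    else:
        # No checkpoint yet: read the shard from its oldest available record.
        params["ShardIteratorType"] = "TRIM_HORIZON"
    return params
```

Storing the checkpoint only after the S3 write succeeds is what prevents gaps: a crash before the write simply replays from the last committed sequence number.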
Challenge
Managing Schema Evolution Across Kinesis and S3
As upstream producers add, remove, or rename fields in Kinesis record payloads over time, downstream S3 files can become inconsistent, mixing old and new schemas across partitions. This breaks Athena queries, Glue crawlers, and Spark jobs that expect a uniform schema across all files in a prefix.
How Tray.ai Can Help:
Tray.ai lets teams define schema transformation logic within their integration workflows, normalizing incoming Kinesis records to a target schema before writing to S3. Field mappings, default value injection, and conditional transformations are all configurable without code, so schema changes become a managed process rather than a recurring source of pipeline breakage.
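The normalization step might look like the following sketch. The field names and defaults are illustrative assumptions, not from a real producer; the idea is that every record leaving the transform carries the same set of fields regardless of which producer version emitted it.

```python
# Sketch of normalizing a Kinesis record to a target schema before the S3
# write: rename mapped fields, inject defaults for anything missing, and
# drop unmapped fields so every file in a prefix shares one schema.
# Field names here are illustrative, not from a real producer.

FIELD_MAP = {"usr": "user_id", "evt": "event_type", "ts": "timestamp"}
DEFAULTS = {"user_id": None, "event_type": "unknown",
            "timestamp": None, "region": "us-east-1"}

def normalize(record: dict) -> dict:
    out = dict(DEFAULTS)  # start from defaults so no field is ever absent
    for src, dst in FIELD_MAP.items():
        if src in record:
            out[dst] = record[src]
    # Pass through fields already using the target names (newer producers).
    for key in DEFAULTS:
        if key in record:
            out[key] = record[key]
    return out
```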
Challenge
Ensuring At-Least-Once Delivery Without Duplicates
Distributed streaming systems like Kinesis provide at-least-once delivery semantics, meaning duplicate records can appear during shard rebalancing, consumer restarts, or retry events. Without deduplication logic, S3 files can end up with duplicate rows that corrupt aggregate metrics and analytics results downstream.
How Tray.ai Can Help:
Tray.ai workflows support idempotent S3 write patterns by using deterministic object key generation based on Kinesis sequence numbers and shard IDs. Even if a record is processed twice, it produces the same S3 object key and doesn't create duplicate files — effectively idempotent delivery without a separate deduplication store.
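A minimal sketch of deterministic key generation follows; the prefix layout is an assumption, but the core property holds regardless of layout: the key is a pure function of the record's identity, so a retry overwrites the same object instead of creating a duplicate.

```python
# Sketch of deterministic S3 key generation: the key depends only on the
# record's shard ID and sequence number, so reprocessing the same record
# produces the same key and S3 PutObject simply overwrites the object.
import hashlib

def s3_key(prefix: str, shard_id: str, sequence_number: str) -> str:
    # Hash the identifiers to keep keys short and evenly distributed.
    digest = hashlib.sha256(f"{shard_id}:{sequence_number}".encode()).hexdigest()
    return f"{prefix}/{shard_id}/{digest}.json"
```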
Challenge
Organizing S3 Data for Cost-Efficient Querying at Scale
Without a deliberate partitioning and file-size strategy, Kinesis consumers can create millions of tiny S3 objects — one per record or per second — which dramatically increases S3 LIST operation costs, slows down Athena query planning, and inflates Glue catalog maintenance overhead. It's a common problem for teams that move fast without thinking about storage layout.
How Tray.ai Can Help:
Tray.ai's buffering and batching configuration lets teams set time-window or record-count thresholds before writing to S3, producing right-sized files that balance query performance with ingestion latency. Dynamic key templating in the workflow ensures files land in properly structured Hive-compatible partition paths that Athena and Glue can use without any additional catalog work.
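The two knobs described above can be sketched as a pair of small functions. The thresholds are illustrative defaults rather than Tray.ai's configuration, and the `year=/month=/day=/hour=` layout is the Hive-style convention Athena and Glue recognize.

```python
# Sketch of partitioned key templating plus a flush decision based on
# record count or elapsed window time. Thresholds are illustrative.
from datetime import datetime, timezone

def partition_path(base: str, event_time: datetime) -> str:
    """Hive-compatible partition path from the event timestamp (UTC)."""
    t = event_time.astimezone(timezone.utc)
    return f"{base}/year={t:%Y}/month={t:%m}/day={t:%d}/hour={t:%H}"

def should_flush(buffered: int, seconds_open: float,
                 max_records: int = 5000, max_seconds: float = 300.0) -> bool:
    """Flush when either the record-count or time-window threshold is hit."""
    return buffered >= max_records or seconds_open >= max_seconds
```

Raising `max_records` and `max_seconds` yields fewer, larger objects (cheaper LIST operations, faster Athena planning) at the cost of higher end-to-end latency.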
Challenge
Securing Cross-Service Data Access Between Kinesis and S3
Integrating Kinesis with S3 requires carefully scoped IAM roles, bucket policies, and KMS encryption key permissions that are easy to misconfigure, especially in multi-account or cross-region architectures. Over-permissive policies create security risks, while overly restrictive ones silently break pipelines in ways that are hard to diagnose.
How Tray.ai Can Help:
Tray.ai centralizes credential and authentication management for both Kinesis and S3 connections, letting teams configure least-privilege IAM role ARNs for each workflow without embedding credentials in code. Built-in connection testing validates permissions before workflows go live, and Tray.ai's audit logs give a clear record of which workflows accessed which Kinesis streams and S3 buckets — useful for security reviews and incident response.
Start using our pre-built AWS Kinesis & AWS S3 templates today
Start from scratch or use one of our pre-built AWS Kinesis & AWS S3 templates to quickly solve your most common use cases.
AWS Kinesis & AWS S3 Templates
Find pre-built AWS Kinesis & AWS S3 solutions for common use cases
Template
Kinesis Stream to S3 Partitioned Data Lake Loader
This template continuously reads records from a specified Kinesis Data Stream and writes them to an S3 bucket using a dynamic key structure partitioned by year, month, day, and hour. It handles batching, serialization to JSON or Parquet, and error retries to ensure no records are lost in transit.
Steps:
- Poll the configured Kinesis Data Stream for new records using a shard iterator with configurable batch size
- Transform and serialize each batch of records into the target format (JSON Lines, CSV, or Parquet) with timestamp enrichment
- Write the serialized batch to the S3 bucket using a dynamic key path partitioned by event date and hour
- Checkpoint the Kinesis sequence number after a successful S3 write to prevent duplicate processing
Connectors Used: AWS Kinesis, AWS S3
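The serialization step in this template can be sketched as follows, assuming the JSON Lines output format; the `_ingested_at` field name is an illustrative choice for the timestamp enrichment, and the returned bytes would go to an S3 PutObject call under the partitioned key.

```python
# Sketch of step 2 above: serialize a batch of records to JSON Lines,
# stamping each record with an ingestion timestamp. The field name
# "_ingested_at" is an assumption for illustration.
import json
from datetime import datetime, timezone

def to_jsonl(records, ingested_at=None) -> bytes:
    ts = (ingested_at or datetime.now(timezone.utc)).isoformat()
    lines = [json.dumps({**r, "_ingested_at": ts}, separators=(",", ":"))
             for r in records]
    return ("\n".join(lines) + "\n").encode("utf-8")
```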
Template
Kinesis Firehose Delivery Failure Reprocessing to S3
Monitors a Kinesis Firehose error output bucket in S3 for failed delivery records, parses the failure reason, and re-queues valid records back into the original Kinesis stream or an alternative S3 destination for recovery. Delivery failures don't have to mean permanent data loss.
Steps:
- Poll the designated S3 error prefix for newly written Firehose failure objects on a scheduled interval
- Parse each failure object to extract the original record payload and classify the failure type
- Re-submit recoverable records back to the Kinesis stream or write them directly to the target S3 path
- Move processed error objects to an S3 archive prefix and trigger an alert for unrecoverable failures
Connectors Used: AWS Kinesis, AWS S3
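The parse-and-classify step might look like this sketch. Firehose failure objects carry the original payload base64-encoded, but the exact field names (`rawData`, `errorCode`) and the set of error codes treated as recoverable are assumptions here and should be checked against your Firehose delivery configuration.

```python
# Sketch of step 2 above: decode one Firehose failure entry and decide
# whether it is recoverable. Field names and the recoverable-error set
# are assumptions for illustration.
import base64

RECOVERABLE = {"ServiceUnavailableException", "InternalFailure", "Throttling"}

def parse_failure(entry: dict):
    """Return (original payload bytes, is_recoverable)."""
    payload = base64.b64decode(entry["rawData"])
    recoverable = entry.get("errorCode") in RECOVERABLE
    return payload, recoverable
```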
Template
Multi-Tenant Event Router: Kinesis to Isolated S3 Buckets
Reads events from a shared Kinesis stream, inspects each record's tenant or account identifier field, and routes the record to the corresponding tenant-specific S3 prefix or bucket. A SaaS platform can maintain strict data isolation per customer without needing to operate separate Kinesis streams.
Steps:
- Consume a batch of records from the shared Kinesis stream using the configured consumer group
- Extract the tenant identifier from each record's payload using a configurable JSON path or field mapping
- Look up the target S3 bucket and prefix for the resolved tenant from a routing configuration store
- Write each record to its tenant-specific S3 path and log the routing outcome for observability
Connectors Used: AWS Kinesis, AWS S3
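Steps 2 and 3 of this template can be sketched as a path extractor plus a routing lookup. The dotted-path syntax, the routing-table shape, and the quarantine fallback for unknown tenants are all illustrative assumptions.

```python
# Sketch of steps 2-3 above: pull a tenant ID out of a record with a
# dotted field path and resolve it against a routing table. Unknown
# tenants fall back to a quarantine prefix for review.

def extract(record: dict, path: str):
    value = record
    for part in path.split("."):
        value = value[part]
    return value

def route(record: dict, path: str, routing: dict, default_prefix: str) -> str:
    tenant = extract(record, path)
    entry = routing.get(tenant)
    if entry is None:
        return f"{default_prefix}/tenant={tenant}"
    return f"{entry['bucket']}/{entry['prefix']}"
```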
Template
S3 New Object Trigger to Kinesis Stream Ingestion
Listens for S3 PutObject events in a specified bucket and prefix, reads the newly uploaded file, and publishes each record or row from the file as an individual event onto a Kinesis Data Stream. It's a practical way to bridge batch file uploads with real-time streaming consumers.
Steps:
- Receive an S3 event notification when a new object is created in the monitored bucket and prefix
- Download and parse the newly uploaded file from S3 (CSV, JSON, or Avro format supported)
- Iterate through each row or record and publish it to the stream via individual PutRecord calls or batched PutRecords calls
- Tag the source S3 object with a processing-complete metadata flag after all records are published
Connectors Used: AWS S3, AWS Kinesis
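The publishing step benefits from batching: the Kinesis PutRecords API accepts at most 500 records per call, so parsed rows need to be chunked first. This sketch assumes each row is a dict containing a field usable as the partition key (which spreads writes across shards); the batches would then be sent with boto3's `put_records`.

```python
# Sketch of step 3 above: chunk parsed rows into PutRecords-sized batches
# (max 500 records per API call). Each row becomes one record entry whose
# PartitionKey comes from a field assumed to exist on every row.
import json

def to_put_records_batches(rows, key_field: str, batch_size: int = 500):
    entries = [{"Data": json.dumps(r).encode(), "PartitionKey": str(r[key_field])}
               for r in rows]
    return [entries[i:i + batch_size] for i in range(0, len(entries), batch_size)]
```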
Template
IoT Telemetry Kinesis Stream to Compressed S3 Archive
Aggregates raw IoT device payloads from a Kinesis Data Stream over a configurable time window, compresses the batch using GZIP or Snappy, and writes the compressed file to a time-partitioned S3 prefix. Built for high-throughput IoT environments where storage cost and query efficiency actually matter.
Steps:
- Buffer incoming Kinesis records over a configurable tumbling window (e.g., 5 minutes) to form a batch
- Apply GZIP or Snappy compression to the record batch and convert to Parquet or ORC format if configured
- Write the compressed file to S3 using a device-type and date-partitioned key structure
- Emit a metadata record to a separate S3 manifest prefix documenting the file size, record count, and time range
Connectors Used: AWS Kinesis, AWS S3
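The compression step can be sketched with the standard library. Snappy, Parquet, and ORC all require third-party libraries, so this sketch sticks to GZIP over JSON Lines; the compressed bytes would become the body of the S3 PutObject call in step 3.

```python
# Sketch of step 2 above: gzip-compress one window's records into a single
# object body (JSON Lines inside GZIP, stdlib only).
import gzip
import json

def compress_window(records) -> bytes:
    body = "\n".join(json.dumps(r, separators=(",", ":")) for r in records)
    return gzip.compress(body.encode("utf-8"))
```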
Template
Kinesis Consumer Lag Monitor with S3 Snapshot and Alert
Periodically reads Kinesis shard-level GetRecords metrics and consumer sequence position data, writes a structured snapshot JSON file to S3 for historical tracking, and triggers a notification if consumer lag exceeds a configurable threshold. It gives you operational visibility that CloudWatch metrics alone can't provide.
Steps:
- On a scheduled interval, describe all shards in the target Kinesis stream and retrieve current iterator positions
- Calculate per-shard consumer lag by comparing the latest sequence number to the consumer checkpoint
- Write the lag snapshot as a timestamped JSON object to the designated S3 monitoring bucket
- Evaluate lag values against configured thresholds and trigger an alert workflow if any shard exceeds the limit
Connectors Used: AWS Kinesis, AWS S3
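Steps 3 and 4 of this template can be sketched together: shape the per-shard snapshot and flag any shard over the threshold. Lag here is expressed in milliseconds behind latest, as GetRecords reports via `MillisBehindLatest`; the 60-second threshold is an illustrative default.

```python
# Sketch of steps 3-4 above: build the JSON snapshot body for S3 and the
# list of shards that should trigger an alert. Threshold is illustrative.
import json
from datetime import datetime, timezone

def lag_snapshot(stream: str, shard_lags: dict, threshold_ms: int = 60_000):
    snapshot = {
        "stream": stream,
        "captured_at": datetime.now(timezone.utc).isoformat(),
        "shards": [{"shard_id": s, "lag_ms": lag, "over_threshold": lag > threshold_ms}
                   for s, lag in sorted(shard_lags.items())],
    }
    alerts = [s["shard_id"] for s in snapshot["shards"] if s["over_threshold"]]
    return json.dumps(snapshot, separators=(",", ":")), alerts
```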