AWS Kinesis + AWS S3
Connect AWS Kinesis to AWS S3: Real-Time Streaming to Scalable Storage
Automate the flow of streaming data from AWS Kinesis directly into AWS S3 for analytics, archival, and downstream processing — no custom pipelines required.


Why integrate AWS Kinesis and AWS S3?
AWS Kinesis and AWS S3 are two of the most widely used services in a modern data infrastructure stack. Kinesis handles high-throughput real-time data streams — application events, IoT telemetry, you name it — while S3 gives you virtually unlimited, cost-effective object storage for raw, processed, and enriched data. Together they're the foundation of most data lake and analytics architectures on AWS, and connecting them is one of the most common integrations teams need to get right.
Automate & integrate AWS Kinesis & AWS S3
Use case
Stream Real-Time Event Data to S3 Data Lake
Capture application events, clickstream data, or user activity from Kinesis Data Streams and land them automatically in partitioned S3 prefixes organized by date, hour, or event type. The result is a continuously updated, query-ready data lake that analytics teams can hit immediately via Athena, Redshift Spectrum, or Spark.
Use case
Archive IoT Sensor Telemetry for Long-Term Storage
Ingest high-volume IoT device telemetry through Kinesis Data Streams or Kinesis Firehose and route it into dedicated S3 buckets with configurable compression and file formatting. This supports long-term retention of sensor readings for predictive maintenance modeling, regulatory compliance, and historical trend analysis.
Use case
Trigger ETL Workflows When New Data Lands in S3
After Kinesis delivers data to S3, automatically trigger downstream ETL or data transformation workflows using S3 event notifications. Streaming data flows from ingestion through transformation to a clean, analytics-ready layer without any manual intervention.
Use case
Centralize Multi-Source Log Data for Security and Compliance
Route application logs, VPC flow logs, and CloudTrail events from multiple Kinesis streams into a centralized S3 bucket structure organized by log type and source. Security and compliance teams get a single, tamper-evident repository for incident investigation, SIEM ingestion, and regulatory auditing.
Use case
Build Machine Learning Training Datasets from Streaming Data
Collect raw inference requests, user interaction signals, or model feedback events via Kinesis and accumulate them in S3 in ML-ready formats such as JSON Lines or CSV. Data science teams can then use S3 as the source for periodic model retraining jobs in SageMaker or other ML platforms.
Use case
Monitor and Alert on Kinesis Stream Health via S3 Snapshots
Periodically snapshot Kinesis stream metrics and shard-level consumer lag data to S3 as structured JSON files. Operations teams can use these snapshots alongside CloudWatch data to build historical visibility into stream throughput, backpressure events, and consumer performance trends.
Use case
Fan Out Kinesis Events to Multiple S3 Destinations by Event Type
Use Tray.ai to consume events from a single Kinesis stream and route records to different S3 bucket prefixes or separate buckets based on event type, tenant ID, or data classification. Multiple teams get access to exactly the data they need without exposing cross-team records in a single flat bucket.
Get started with AWS Kinesis & AWS S3 integration today
AWS Kinesis & AWS S3 Challenges
What challenges are there when working with AWS Kinesis & AWS S3 and how will using Tray.ai help?
Challenge
Handling High-Throughput Kinesis Streams Without Data Loss
Kinesis streams can produce tens of thousands of records per second across multiple shards, making it easy for a naive consumer to fall behind, miss records, or exceed the per-shard read limits (five GetRecords calls and 2 MB of reads per second). Any gap in consumption means data that never reaches S3 and can't be recovered after the Kinesis retention window (24 hours by default) expires.
How Tray.ai Can Help:
Tray.ai's workflow engine handles parallel shard consumption natively, with configurable batch sizes and retry logic that respects Kinesis's per-shard read limits. Sequence number checkpointing means a workflow interruption won't cause duplicate or missing records when processing resumes, so teams can trust that every record lands in S3.
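The checkpoint-and-resume pattern described above can be sketched in a few lines. This is an illustrative sketch of the general technique, not Tray.ai's internals: the `iterator_request` helper is hypothetical, and a real consumer would pass its result to the Kinesis `GetShardIterator` API.

```python
# Sketch of per-shard checkpoint logic: resume with AFTER_SEQUENCE_NUMBER
# when a checkpoint exists, otherwise start from TRIM_HORIZON. A real
# consumer would pass these parameters to kinesis.get_shard_iterator().

def iterator_request(stream_name: str, shard_id: str, checkpoints: dict) -> dict:
    """Build GetShardIterator parameters for one shard."""
    params = {"StreamName": stream_name, "ShardId": shard_id}
    seq = checkpoints.get(shard_id)
    if seq is not None:
        # Resume just past the last record that was safely written to S3.
        params["ShardIteratorType"] = "AFTER_SEQUENCE_NUMBER"
        params["StartingSequenceNumber"] = seq
    else:
        # No checkpoint yet: read the shard from its oldest available record.
        params["ShardIteratorType"] = "TRIM_HORIZON"
    return params
```

Storing the checkpoint only after the S3 write succeeds is what prevents gaps: a crash before the write simply replays from the last committed sequence number.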
Challenge
Managing Schema Evolution Across Kinesis and S3
As upstream producers add, remove, or rename fields in Kinesis record payloads over time, downstream S3 files can become inconsistent, mixing old and new schemas across partitions. This breaks Athena queries, Glue crawlers, and Spark jobs that expect a uniform schema across all files in a prefix.
How Tray.ai Can Help:
Tray.ai lets teams define schema transformation logic within their integration workflows, normalizing incoming Kinesis records to a target schema before writing to S3. Field mappings, default value injection, and conditional transformations are all configurable without code, so schema changes become a managed process rather than a recurring source of pipeline breakage.
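The normalization step might look like the following sketch. The field names and defaults are illustrative assumptions, not from a real producer; the idea is that every record leaving the transform carries the same set of fields regardless of which producer version emitted it.

```python
# Sketch of normalizing a Kinesis record to a target schema before the S3
# write: rename mapped fields, inject defaults for anything missing, and
# drop unmapped fields so every file in a prefix shares one schema.
# Field names here are illustrative, not from a real producer.

FIELD_MAP = {"usr": "user_id", "evt": "event_type", "ts": "timestamp"}
DEFAULTS = {"user_id": None, "event_type": "unknown",
            "timestamp": None, "region": "us-east-1"}

def normalize(record: dict) -> dict:
    out = dict(DEFAULTS)  # start from defaults so no field is ever absent
    for src, dst in FIELD_MAP.items():
        if src in record:
            out[dst] = record[src]
    # Pass through fields already using the target names (newer producers).
    for key in DEFAULTS:
        if key in record:
            out[key] = record[key]
    return out
```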
Challenge
Ensuring At-Least-Once Delivery Without Duplicates
Distributed streaming systems like Kinesis provide at-least-once delivery semantics, meaning duplicate records can appear during shard rebalancing, consumer restarts, or retry events. Without deduplication logic, S3 files can end up with duplicate rows that corrupt aggregate metrics and analytics results downstream.
How Tray.ai Can Help:
Tray.ai workflows support idempotent S3 write patterns by using deterministic object key generation based on Kinesis sequence numbers and shard IDs. Even if a record is processed twice, it produces the same S3 object key and doesn't create duplicate files — effectively idempotent delivery without a separate deduplication store.
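A minimal sketch of deterministic key generation follows; the prefix layout is an assumption, but the core property holds regardless of layout: the key is a pure function of the record's identity, so a retry overwrites the same object instead of creating a duplicate.

```python
# Sketch of deterministic S3 key generation: the key depends only on the
# record's shard ID and sequence number, so reprocessing the same record
# produces the same key and S3 PutObject simply overwrites the object.
import hashlib

def s3_key(prefix: str, shard_id: str, sequence_number: str) -> str:
    # Hash the identifiers to keep keys short and evenly distributed.
    digest = hashlib.sha256(f"{shard_id}:{sequence_number}".encode()).hexdigest()
    return f"{prefix}/{shard_id}/{digest}.json"
```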
Challenge
Organizing S3 Data for Cost-Efficient Querying at Scale
Without a deliberate partitioning and file-size strategy, Kinesis consumers can create millions of tiny S3 objects — one per record or per second — which dramatically increases S3 LIST operation costs, slows down Athena query planning, and inflates Glue catalog maintenance overhead. It's a common problem for teams that move fast without thinking about storage layout.
How Tray.ai Can Help:
Tray.ai's buffering and batching configuration lets teams set time-window or record-count thresholds before writing to S3, producing right-sized files that balance query performance with ingestion latency. Dynamic key templating in the workflow ensures files land in properly structured Hive-compatible partition paths that Athena and Glue can use without any additional catalog work.
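The two knobs described above can be sketched as a pair of small functions. The thresholds are illustrative defaults rather than Tray.ai's configuration, and the `year=/month=/day=/hour=` layout is the Hive-style convention Athena and Glue recognize.

```python
# Sketch of partitioned key templating plus a flush decision based on
# record count or elapsed window time. Thresholds are illustrative.
from datetime import datetime, timezone

def partition_path(base: str, event_time: datetime) -> str:
    """Hive-compatible partition path from the event timestamp (UTC)."""
    t = event_time.astimezone(timezone.utc)
    return f"{base}/year={t:%Y}/month={t:%m}/day={t:%d}/hour={t:%H}"

def should_flush(buffered: int, seconds_open: float,
                 max_records: int = 5000, max_seconds: float = 300.0) -> bool:
    """Flush when either the record-count or time-window threshold is hit."""
    return buffered >= max_records or seconds_open >= max_seconds
```

Raising `max_records` and `max_seconds` yields fewer, larger objects (cheaper LIST operations, faster Athena planning) at the cost of higher end-to-end latency.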
Challenge
Securing Cross-Service Data Access Between Kinesis and S3
Integrating Kinesis with S3 requires carefully scoped IAM roles, bucket policies, and KMS encryption key permissions that are easy to misconfigure, especially in multi-account or cross-region architectures. Over-permissive policies create security risks, while overly restrictive ones silently break pipelines in ways that are hard to diagnose.
How Tray.ai Can Help:
Tray.ai centralizes credential and authentication management for both Kinesis and S3 connections, letting teams configure least-privilege IAM role ARNs for each workflow without embedding credentials in code. Built-in connection testing validates permissions before workflows go live, and Tray.ai's audit logs give a clear record of which workflows accessed which Kinesis streams and S3 buckets — useful for security reviews and incident response.
Start using our pre-built AWS Kinesis & AWS S3 templates today
Start from scratch or use one of our pre-built AWS Kinesis & AWS S3 templates to quickly solve your most common use cases.
AWS Kinesis & AWS S3 Templates
Find pre-built AWS Kinesis & AWS S3 solutions for common use cases
Template
Kinesis Stream to S3 Partitioned Data Lake Loader
This template continuously reads records from a specified Kinesis Data Stream and writes them to an S3 bucket using a dynamic key structure partitioned by year, month, day, and hour. It handles batching, serialization to JSON or Parquet, and error retries to ensure no records are lost in transit.
Steps:
- Poll the configured Kinesis Data Stream for new records using a shard iterator with configurable batch size
- Transform and serialize each batch of records into the target format (JSON Lines, CSV, or Parquet) with timestamp enrichment
- Write the serialized batch to the S3 bucket using a dynamic key path partitioned by event date and hour
- Checkpoint the Kinesis sequence number after a successful S3 write to prevent duplicate processing
Connectors Used: AWS Kinesis, AWS S3
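The serialization step in this template can be sketched as follows, assuming the JSON Lines output format; the `_ingested_at` field name is an illustrative choice for the timestamp enrichment, and the returned bytes would go to an S3 PutObject call under the partitioned key.

```python
# Sketch of step 2 above: serialize a batch of records to JSON Lines,
# stamping each record with an ingestion timestamp. The field name
# "_ingested_at" is an assumption for illustration.
import json
from datetime import datetime, timezone

def to_jsonl(records, ingested_at=None) -> bytes:
    ts = (ingested_at or datetime.now(timezone.utc)).isoformat()
    lines = [json.dumps({**r, "_ingested_at": ts}, separators=(",", ":"))
             for r in records]
    return ("\n".join(lines) + "\n").encode("utf-8")
```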
Template
Kinesis Firehose Delivery Failure Reprocessing to S3
Monitors a Kinesis Firehose error output bucket in S3 for failed delivery records, parses the failure reason, and re-queues valid records back into the original Kinesis stream or an alternative S3 destination for recovery. Delivery failures don't have to mean permanent data loss.
Steps:
- Poll the designated S3 error prefix for newly written Firehose failure objects on a scheduled interval
- Parse each failure object to extract the original record payload and classify the failure type
- Re-submit recoverable records back to the Kinesis stream or write them directly to the target S3 path
- Move processed error objects to an S3 archive prefix and trigger an alert for unrecoverable failures
Connectors Used: AWS Kinesis, AWS S3
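The parse-and-classify step might look like this sketch. Firehose failure objects carry the original payload base64-encoded, but the exact field names (`rawData`, `errorCode`) and the set of error codes treated as recoverable are assumptions here and should be checked against your Firehose delivery configuration.

```python
# Sketch of step 2 above: decode one Firehose failure entry and decide
# whether it is recoverable. Field names and the recoverable-error set
# are assumptions for illustration.
import base64

RECOVERABLE = {"ServiceUnavailableException", "InternalFailure", "Throttling"}

def parse_failure(entry: dict):
    """Return (original payload bytes, is_recoverable)."""
    payload = base64.b64decode(entry["rawData"])
    recoverable = entry.get("errorCode") in RECOVERABLE
    return payload, recoverable
```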
Template
Multi-Tenant Event Router: Kinesis to Isolated S3 Buckets
Reads events from a shared Kinesis stream, inspects each record's tenant or account identifier field, and routes the record to the corresponding tenant-specific S3 prefix or bucket. A SaaS platform can maintain strict data isolation per customer without needing to operate separate Kinesis streams.
Steps:
- Consume a batch of records from the shared Kinesis stream using the configured consumer group
- Extract the tenant identifier from each record's payload using a configurable JSON path or field mapping
- Look up the target S3 bucket and prefix for the resolved tenant from a routing configuration store
- Write each record to its tenant-specific S3 path and log the routing outcome for observability
Connectors Used: AWS Kinesis, AWS S3
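Steps 2 and 3 of this template can be sketched as a path extractor plus a routing lookup. The dotted-path syntax, the routing-table shape, and the quarantine fallback for unknown tenants are all illustrative assumptions.

```python
# Sketch of steps 2-3 above: pull a tenant ID out of a record with a
# dotted field path and resolve it against a routing table. Unknown
# tenants fall back to a quarantine prefix for review.

def extract(record: dict, path: str):
    value = record
    for part in path.split("."):
        value = value[part]
    return value

def route(record: dict, path: str, routing: dict, default_prefix: str) -> str:
    tenant = extract(record, path)
    entry = routing.get(tenant)
    if entry is None:
        return f"{default_prefix}/tenant={tenant}"
    return f"{entry['bucket']}/{entry['prefix']}"
```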
Template
S3 New Object Trigger to Kinesis Stream Ingestion
Listens for S3 PutObject events in a specified bucket and prefix, reads the newly uploaded file, and publishes each record or row from the file as an individual event onto a Kinesis Data Stream. It's a practical way to bridge batch file uploads with real-time streaming consumers.
Steps:
- Receive an S3 event notification when a new object is created in the monitored bucket and prefix
- Download and parse the newly uploaded file from S3 (CSV, JSON, or Avro format supported)
- Iterate through each row or record and publish it to the stream via individual PutRecord calls or batched PutRecords calls
- Tag the source S3 object with a processing-complete metadata flag after all records are published
Connectors Used: AWS S3, AWS Kinesis
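The publishing step benefits from batching: the Kinesis PutRecords API accepts at most 500 records per call, so parsed rows need to be chunked first. This sketch assumes each row is a dict containing a field usable as the partition key (which spreads writes across shards); the batches would then be sent with boto3's `put_records`.

```python
# Sketch of step 3 above: chunk parsed rows into PutRecords-sized batches
# (max 500 records per API call). Each row becomes one record entry whose
# PartitionKey comes from a field assumed to exist on every row.
import json

def to_put_records_batches(rows, key_field: str, batch_size: int = 500):
    entries = [{"Data": json.dumps(r).encode(), "PartitionKey": str(r[key_field])}
               for r in rows]
    return [entries[i:i + batch_size] for i in range(0, len(entries), batch_size)]
```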
Template
IoT Telemetry Kinesis Stream to Compressed S3 Archive
Aggregates raw IoT device payloads from a Kinesis Data Stream over a configurable time window, compresses the batch using GZIP or Snappy, and writes the compressed file to a time-partitioned S3 prefix. Built for high-throughput IoT environments where storage cost and query efficiency actually matter.
Steps:
- Buffer incoming Kinesis records over a configurable tumbling window (e.g., 5 minutes) to form a batch
- Apply GZIP or Snappy compression to the record batch and convert to Parquet or ORC format if configured
- Write the compressed file to S3 using a device-type and date-partitioned key structure
- Emit a metadata record to a separate S3 manifest prefix documenting the file size, record count, and time range
Connectors Used: AWS Kinesis, AWS S3
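The compression step can be sketched with the standard library. Snappy, Parquet, and ORC all require third-party libraries, so this sketch sticks to GZIP over JSON Lines; the compressed bytes would become the body of the S3 PutObject call in step 3.

```python
# Sketch of step 2 above: gzip-compress one window's records into a single
# object body (JSON Lines inside GZIP, stdlib only).
import gzip
import json

def compress_window(records) -> bytes:
    body = "\n".join(json.dumps(r, separators=(",", ":")) for r in records)
    return gzip.compress(body.encode("utf-8"))
```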
Template
Kinesis Consumer Lag Monitor with S3 Snapshot and Alert
Periodically reads Kinesis shard-level GetRecords metrics and consumer sequence position data, writes a structured snapshot JSON file to S3 for historical tracking, and triggers a notification if consumer lag exceeds a configurable threshold. It gives you operational visibility that CloudWatch metrics alone can't provide.
Steps:
- On a scheduled interval, describe all shards in the target Kinesis stream and retrieve current iterator positions
- Calculate per-shard consumer lag by comparing the latest sequence number to the consumer checkpoint
- Write the lag snapshot as a timestamped JSON object to the designated S3 monitoring bucket
- Evaluate lag values against configured thresholds and trigger an alert workflow if any shard exceeds the limit
Connectors Used: AWS Kinesis, AWS S3
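Steps 3 and 4 of this template can be sketched together: shape the per-shard snapshot and flag any shard over the threshold. Lag here is expressed in milliseconds behind latest, as GetRecords reports via `MillisBehindLatest`; the 60-second threshold is an illustrative default.

```python
# Sketch of steps 3-4 above: build the JSON snapshot body for S3 and the
# list of shards that should trigger an alert. Threshold is illustrative.
import json
from datetime import datetime, timezone

def lag_snapshot(stream: str, shard_lags: dict, threshold_ms: int = 60_000):
    snapshot = {
        "stream": stream,
        "captured_at": datetime.now(timezone.utc).isoformat(),
        "shards": [{"shard_id": s, "lag_ms": lag, "over_threshold": lag > threshold_ms}
                   for s, lag in sorted(shard_lags.items())],
    }
    alerts = [s["shard_id"] for s in snapshot["shards"] if s["over_threshold"]]
    return json.dumps(snapshot, separators=(",", ":")), alerts
```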