In the world of distributed systems and microservices, “decoupling” is a word you’ll hear constantly. As applications evolve from monoliths to microservices, the way services communicate with each other becomes critical to the system’s stability and scalability.
Today, I want to take a deep dive into four core AWS services in the Messaging & Integration space: SQS, SNS, Kinesis, and Amazon MQ. We’ll explore how they work and, more importantly, when to use each one.
1. Why Decouple?
When you have multiple applications that need to talk to each other, communication is inevitable. There are generally two patterns:
-
Synchronous Communication: Application A directly calls Application B.
- The Risk: This is like making a phone call. If the other party doesn’t pick up, or if the line is overloaded (a traffic spike), the call fails. Imagine a video encoding service that normally handles 10 requests per second. If that jumps to 1,000, the downstream service might just crash.
-
Asynchronous / Event-Based Communication: Application A → Middleware → Application B.
- The Advantage: This is decoupling in action. By introducing a buffer layer like SQS (a queue), SNS (pub/sub), or Kinesis (streaming), you allow the downstream service to process requests at its own pace, even during traffic spikes. Each service can scale independently.
2. Amazon SQS: The Foundation of Message Queuing
Amazon SQS (Simple Queue Service) is one of the oldest AWS services (over 10 years old!). Its core purpose is to act as a buffer.
Standard Queue
This is the most common type, offering unlimited throughput.
- Low Latency: Sending and receiving typically takes < 10 milliseconds.
- Message Retention: Default is 4 days, up to a maximum of 14 days.
- Size Limit: Each message can be up to 256KB.
- Trade-offs: To achieve high throughput, it guarantees “at-least-once” delivery. This means you may receive duplicate messages, and messages may arrive out of order (best-effort ordering).
How It Works
- Producer: Sends data to the queue using the SDK (
SendMessage). - Consumer: Code running on EC2, Lambda, or elsewhere.
- Polling: Consumers actively pull messages from the queue (up to 10 at a time).
- Process & Delete: After the business logic is complete (e.g., writing to RDS), the consumer must call the
DeleteMessageAPI. If not, the message will return to the queue and be processed again. - Horizontal Scaling: Too many messages to handle? Just add more consumer instances (using Auto Scaling Groups).
Configuration Tips
-
Visibility Timeout:
- Default is 30 seconds. This prevents multiple consumers from processing the same message simultaneously. If a consumer doesn’t finish processing and delete the message within this window, it “reappears” for another consumer to pick up. If your tasks take a long time, increase this value.
-
Long Polling: Highly recommended.
- With short polling, if the queue is empty, the call returns immediately with nothing, wasting API calls and money. Long polling waits (1-20 seconds) for a message to arrive before returning. This saves costs and reduces latency.
SQS FIFO (First-In-First-Out)
If you’re dealing with banking transactions or e-commerce orders, order matters.
- Ordering: Strict First-In-First-Out.
- Exactly-Once Processing: No duplicates.
- Trade-off: Limited throughput (300 TPS by default, up to 3,000 with batching).
- Deduplication: Supports content-based deduplication or an explicit deduplication ID.
3. Amazon SNS: One-to-Many Broadcasting
If SQS is like sending a direct email, Amazon SNS (Simple Notification Service) is like making a broadcast announcement over a loudspeaker.
Core Model: Pub/Sub
- Producers send messages to a Topic.
- All subscribers to that Topic (Email, SMS, Lambda, SQS, HTTPs) receive the message.
- Scale: A single Topic supports up to 12.5 million subscribers.
Classic Pattern: SNS + SQS Fan-Out
This is a common design pattern you’ll see on AWS architecture exams.
- Scenario: When an order is placed, you need to send an email confirmation to the user, notify the warehouse to ship, and alert the fraud detection system.
- Solution: The order service publishes a message to an SNS Topic → That Topic is subscribed to by multiple SQS queues.
- Benefits:
- Full Decoupling: To add a new feature (like a data analytics service), you just add a new SQS subscription. No changes to the upstream code.
- Durability: SQS guarantees the message won’t be lost. Even if a downstream service is down, the message waits in the queue.
- Message Filtering: You can set JSON policies on SNS. For example, the “refunds queue” only receives messages with
"type": "refund"and ignores “order placed” messages.
4. Amazon Kinesis: Big Data Stream Processing
When your data isn’t a series of independent “messages” but rather a continuous “stream” (like clickstreams, application logs, or IoT sensor data), you need Kinesis.
Kinesis Data Streams
- How It Works: Similar to Apache Kafka. It’s composed of Shards, and you need to manually provision the number of shards.
- Key Features:
- Real-Time: Very high (around 200ms latency).
- Replay: Data is retained for 24 hours by default (up to 1 year). This is the biggest difference from SQS—SQS deletes a message once consumed, but Kinesis keeps it. You can replay and re-process data.
- Immutability: Once data is in the stream, it cannot be modified or deleted.
Kinesis Data Firehose
- Purpose: Think of it as a “delivery truck.” Its job is to load stream data into a destination (S3, Redshift, OpenSearch, Datadog, etc.).
- Features: Fully managed. No code required.
- Latency: Near real-time (around 60 seconds), because it buffers data before writing.
In a nutshell: Want to write your own code for real-time processing? Use Data Streams. Just want to archive logs to S3 with no fuss? Use Firehose.
5. Amazon MQ: The Bridge for Legacy Applications
- Scenario: Your company has a legacy system that’s been running for 10 years, using protocols like MQTT, AMQP, or OpenWire (maybe running on RabbitMQ).
- Pain Point: When migrating to the cloud, you don’t want to rewrite your code to adapt to SQS/SNS’s proprietary APIs.
- Solution: Use Amazon MQ. It’s a managed Apache ActiveMQ or RabbitMQ.
- Caveat: It’s not serverless like SQS (which scales infinitely). It runs on provisioned server instances, so scalability is limited by server performance. However, it supports Active/Standby deployments for high availability.
6. Cheat Sheet: When to Use What?
If you’re still unsure which service to pick, here’s a quick reference table:
| Feature | SQS | SNS | Kinesis |
|---|---|---|---|
| Model | Queue | Pub/Sub | Real-time Stream |
| Data Flow | Pull (Consumer polls) | Push (Pushed to subscribers) | Pull (Standard) / Push (Enhanced Fan-Out) |
| Persistence | Deleted after consumption | No persistence (if delivery fails, it’s gone) | Replayable (24h default retention) |
| Use Cases | Task buffering, load leveling | Message broadcasting, email/SMS notifications | Log aggregation, real-time big data analytics |
Quick Decision Guide:
- Need to decouple and buffer? Start with SQS.
- Need to broadcast the same data to multiple systems? Use SNS + SQS (Fan-Out).
- Need to process massive real-time log streams? Use Kinesis.
- Migrating a legacy system and don’t want to rewrite code? Use Amazon MQ.
Choose wisely, and build systems that scale!