SAGA Pattern
A SAGA is a pattern for handling long-running business processes in distributed systems without using a single distributed database transaction.
In a microservices architecture, each service owns its own data store. That means you cannot safely do:
“Update Service A DB, update Service B DB, update Service C DB — all in one ACID transaction.”
Instead, a SAGA coordinates a sequence of local transactions (each service commits its own DB changes) and uses messages to move the process forward.
If something fails, the SAGA triggers compensating actions to “undo” or offset earlier steps.
What a SAGA does (in one sentence)
A SAGA ensures a multi-step distributed workflow reaches a consistent outcome by coordinating steps and executing compensations when needed.
Why SAGA exists
Distributed transactions (2PC / XA) are usually avoided because they:
- reduce availability (every participant must be reachable for the commit to succeed)
- are complex to operate
- couple services tightly
- are not supported by many message brokers and interact poorly with retries
SAGAs are the practical alternative for real-world microservice systems.
Two common SAGA styles
1) Orchestration (central coordinator)
A dedicated Orchestrator service:
- decides what the next step is
- sends commands / publishes events
- tracks saga state
- triggers compensation on failure
Pros
- Clear control flow in one place
- Easier to reason about and test
Cons
- Orchestrator becomes a critical component
- You must design it carefully (state, idempotency, retries)
2) Choreography (event-driven, no central brain)
Services react to events and emit new events, forming a chain.
Pros
- No central service; more decentralized
Cons
- Harder to understand global flow
- Can become “spaghetti events” without strong discipline
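To make the chain of reactions concrete, here is a minimal choreography sketch. It reuses the event names from the refund example in this document, but InMemoryBus and SagaEvent are hypothetical stand-ins for a real broker and message contract, and the handlers run in one process purely for illustration.

```csharp
using System;
using System.Collections.Generic;

var bus = new InMemoryBus();
var log = new List<string>();

// Ticket Service: reacts to EntryDenied and emits TicketRefunded.
bus.Subscribe("EntryDenied", e =>
{
    if (e.Reason != "SYSTEM") return; // only system failures are refunded
    log.Add($"TicketService refunds {e.ProcessId}");
    bus.Publish(new SagaEvent("TicketRefunded", e.ProcessId));
});

// Notification Service: reacts to TicketRefunded. It knows nothing about the gate.
bus.Subscribe("TicketRefunded", e =>
    log.Add($"NotificationService notifies {e.ProcessId}"));

// Access Control emits the event that starts the chain.
bus.Publish(new SagaEvent("EntryDenied", "p-1", "SYSTEM"));

foreach (var line in log) Console.WriteLine(line);

// Hypothetical event shape; Reason is only set on EntryDenied.
record SagaEvent(string Type, string ProcessId, string? Reason = null);

// Hypothetical in-process stand-in for a message broker.
class InMemoryBus
{
    private readonly Dictionary<string, List<Action<SagaEvent>>> handlers = new();

    public void Subscribe(string type, Action<SagaEvent> handler)
    {
        if (!handlers.TryGetValue(type, out var list))
            handlers[type] = list = new List<Action<SagaEvent>>();
        list.Add(handler);
    }

    public void Publish(SagaEvent e)
    {
        if (!handlers.TryGetValue(e.Type, out var list)) return;
        foreach (var handler in list.ToArray()) handler(e);
    }
}
```

Note that the overall flow is not written down anywhere: it only emerges from the subscriptions, which is exactly why choreography gets harder to follow as the number of events grows.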
Core SAGA concepts you must implement
1) Steps are local transactions
Each step is a normal local DB transaction inside a single service.
2) Correlation
All events/commands in a saga must include a correlation id (often called processId, sagaId, etc.) so the orchestrator, or each participant in a choreographed saga, can match a message to the correct saga instance.
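As a small illustration of correlation, the sketch below keeps two saga instances in a dictionary and routes an incoming reply by its ProcessId. The Envelope record and the string-valued states are assumptions made for this example only; a real system would load the instance from a durable store.

```csharp
using System;
using System.Collections.Generic;

// Two sagas are in flight at the same time.
var sagas = new Dictionary<string, string>
{
    ["p-1"] = "Refunding",
    ["p-2"] = "Refunding",
};

// A reply arrives. Only its ProcessId says which saga it belongs to.
var reply = new Envelope(
    EventId: Guid.NewGuid().ToString(),
    ProcessId: "p-2",
    Type: "TicketRefunded");

if (sagas.ContainsKey(reply.ProcessId))
    sagas[reply.ProcessId] = "Completed";

Console.WriteLine($"p-1: {sagas["p-1"]}, p-2: {sagas["p-2"]}");
// prints "p-1: Refunding, p-2: Completed"

// Hypothetical message envelope: every saga message carries these fields.
record Envelope(string EventId, string ProcessId, string Type);
```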
3) Idempotency
Because messages may be delivered more than once, the orchestrator and all participants must be safe to retry:
- “same command/event again” must not corrupt state
- repeated messages should be ignored or treated as already done
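The same rule applies on the participant side. The sketch below shows one possible way a Ticket Service could ignore a redelivered refund command by remembering command ids. The class shape and the commandId parameter are assumptions for illustration; in a real service the set of handled ids would be persisted in the same local transaction as the refund itself.

```csharp
using System;
using System.Collections.Generic;

var service = new TicketService();
service.HandleRefund("cmd-1", "t-42");
service.HandleRefund("cmd-1", "t-42"); // redelivery of the same command
Console.WriteLine(service.RefundedTickets.Count); // prints 1

// Hypothetical participant that is safe to call twice with the same command.
class TicketService
{
    private readonly HashSet<string> handledCommandIds = new();

    public List<string> RefundedTickets { get; } = new();

    // Returns true if the refund was executed, false if it was a duplicate.
    public bool HandleRefund(string commandId, string ticketId)
    {
        // HashSet.Add returns false when the id is already present.
        if (!handledCommandIds.Add(commandId))
            return false;

        RefundedTickets.Add(ticketId);
        return true;
    }
}
```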
4) Compensation
For each step that can’t be “rolled back” automatically, define a compensating action. Example: if you already issued a ticket, compensation might be “refund/cancel ticket”.
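One common way to organise compensations is to pair every step with its undo action and, on failure, run the undo actions of the completed steps in reverse order. The sketch below assumes this approach. SagaStep, SagaRunner, and the ReserveParking step are hypothetical names, and the steps are plain in-process delegates, whereas in a real saga each one would be a command sent to a service.

```csharp
using System;
using System.Collections.Generic;

var log = new List<string>();
var steps = new List<SagaStep>
{
    new("IssueTicket", () => log.Add("ticket issued"), () => log.Add("ticket refunded")),
    new("ReserveParking", () => log.Add("parking reserved"), () => log.Add("parking released")),
    new("GrantEntry", () => throw new InvalidOperationException("scanner outage"), () => { }),
};

bool ok = SagaRunner.Run(steps, log);
Console.WriteLine(ok ? "completed" : "compensated");
foreach (var line in log) Console.WriteLine(line);

// A step and the action that offsets it once it has been committed.
record SagaStep(string Name, Action Execute, Action Compensate);

static class SagaRunner
{
    public static bool Run(IEnumerable<SagaStep> steps, List<string> log)
    {
        var completed = new Stack<SagaStep>();
        foreach (var step in steps)
        {
            try
            {
                step.Execute();
                completed.Push(step);
            }
            catch (Exception ex)
            {
                log.Add($"{step.Name} failed: {ex.Message}");
                // Undo in reverse order: the most recent step first.
                while (completed.Count > 0)
                    completed.Pop().Compensate();
                return false;
            }
        }
        return true;
    }
}
```

The sketch does not handle a compensation that itself fails. In practice compensations need to be retried until they succeed, which is one more reason they must be idempotent.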
Simple example
Use case: Auto-refund when entry fails for a system reason
Story
- User purchases a ticket.
- User scans ticket at gate.
- Gate denies entry due to a system reason (e.g., a partial outage of the scanner service or a policy misconfiguration).
- Orchestrator triggers a refund in Ticket Service.
- User is notified.
Event/command flow (concept)
    sequenceDiagram
        participant TS as Ticket Service
        participant ACS as Access Control
        participant ORCH as Orchestrator
        participant MQ as Broker
        TS->>MQ: TicketPurchased (processId)
        MQ->>ORCH: TicketPurchased
        ORCH->>MQ: StartEntrySaga (processId)
        ACS->>MQ: EntryDenied (processId, reason=SYSTEM)
        MQ->>ORCH: EntryDenied
        ORCH->>MQ: RefundTicketCommand (processId, ticketId)
        TS->>MQ: TicketRefunded (processId)
        MQ->>ORCH: TicketRefunded
        ORCH->>MQ: SagaCompleted (processId)
Minimal C# example (illustrative orchestrator logic)
This snippet shows the shape of an orchestrator as a state machine. It is not a full implementation guide: you still have to connect it to your own messaging and persistence. The ISagaStore and IMessagePublisher interfaces are assumed placeholders for those pieces, not part of any library.

    using System.Collections.Generic;
    using System.Threading.Tasks;

    // =======================================================
    // REFUND SAGA (Orchestrated SAGA)
    //
    // Scenario:
    // - Ticket was purchased
    // - User tries to enter the festival
    // - If entry is denied due to a SYSTEM reason -> refund ticket
    // - Otherwise, saga ends as Failed (or could end with no refund)
    // =======================================================

    // 1) The saga is a small state machine.
    //    Each saga instance lives across multiple messages/events.
    public enum RefundSagaState
    {
        Started,               // Saga exists but we haven't started waiting for entry yet
        WaitingForEntryResult, // Ticket bought; waiting to learn if entry was granted/denied
        Refunding,             // We decided to refund and are waiting for confirmation
        Completed,             // Everything finished successfully
        Failed                 // Saga ended in a failure scenario (no compensation or unresolved)
    }

    // 2) Saga instance = the "memory" of the orchestrator for ONE processId.
    //    It must be stored somewhere durable (DB recommended).
    public class RefundSagaInstance
    {
        public string ProcessId { get; set; } = "";  // Correlation ID: ties all related messages together
        public string? TicketId { get; set; }        // Needed so we can tell TicketService what to refund
        public RefundSagaState State { get; set; } = RefundSagaState.Started; // Current step in the workflow

        // Idempotency helper:
        // In distributed systems, messages can be delivered more than once.
        // If we process duplicates, we might refund twice -> bad.
        // So we record event IDs that we already handled.
        public HashSet<string> ProcessedEventIds { get; } = new();
    }

    // Assumed abstractions: implement them on top of your own database and broker.
    public interface ISagaStore
    {
        Task<RefundSagaInstance> LoadOrCreate(string processId);
        Task<RefundSagaInstance?> Load(string processId); // null if no saga exists
        Task Save(RefundSagaInstance saga);
    }

    public interface IMessagePublisher
    {
        Task Publish(string type, object data);
    }

    // 3) The orchestrator is just an event handler + a state machine.
    //    It reacts to incoming events and publishes commands/events.
    public class RefundSagaOrchestrator
    {
        private readonly ISagaStore store;             // Loads/saves saga instances (DB, Redis, etc.)
        private readonly IMessagePublisher publisher;  // Publishes outgoing messages (commands/events)

        public RefundSagaOrchestrator(ISagaStore store, IMessagePublisher publisher)
        {
            this.store = store;
            this.publisher = publisher;
        }

        // -------------------------------------------------------
        // Event handler: TicketPurchased
        //
        // Meaning:
        //   The user successfully purchased a ticket.
        //
        // Our job:
        //   Create/initialize saga state and wait for the entry outcome.
        // -------------------------------------------------------
        public async Task OnTicketPurchased(string eventId, string processId, string ticketId)
        {
            var saga = await store.LoadOrCreate(processId);

            // Idempotency:
            // If we already processed this exact event, do nothing.
            // HashSet.Add returns false when the id is already present.
            if (!saga.ProcessedEventIds.Add(eventId))
                return;

            // Store data we will need later
            saga.TicketId = ticketId;

            // Move saga forward: we now wait for entry outcome events
            saga.State = RefundSagaState.WaitingForEntryResult;
            await store.Save(saga);

            // Optional: publish a "SagaStarted" event for monitoring/notifications
            // await publisher.Publish("festivo.saga.started.v1", new { processId });
        }

        // -------------------------------------------------------
        // Event handler: EntryDenied
        //
        // Meaning:
        //   The gate system denied entry for some reason.
        //
        // Our job:
        //   If denial reason is SYSTEM -> compensate by refunding the ticket.
        //   If denial reason is USER   -> fail (no refund), or choose your own policy.
        // -------------------------------------------------------
        public async Task OnEntryDenied(string eventId, string processId, string reason)
        {
            var saga = await store.Load(processId);

            // If the saga does not exist, we cannot correlate this event.
            // In real systems you would log this and possibly alert.
            if (saga == null)
                return;

            // Idempotency: ignore duplicates
            if (!saga.ProcessedEventIds.Add(eventId))
                return;

            // Guard against out-of-order events:
            // If we are not waiting for entry results anymore, ignore/record for debugging.
            if (saga.State != RefundSagaState.WaitingForEntryResult)
                return;

            // Business decision:
            // Only compensate (refund) if the denial was due to a SYSTEM issue.
            if (reason == "SYSTEM")
            {
                saga.State = RefundSagaState.Refunding;
                await store.Save(saga);

                // We do NOT call TicketService directly here.
                // We publish a command message so the system stays loosely coupled.
                // Caveat: if Publish fails after Save, a redelivered event is treated
                // as a duplicate and the command is never sent. Production code has to
                // make "save state" and "publish" atomic in some way.
                await publisher.Publish(
                    "festivo.ticket.refund.requested.v1",
                    new { processId, ticketId = saga.TicketId });

                // Now we wait for TicketRefunded to come back later.
            }
            else
            {
                // Example: USER reason could be "ticket already used", "invalid ticket", etc.
                // We end the saga without compensation.
                saga.State = RefundSagaState.Failed;
                await store.Save(saga);

                // Optional: publish "SagaFailed" so UI can show a clear outcome
            }
        }

        // -------------------------------------------------------
        // Event handler: TicketRefunded
        //
        // Meaning:
        //   TicketService confirms it refunded the ticket.
        //
        // Our job:
        //   Mark saga as Completed.
        // -------------------------------------------------------
        public async Task OnTicketRefunded(string eventId, string processId)
        {
            var saga = await store.Load(processId);
            if (saga == null)
                return;

            // Idempotency: ignore duplicates
            if (!saga.ProcessedEventIds.Add(eventId))
                return;

            // Only accept this event if we're currently waiting for it
            if (saga.State != RefundSagaState.Refunding)
                return;

            saga.State = RefundSagaState.Completed;
            await store.Save(saga);

            // Publish a final status event (useful for NotificationService / UI)
            await publisher.Publish(
                "festivo.saga.completed.v1",
                new { processId });
        }
    }

What this example highlights
- SAGA instances are state machines
- Every message must be correlated with processId
- Idempotency is required (ProcessedEventIds conceptually)
- The orchestrator triggers compensating actions (e.g. refund)
Practical checklist (what you should enforce in your implementation)
- All saga-related messages carry a processId
- Orchestrator persists state (recommended)
- Orchestrator is idempotent (duplicate events are safe)
- Participants handle duplicate commands safely
- Compensation is implemented and observable (logs + notifications)
What you should be able to answer after reading this
- Why can’t we just use a normal transaction across microservices?
- What’s the difference between orchestration and choreography?
- What does “compensation” mean in a SAGA?
- How do correlation ids and idempotency prevent chaos under retries?