SAGA Pattern

A SAGA is a pattern for handling long-running business processes in distributed systems without using a single distributed database transaction.

In a microservices architecture, each service owns its own data store. That means you cannot safely do:

“Update Service A DB, update Service B DB, update Service C DB — all in one ACID transaction.”

Instead, a SAGA coordinates a sequence of local transactions (each service commits its own DB changes) and uses messages to move the process forward.

If something fails, the SAGA triggers compensating actions to “undo” or offset earlier steps.

What a SAGA does (in one sentence)

A SAGA ensures a multi-step distributed workflow reaches a consistent outcome by coordinating steps and executing compensations when needed.

Why SAGA exists

Distributed transactions (2PC / XA) are usually avoided because they:

reduce availability
are complex to operate
couple services tightly
don’t play nicely with message brokers and retries

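SAGAs are the practical alternative for real-world microservice systems. The trade-off is isolation: each local transaction commits immediately, so other requests can observe intermediate states until the SAGA finishes or compensates.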

Two common SAGA styles

1) Orchestration (central coordinator)

A dedicated Orchestrator service:

decides what the next step is
sends commands / publishes events
tracks saga state
triggers compensation on failure

Pros

Clear control flow in one place
Easier to reason about and test

Cons

Orchestrator becomes a critical component
You must design it carefully (state, idempotency, retries)

2) Choreography (event-driven, no central brain)

Services react to events and emit new events, forming a chain.

Pros

No central service; more decentralized

Cons

Harder to understand global flow
Can become “spaghetti events” without strong discipline
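
To make the choreography style concrete, here is a minimal in-process sketch. The EventBus class, the ChoreographyDemo wiring, and the UserNotified event are assumptions made for illustration; a real system would use a message broker and separate services. Each subscription plays the role of one service that reacts to an event and emits the next one.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory stand-in for a message broker (illustration only).
public class EventBus
{
    private readonly Dictionary<string, List<Action<string>>> handlers = new();

    // Records every published event in order, so the resulting chain can be inspected.
    public List<string> Log { get; } = new();

    public void Subscribe(string eventType, Action<string> handler)
    {
        if (!handlers.TryGetValue(eventType, out var list))
            handlers[eventType] = list = new List<Action<string>>();
        list.Add(handler);
    }

    public void Publish(string eventType, string processId)
    {
        Log.Add($"{eventType}:{processId}");
        if (handlers.TryGetValue(eventType, out var list))
            foreach (var handler in list)
                handler(processId);
    }
}

public static class ChoreographyDemo
{
    // Wires three "services" together purely through events.
    // No single place describes the whole flow.
    public static EventBus Build()
    {
        var bus = new EventBus();

        // Access Control: here it always denies entry, to force the refund path.
        bus.Subscribe("TicketPurchased", id => bus.Publish("EntryDenied", id));

        // Ticket Service: reacts to a denial by refunding.
        bus.Subscribe("EntryDenied", id => bus.Publish("TicketRefunded", id));

        // Notification Service: reacts to the refund.
        bus.Subscribe("TicketRefunded", id => bus.Publish("UserNotified", id));

        return bus;
    }
}
```

Publishing TicketPurchased on the bus returned by ChoreographyDemo.Build() walks the whole chain. The overall flow can only be reconstructed by reading every subscription, which is the drawback listed under Cons.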

Core SAGA concepts you must implement

1) Steps are local transactions

Each step is a normal local DB transaction inside a single service.

2) Correlation

All events/commands in a saga must include a correlation id (often called processId, sagaId, etc.) so the orchestrator can match replies to the correct saga instance.
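
As a rough sketch of what correlation looks like in code, the snippet below groups incoming messages by their correlation id. MessageEnvelope and SagaRouter are hypothetical names introduced for this illustration; they are not part of any specific messaging library.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical message envelope: every saga message carries the same ProcessId.
public record MessageEnvelope(string EventId, string ProcessId, string Type);

// Hypothetical router that groups incoming messages by correlation id.
public class SagaRouter
{
    private readonly Dictionary<string, List<string>> historyByProcess = new();

    // Returns false when the message has no correlation id and cannot be routed.
    public bool Route(MessageEnvelope message)
    {
        if (string.IsNullOrWhiteSpace(message.ProcessId))
            return false;

        if (!historyByProcess.TryGetValue(message.ProcessId, out var history))
            historyByProcess[message.ProcessId] = history = new List<string>();

        history.Add(message.Type);
        return true;
    }

    // Message types seen so far for one saga instance, in arrival order.
    public IReadOnlyList<string> HistoryOf(string processId)
    {
        if (historyByProcess.TryGetValue(processId, out var history))
            return history;
        return Array.Empty<string>();
    }
}
```

Messages for different processId values never mix, even when they interleave on the broker. A message without a processId is rejected rather than guessed at.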

3) Idempotency

Because messages may be delivered more than once, the orchestrator and all participants must be safe to retry:

“same command/event again” must not corrupt state
repeated messages should be ignored or treated as already done
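
On the participant side, one simple way to get this behavior is to remember which command ids were already handled. The sketch below assumes every command carries a unique commandId; TicketRefundHandler and its members are hypothetical names, and the in-memory set stands in for a table in the service's own database.

```csharp
using System.Collections.Generic;

// Hypothetical participant-side handler (for example inside Ticket Service).
public class TicketRefundHandler
{
    // In a real service this set would live in the same database as the tickets
    // and be updated in the same local transaction as the refund itself.
    private readonly HashSet<string> handledCommandIds = new();

    // Tickets refunded so far; stands in for the actual refund side effect.
    public List<string> RefundedTicketIds { get; } = new();

    // Returns true if the refund was executed now, false if the command was a duplicate.
    public bool Handle(string commandId, string ticketId)
    {
        // HashSet.Add returns false when the id is already present.
        if (!handledCommandIds.Add(commandId))
            return false;

        RefundedTicketIds.Add(ticketId);
        return true;
    }
}
```

Delivering the same command twice leaves exactly one refund behind, which is the "treated as already done" behavior described above.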

4) Compensation

For each step that can’t be “rolled back” automatically, define a compensating action. Example: if you already issued a ticket, compensation might be “refund/cancel ticket”.
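
The pairing of steps and compensations can be sketched as follows. This is a deliberately simplified, in-process illustration: SagaStep and SagaRunner are hypothetical names, and a real SAGA would run each step in a different service and drive it through messages rather than a loop. The sketch only shows the ordering rule: when a step fails, the steps that already completed are compensated in reverse order.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical description of one saga step: a forward action plus its compensation.
public record SagaStep(string Name, Action Execute, Action Compensate);

public static class SagaRunner
{
    // Runs the steps in order. If a step throws, the already completed steps
    // are compensated in reverse order and the method returns false.
    public static bool Run(IReadOnlyList<SagaStep> steps)
    {
        var completed = new Stack<SagaStep>();
        foreach (var step in steps)
        {
            try
            {
                step.Execute();
                completed.Push(step);
            }
            catch (Exception)
            {
                // The failed step itself is not compensated: it never completed.
                while (completed.Count > 0)
                    completed.Pop().Compensate();
                return false;
            }
        }
        return true;
    }
}
```

The compensations are ordinary business operations such as a refund or a cancellation, not database rollbacks, so they must themselves be safe to retry.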

Simple example

Use case: Auto-refund when entry fails for a system reason

Story

User purchases a ticket.
User scans ticket at gate.
Gate denies entry due to a system reason (e.g., a partial outage of the scanner service or a policy misconfiguration).
Orchestrator triggers a refund in Ticket Service.
User is notified.

Event/command flow (concept)

sequenceDiagram
  participant TS as Ticket Service
  participant ACS as Access Control
  participant ORCH as Orchestrator
  participant MQ as Broker

  TS->>MQ: TicketPurchased (processId)
  MQ->>ORCH: TicketPurchased
  ORCH->>MQ: StartEntrySaga (processId)

  ACS->>MQ: EntryDenied (processId, reason=SYSTEM)
  MQ->>ORCH: EntryDenied
  ORCH->>MQ: RefundTicketCommand (processId, ticketId)

  TS->>MQ: TicketRefunded (processId)
  MQ->>ORCH: TicketRefunded
  ORCH->>MQ: SagaCompleted (processId)

Minimal C# example (illustrative orchestrator logic)

This snippet shows the shape of an orchestrator as a state machine. It is not a full implementation guide: you still have to connect it to your messaging and persistence layers. The ISagaStore and IMessagePublisher interfaces are minimal placeholders for that infrastructure.

using System.Collections.Generic;
using System.Threading.Tasks;

// =======================================================
// REFUND SAGA (Orchestrated SAGA)
//
// Scenario:
//  - Ticket was purchased
//  - User tries to enter the festival
//  - If entry is denied due to a SYSTEM reason -> refund ticket
//  - Otherwise, saga ends as Failed (or could end with no refund)
// =======================================================


// 1) The saga is a small state machine.
//    Each saga instance lives across multiple messages/events.
public enum RefundSagaState
{
    Started,                // Saga exists but we haven't started waiting for entry yet
    WaitingForEntryResult,  // Ticket bought; waiting to learn if entry was granted/denied
    Refunding,              // We decided to refund and are waiting for confirmation
    Completed,              // Everything finished successfully
    Failed                  // Saga ended in a failure scenario (no compensation or unresolved)
}


// 2) Saga instance = the "memory" of the orchestrator for ONE processId.
//    It must be stored somewhere durable (DB recommended).
public class RefundSagaInstance
{
    public string ProcessId { get; set; } = "";   // Correlation ID: ties all related messages together
    public string TicketId { get; set; } = "";    // Needed so we can tell TicketService what to refund
    public RefundSagaState State { get; set; } = RefundSagaState.Started;  // Current step in the workflow

    // Idempotency helper:
    // In distributed systems, messages can be delivered more than once.
    // If we process duplicates, we might refund twice -> bad.
    // So we record event IDs that we already handled.
    public HashSet<string> ProcessedEventIds { get; set; } = new HashSet<string>();
}


// Placeholder abstractions for persistence and messaging.
// Back them with your own database and broker client.
public interface ISagaStore
{
    Task<RefundSagaInstance> LoadOrCreate(string processId);  // Creates a new instance in state Started if none exists
    Task<RefundSagaInstance?> Load(string processId);         // Returns null if the saga is unknown
    Task Save(RefundSagaInstance saga);
}

public interface IMessagePublisher
{
    Task Publish(string type, object data);
}


// 3) The orchestrator is just an event handler + a state machine.
//    It reacts to incoming events and publishes commands/events.
public class RefundSagaOrchestrator
{
    private readonly ISagaStore store;             // Loads/saves saga instances (DB, Redis, etc.)
    private readonly IMessagePublisher publisher;  // Publishes outgoing messages (commands/events)

    public RefundSagaOrchestrator(ISagaStore store, IMessagePublisher publisher)
    {
        this.store = store;
        this.publisher = publisher;
    }

    // -------------------------------------------------------
    // Event handler: TicketPurchased
    //
    // Meaning:
    //   The user successfully purchased a ticket.
    //
    // Our job:
    //   Create/initialize saga state and wait for the entry outcome.
    // -------------------------------------------------------
    public async Task OnTicketPurchased(string eventId, string processId, string ticketId)
    {
        var saga = await store.LoadOrCreate(processId);

        // Idempotency:
        // If we already processed this exact event, do nothing.
        // HashSet.Add returns false when the id is already present.
        if (!saga.ProcessedEventIds.Add(eventId))
            return;

        // Guard against late or repeated purchase events:
        // never move a saga that has already progressed back to waiting.
        if (saga.State != RefundSagaState.Started)
            return;

        // Store data we will need later
        saga.TicketId = ticketId;

        // Move saga forward: we now wait for entry outcome events
        saga.State = RefundSagaState.WaitingForEntryResult;

        await store.Save(saga);

        // Optional: publish a "SagaStarted" event for monitoring/notifications
        // await publisher.Publish(type: "festivo.saga.started.v1", data: new { processId });
    }


    // -------------------------------------------------------
    // Event handler: EntryDenied
    //
    // Meaning:
    //   The gate system denied entry for some reason.
    //
    // Our job:
    //   If denial reason is SYSTEM -> compensate by refunding the ticket.
    //   If denial reason is USER -> fail (no refund), or choose your own policy.
    // -------------------------------------------------------
    public async Task OnEntryDenied(string eventId, string processId, string reason)
    {
        var saga = await store.Load(processId);

        // If the saga does not exist, we cannot correlate this event.
        // In real systems you would log this and possibly alert.
        if (saga == null)
            return;

        // Idempotency: ignore duplicates
        if (!saga.ProcessedEventIds.Add(eventId))
            return;

        // Guard against out-of-order events:
        // If we are not waiting for entry results anymore, ignore/record for debugging.
        if (saga.State != RefundSagaState.WaitingForEntryResult)
            return;

        // Business decision:
        // Only compensate (refund) if the denial was due to a SYSTEM issue.
        if (reason == "SYSTEM")
        {
            saga.State = RefundSagaState.Refunding;
            await store.Save(saga);

            // We do NOT call TicketService directly here.
            // We publish a command message so the system stays loosely coupled.
            //
            // Caveat: saving state and publishing are two separate operations.
            // If the publish fails after the save, the saga stays in Refunding and
            // the command is lost. A transactional outbox is one common way to close that gap.
            await publisher.Publish(
                type: "festivo.ticket.refund.requested.v1",
                data: new
                {
                    processId,
                    ticketId = saga.TicketId
                }
            );

            // Now we wait for TicketRefunded to come back later.
        }
        else
        {
            // Example: USER reason could be "ticket already used", "invalid ticket", etc.
            // We end the saga without compensation.
            saga.State = RefundSagaState.Failed;
            await store.Save(saga);

            // Optional: publish "SagaFailed" so UI can show a clear outcome
        }
    }


    // -------------------------------------------------------
    // Event handler: TicketRefunded
    //
    // Meaning:
    //   TicketService confirms it refunded the ticket.
    //
    // Our job:
    //   Mark saga as Completed.
    // -------------------------------------------------------
    public async Task OnTicketRefunded(string eventId, string processId)
    {
        var saga = await store.Load(processId);
        if (saga == null)
            return;

        // Idempotency: ignore duplicates
        if (!saga.ProcessedEventIds.Add(eventId))
            return;

        // Only accept this event if we're currently waiting for it
        if (saga.State != RefundSagaState.Refunding)
            return;

        saga.State = RefundSagaState.Completed;
        await store.Save(saga);

        // Publish a final status event (useful for NotificationService / UI)
        await publisher.Publish(
            type: "festivo.saga.completed.v1",
            data: new { processId }
        );
    }
}

What this example highlights

SAGA instances are state machines
Every message must be correlated with processId
Idempotency is required (ProcessedEventIds conceptually)
The orchestrator triggers compensating actions (e.g. refund)

Practical checklist (what you should enforce in your implementation)

All saga-related messages carry a processId
Orchestrator persists saga state durably, so a restart does not lose in-flight sagas
Orchestrator is idempotent (duplicate events are safe)
Participants handle duplicate commands safely
Compensation is implemented and observable (logs + notifications)

What you should be able to answer after reading this

Why can’t we just use a normal transaction across microservices?
What’s the difference between orchestration and choreography?
What does “compensation” mean in a SAGA?
How do correlation ids and idempotency prevent chaos under retries?
