Distributed Systems Exercise: Smart Concert / Festival Management Platform

Course focus: distributed systems, message-oriented middleware, reliable messaging, error handling, and long-running workflows (SAGA).
Tech focus: .NET backend microservices, RabbitMQ, CloudEvents message format, REST APIs, YARP reverse proxy, SignalR notifications, databases, transactional outbox, orchestration, Docker Compose.

You will receive an empty starter repository via GitHub Classroom. Your job is to build the system in milestones.
You must commit regularly and create a Git tag at the end of each milestone:

  • MS1-SetUp
  • MS2-AddMessagingMiddleware
  • MS3-DefineMessageFormat
  • MS4-BuildBasicWebApp
  • MS5-AddYARPReverseProxy
  • MS6-BuildBusinessLogicAndMessaging
  • MS7-AddSignalRNotifications
  • MS8-AddDatabasesForStorageAPIs
  • MS9-AddTransactionalOutbox
  • MS10-AddOrchestratorSaga
  • MS11-AddCallbackService
  • MS12-ContainerizeDockerCompose
  • MS13-OptionalExtensions

Important: This document tells you what to build and why (goals & constraints). It does not tell you how to implement it.
You are expected to research APIs and libraries yourself.


1. Scenario: “Festivo” — a Smart Festival Platform

Your team is building Festivo, a backend-heavy system for a medium-sized music festival.

The festival has:

  • Ticket purchases (simple ticket types)
  • Entry scanning at gates
  • Stage schedules and artist lineup
  • Live capacity tracking per area/stage
  • Push notifications to visitors (status updates)

The system must be distributed: multiple services, each owning its data and exposing a REST API.
Services coordinate via asynchronous events using a message broker (RabbitMQ), and implement reliability patterns such as outbox and sagas.

Key qualities we want to practice

  • Asynchronous communication via events
  • Eventual consistency between services
  • Reliable message delivery (publish/consume, retries, dead-letter handling)
  • Long-running processes with SAGA orchestration
  • Observability (optional extension: distributed logging/tracing)

2. High-Level Domain Model (Concepts)

Entities / concepts (minimum set):

  • Ticket: created when purchased, can be validated and used for entry
  • Visitor: optional (may be anonymous), linked to ticket
  • Gate Scan: an attempt to enter or exit (may be repeated for the same ticket)
  • Venue Area/Stage: has capacity and current occupancy
  • Schedule Item: artist, stage, start/end time
  • Alert/Notification: messages to users (e.g., “Stage A crowded”)

3. Services Overview

You will implement multiple services. Each service must be a separate project in the same solution (monorepo), with its own API, data store, and background workers where needed.

3.1 API Gateway (YARP)

A reverse proxy that exposes a single entry point to the frontend and routes requests to backend services.

Responsibilities

  • Route frontend requests to backend REST APIs
  • Provide a stable base URL for the web app

3.2 Ticket Service

Manages ticket purchasing and ticket state.

Owns

  • Ticket records, ticket type, status

REST API (minimum)

  • POST /tickets/purchase — purchase a ticket (returns ticket id + a “ticket code” string)
  • GET /tickets/{ticketId} — read ticket state
  • POST /tickets/{ticketId}/refund — request refund/cancel (used by SAGA compensation)

Publishes events

  • TicketPurchased
  • TicketRefunded (or TicketCancelled)

Consumes events

  • (Optional) events that might affect ticket validity (e.g., EntryDenied)

3.3 Access Control Service (Gate Scanning)

Simulates scanning tickets at entry/exit gates and enforces rules like “no double entry”.

Owns

  • Scan log and current “inside/outside” state per ticket

REST API (minimum)

  • POST /gates/scan-entry — scan ticket code for entry attempt
  • POST /gates/scan-exit — scan ticket code for exit attempt
  • GET /gates/tickets/{ticketId}/status — whether ticket is inside/outside

Publishes events

  • EntryRequested (entry attempt)
  • EntryGranted / EntryDenied
  • ExitGranted / ExitDenied

Consumes events

  • TicketPurchased (so it can recognize a valid ticket code)
  • TicketRefunded (so it can reject entry)

Why: Gate scanning is a great source of duplicate messages and out-of-order events.
You must design idempotent handling and clear rejection reasons.
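
To make "idempotent handling and clear rejection reasons" concrete, here is a minimal, language-neutral sketch in Python. It is not the .NET implementation you are expected to write. The GateState class, the scan_id parameter, and the reason codes are assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class GateState:
    """In-memory stand-in for the Access Control data store (hypothetical)."""
    known_tickets: set = field(default_factory=set)     # learned from TicketPurchased
    refunded_tickets: set = field(default_factory=set)  # learned from TicketRefunded
    inside: set = field(default_factory=set)            # ticket codes currently inside
    decisions: dict = field(default_factory=dict)       # scan_id -> (granted, reason)


def scan_entry(state: GateState, scan_id: str, ticket_code: str) -> tuple:
    """Decide an entry attempt; a repeated scan_id returns the first decision."""
    if scan_id in state.decisions:
        # Duplicate delivery of the same scan: no second side effect.
        return state.decisions[scan_id]
    if ticket_code in state.refunded_tickets:
        # Checked first so a refund that arrives before the purchase still rejects.
        decision = (False, "TICKET_REFUNDED")
    elif ticket_code not in state.known_tickets:
        decision = (False, "UNKNOWN_TICKET")
    elif ticket_code in state.inside:
        decision = (False, "ALREADY_INSIDE")
    else:
        state.inside.add(ticket_code)
        decision = (True, "OK")
    state.decisions[scan_id] = decision
    return decision


state = GateState(known_tickets={"T-1"})
print(scan_entry(state, "scan-1", "T-1"))  # (True, 'OK')
print(scan_entry(state, "scan-1", "T-1"))  # (True, 'OK'), same scan redelivered
print(scan_entry(state, "scan-2", "T-1"))  # (False, 'ALREADY_INSIDE')
```

The sketch separates two cases that are easy to confuse: the same scan delivered twice (same decision, no new side effect) and a second, distinct scan of a ticket that is already inside (denied with a reason).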


3.4 Schedule Service

Stores the lineup and stage schedule.

Owns

  • Schedule items (artist, stage, time range)

REST API (minimum)

  • POST /schedule/items — create schedule item
  • GET /schedule/stages/{stageId} — read schedule for a stage
  • GET /schedule/items/{itemId} — read one schedule item

Publishes events

  • ScheduleItemCreated (optional, can drive notifications)

Consumes events

  • none required

Why: This service gives you stable “reference data” and a reason for cross-service lookups later.


3.5 Crowd Monitor Service

Tracks occupancy for each stage/area and raises alerts when near/over capacity.

Owns

  • Occupancy counters per stage/area, plus alert state

REST API (minimum)

  • GET /crowd/stages/{stageId} — occupancy + capacity status
  • POST /crowd/stages/{stageId}/configure — set capacity thresholds (admin)

Publishes events

  • OccupancyUpdated
  • CapacityWarningIssued
  • CapacityCriticalIssued
  • CapacityBackToNormal (optional)

Consumes events

  • EntryGranted / ExitGranted (to update occupancy)
  • (Optional) schedule events (to focus on “active stage”)

Why: Occupancy is a classic eventually consistent value that is updated from other services’ events.
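
The relationship between occupancy changes and capacity alerts can be sketched as follows. This is a language-neutral Python illustration under stated assumptions: the threshold ratios (0.8 and 0.95), the dictionary shape of a stage, and the function names are hypothetical, and the event names are the ones listed above.

```python
EVENT_FOR_LEVEL = {
    "normal": "CapacityBackToNormal",
    "warning": "CapacityWarningIssued",
    "critical": "CapacityCriticalIssued",
}


def capacity_level(occupancy: int, capacity: int,
                   warning_ratio: float = 0.8, critical_ratio: float = 0.95) -> str:
    """Classify occupancy; the default thresholds are assumptions."""
    ratio = occupancy / capacity
    if ratio >= critical_ratio:
        return "critical"
    if ratio >= warning_ratio:
        return "warning"
    return "normal"


def apply_delta(stage: dict, delta: int) -> list:
    """Apply +1 (EntryGranted) or -1 (ExitGranted); return event names to publish."""
    old = stage["occupancy"]
    new = max(0, old + delta)  # clamp: an exit may arrive before its entry
    if new == old:
        return []              # nothing changed, nothing to publish
    old_level = capacity_level(old, stage["capacity"])
    new_level = capacity_level(new, stage["capacity"])
    stage["occupancy"] = new
    events = ["OccupancyUpdated"]
    if new_level != old_level:
        events.append(EVENT_FOR_LEVEL[new_level])  # alert only on level change
    return events


stage = {"occupancy": 7, "capacity": 10}
print(apply_delta(stage, +1))  # ['OccupancyUpdated', 'CapacityWarningIssued']
print(apply_delta(stage, +1))  # ['OccupancyUpdated']
```

Publishing the alert only when the level changes keeps the notification feed readable; whether you also want repeated warnings is a design decision you should document.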


3.6 Notification Service

Sends status updates to clients (via SignalR) and can store a history of notifications.

Owns

  • Notification history (optional but recommended)

REST API (minimum)

  • GET /notifications/recent?limit=... — list recent notifications (for new clients)

SignalR Hub (minimum)

  • /hubs/notifications — pushes status events to clients

Publishes events

  • none required

Consumes events

  • TicketPurchased, EntryGranted, EntryDenied, CapacityWarningIssued, etc.

Why: Real-time updates make event-driven systems visible and debuggable.


3.7 Orchestrator Service (SAGA)

Coordinates a long-running business process and implements compensating actions.

You will implement a SAGA that handles at least one of these workflows:

  • Ticket Refund on Failed Entry
    Example: user buys ticket, tries to enter, entry fails for a “system reason” → refund ticket automatically.
  • “Crowd-safe routing”
    Example: a granted entry triggers an occupancy update; if capacity becomes critical, the orchestrator triggers an alert and optionally blocks entry for a short time window.
  • “VIP upgrade”
    Example: visitor requests upgrade, payment confirmed, ticket updated, notification sent.

Owns

  • Saga state instances (e.g., EntrySaga, RefundSaga)

REST API (minimum)

  • POST /processes/start — start a process (or used indirectly by events)

Publishes events

  • process commands/events to other services (you decide naming)

Consumes events

  • depends on your chosen saga workflow

Why: SAGA is the standard way to handle distributed transactions without a shared database transaction.


3.8 Callback Service (REST or gRPC)

A small service that enables downstream services to request info from another service at runtime.

Goal: demonstrate that event-driven systems still sometimes need request/response for missing data.

Example use cases:

  • Crowd Monitor needs stage capacity configuration from Schedule Service (or a dedicated Config service)
  • Access Control needs to check ticket validity with Ticket Service if it misses an event
  • Orchestrator fetches schedule information to include in user notification

Requirements

  • Implement at least one cross-service synchronous call pattern:
    • REST: simple HTTP call
    • or gRPC: typed contract

Why: Purely event-driven architectures are rare. You need to understand when and how to do synchronous calls safely.


4. System Structure Diagram

flowchart LR
  UI[Blazor Web App] -->|HTTP via Gateway| GW[YARP API Gateway]

  GW --> TS[Ticket Service]
  GW --> ACS[Access Control Service]
  GW --> SS[Schedule Service]
  GW --> CMS[Crowd Monitor Service]
  GW --> NS[Notification Service]
  GW --> ORCH[Orchestrator Service]

  subgraph Broker[RabbitMQ]
    MQ[(Exchange/Queues)]
  end

  TS <-->|events| MQ
  ACS <-->|events| MQ
  SS <-->|events| MQ
  CMS <-->|events| MQ
  NS <-->|events| MQ
  ORCH <-->|events| MQ

  NS -->|SignalR| UI

  ORCH -. sync call .-> CBS[Callback Service]
  ACS -. sync call .-> CBS
  CMS -. sync call .-> CBS
  CBS -.-> TS
  CBS -.-> SS

5. Main Data Flow (Example)

Example: Ticket purchase → entry scan → occupancy update → warning notification

sequenceDiagram
  participant UI as Blazor UI
  participant GW as YARP Gateway
  participant TS as Ticket Service
  participant MQ as RabbitMQ
  participant ACS as Access Control
  participant CMS as Crowd Monitor
  participant NS as Notification Service

  UI->>GW: POST /tickets/purchase
  GW->>TS: POST /tickets/purchase
  TS-->>UI: ticketId + ticketCode
  TS->>MQ: TicketPurchased (CloudEvent)

  UI->>GW: POST /gates/scan-entry (ticketCode)
  GW->>ACS: POST /gates/scan-entry
  ACS->>MQ: EntryRequested (CloudEvent)
  ACS->>MQ: EntryGranted OR EntryDenied (CloudEvent)

  MQ->>CMS: EntryGranted
  CMS->>MQ: OccupancyUpdated
  CMS->>MQ: CapacityWarningIssued (if threshold exceeded)

  MQ->>NS: TicketPurchased / EntryGranted / CapacityWarningIssued
  NS-->>UI: SignalR push updates

6. Messaging Standard: CloudEvents

All messages that travel through RabbitMQ must use a uniform event envelope using the CloudEvents standard.

Why:

  • Consistent metadata across services (event type, id, time, source)
  • Easier troubleshooting and filtering
  • Interoperable across languages and systems

What to include

  • id (unique per event)
  • type (event type name)
  • source (service name)
  • specversion (the CloudEvents spec version, e.g. “1.0”; a required attribute in CloudEvents 1.0)
  • time
  • subject (optional: entity id)
  • datacontenttype
  • data (your event payload)

Where to read

  • CloudEvents specification (CNCF)
  • CloudEvents .NET libraries (if you choose to use them)

You decide how your event data is shaped, but it must be versionable and documented.
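
As an illustration of the envelope, the following language-neutral Python sketch builds a CloudEvents-style JSON object with the attributes listed above, plus specversion, which CloudEvents 1.0 requires. The function name, the event type string, and the source value are hypothetical; in .NET you would use your own types or a CloudEvents library.

```python
import json
import uuid
from datetime import datetime, timezone


def make_cloud_event(event_type: str, source: str, data: dict, subject: str = None) -> dict:
    """Wrap a payload in a CloudEvents 1.0 style envelope (structured JSON mode)."""
    event = {
        "specversion": "1.0",
        "id": str(uuid.uuid4()),                        # unique per event
        "type": event_type,
        "source": source,
        "time": datetime.now(timezone.utc).isoformat(),  # RFC 3339 timestamp
        "datacontenttype": "application/json",
        "data": data,
    }
    if subject is not None:
        event["subject"] = subject                      # optional: entity id
    return event


# Hypothetical type name that carries a version suffix.
event = make_cloud_event(
    "festivo.ticket.purchased.v1", "ticket-service",
    {"ticketId": "t-1", "ticketCode": "ABC123"}, subject="t-1",
)
print(json.dumps(event, indent=2))
```

Putting a version into the type name is only one possible versioning strategy; whichever you choose, document it in docs/.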


7. Milestones and Requirements

Each milestone must end with:

  1. a passing build, and passing tests (if you have tests)
  2. working demo for the milestone’s scope
  3. a Git tag on the commit: git tag MSx-...

MS1 — SetUp

Tag: MS1-SetUp

Goal (why): Create a clean, repeatable starting point for a multi-service system.

Requirements

  • Create a single .sln containing individual projects:
    • ApiGateway (project type: Web API; empty for now)
    • TicketService (project type: Web API)
    • AccessControlService (project type: Web API)
    • ScheduleService (project type: Web API)
    • CrowdMonitorService (project type: Web API)
    • NotificationService (project type: Web API)
    • OrchestratorService (project type: Web API)
    • CallbackService (project type: Web API)
    • WebApp (project type: Blazor WASM)
    • Shared (project type: Class Library; contracts/helpers only, keep minimal and avoid tight coupling)
  • Each service must:
    • Run as an independent process
    • Expose a GET /health endpoint returning a simple OK response
  • Repository structure must include:
    • docs/ folder (you may keep notes, event definitions, etc.)
    • README.md with how to run services (basic)
  • Git requirements:
    • At least 5 commits showing incremental work
    • Tag MS1-SetUp on the final milestone commit

MS2 — AddMessagingMiddleware (RabbitMQ)

Tag: MS2-AddMessagingMiddleware

Goal (why): Introduce asynchronous communication and decouple services.

Requirements

  • Have a RabbitMQ Docker container running and available during development.
  • Add RabbitMQ connectivity to all backend services (not the WebApp).
  • Define a standard configuration approach (e.g., environment variables / appsettings).
  • Each service must be able to:
    • Publish a test message on startup (or via a test endpoint)
    • Consume a test message and log that it was received
  • Define exchanges/queues in a consistent way:
    • Either one exchange with routing keys, or per-service exchanges (your choice)
    • Make sure the necessary exchanges and queues exist before they are used (e.g., each service creates its own infrastructure on startup, or centralized initialization logic runs before any other service starts, or …)
  • Error handling requirement:
    • If a consumer fails to process a message, the failure must be visible (log)
    • Messages must not be silently lost: a message that cannot be processed must end up in a dead-letter queue

MS3 — DefineMessageFormat (CloudEvents + serialization)

Tag: MS3-DefineMessageFormat

Goal (why): Ensure a uniform event format and consistent serialization across the system.

Requirements

  • All messages published to RabbitMQ must be wrapped as CloudEvents.
  • Define:
    • How you serialize CloudEvents (e.g., JSON)
    • How you handle event type mapping to .NET classes
    • How you version your event payloads (at least a documented strategy)
  • Create a contracts documentation file in docs/:
    • List each event type name
    • Describe its data schema (fields + meaning)
    • Define producer(s) and consumer(s)
    • Provide a diagram showing emitted and consumed events for each service.
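
One way to think about "event type mapping" and payload versioning is a registry from the CloudEvents type string to a payload class. The sketch below is language-neutral Python, not a prescribed design; the type names, class names, and the default value for the added field are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TicketPurchasedV1:
    ticket_id: str
    ticket_code: str


@dataclass
class TicketPurchasedV2:
    ticket_id: str
    ticket_code: str
    ticket_type: str = "standard"  # field added in v2, with a default


# Registry: CloudEvents "type" attribute -> payload class.
EVENT_TYPES = {
    "festivo.ticket.purchased.v1": TicketPurchasedV1,
    "festivo.ticket.purchased.v2": TicketPurchasedV2,
}


def decode_payload(cloud_event: dict):
    """Turn the 'data' of a CloudEvent into the class registered for its type."""
    payload_class = EVENT_TYPES.get(cloud_event["type"])
    if payload_class is None:
        # Unknown types should fail loudly so the message can be dead-lettered.
        raise ValueError(f"unknown event type: {cloud_event['type']}")
    return payload_class(**cloud_event["data"])


payload = decode_payload({
    "type": "festivo.ticket.purchased.v2",
    "data": {"ticket_id": "t-1", "ticket_code": "ABC123"},
})
print(payload.ticket_type)  # standard
```

A consumer that only knows v1 would reject a v2 event in this sketch; whether consumers should instead tolerate unknown versions is exactly the kind of decision your versioning strategy has to state.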

MS4 — BuildBasicWebApp

Tag: MS4-BuildBasicWebApp

Goal (why): Provide a simple user entry point and a way to observe system behavior later.

Requirements

  • Create a minimal Blazor WASM Web App with pages:
    • Purchase Ticket
    • Scan Entry/Exit
    • Live Status (placeholder for now)
  • The web app must call backend APIs directly (temporary) OR show placeholders.
  • UI requirements:
    • Keep it simple; functionality > styling
    • Must display returned IDs/codes clearly for testing

MS5 — AddYARPReverseProxy

Tag: MS5-AddYARPReverseProxy

Goal (why): Centralize access and avoid the frontend needing to know service URLs.

Requirements

  • Implement YARP gateway project.
  • WebApp must call only the gateway, not services directly.
  • Gateway must route to:
    • TicketService endpoints
    • AccessControlService endpoints
    • ScheduleService endpoints
    • CrowdMonitorService endpoints
    • NotificationService endpoints (REST endpoints; SignalR later)
  • Gateway must provide SSL/TLS termination: frontend-to-YARP communication is encrypted (HTTPS), while YARP forwards requests to the services unencrypted (HTTP).
  • Add documentation in README.md:
    • which routes exist
    • how to run gateway + services locally

MS6 — BuildBasicBusinessLogicAndMessaging

Tag: MS6-BuildBusinessLogicAndMessaging

Goal (why): Build the first real event-driven workflow with clear states and outcomes.

Functional workflow (minimum)

  1. Purchase ticket in Ticket Service
  2. Ticket Service publishes TicketPurchased
  3. Access Control consumes TicketPurchased and registers the ticket code
  4. Entry scan triggers EntryRequested and results in EntryGranted or EntryDenied
  5. Crowd Monitor consumes EntryGranted / ExitGranted and updates occupancy

Requirements

  • TicketService:
    • Must generate a ticket code (string) returned to client
    • Must store ticket state in memory for now (DB later)
  • AccessControlService:
    • Must reject unknown/invalid/refunded tickets
    • Must enforce “no double entry” (enter twice without exit → denied)
    • Must write scan decisions with a reason
  • CrowdMonitorService:
    • Must track occupancy per stage/area (choose a simple model)
    • Must publish OccupancyUpdated when occupancy changes
  • All inter-service updates must happen via RabbitMQ events (not direct calls)

Reliability requirements

  • Consumers must be idempotent for at least one event type (document which and how you ensure it).
  • Add a “poison message” strategy:
    • Messages that repeatedly fail must end up somewhere observable (e.g., a dead-letter queue). This should already have been considered in MS2.
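
A poison message strategy usually comes down to one decision per delivery: acknowledge, retry, or dead-letter. The following language-neutral Python sketch shows that decision only; it does not talk to a broker. The attempt counter stored on the message and the limit of three attempts are assumptions. In RabbitMQ, dead-lettering is typically set up with the x-dead-letter-exchange queue argument and triggered by rejecting a message without requeueing it; how you count attempts is your design choice.

```python
def handle_delivery(handler, message: dict, max_attempts: int = 3) -> str:
    """Return the action a consumer would take: 'ack', 'retry', or 'dead-letter'."""
    attempt = message.get("attempt", 1)
    try:
        handler(message["body"])
        return "ack"
    except Exception as error:
        # The failure must be visible, never swallowed.
        print(f"processing failed (attempt {attempt}/{max_attempts}): {error}")
        if attempt >= max_attempts:
            return "dead-letter"           # give up: park the message for inspection
        message["attempt"] = attempt + 1   # hypothetical retry counter
        return "retry"


def always_fails(body):
    raise ValueError("cannot parse payload")


message = {"body": "not-json"}
actions = [handle_delivery(always_fails, message) for _ in range(3)]
print(actions)  # ['retry', 'retry', 'dead-letter']
```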

MS7 — AddSignalRForStatusUpdateNotifications

Tag: MS7-AddSignalRNotifications

Goal (why): Make the asynchronous system visible to users and developers in real time.

Requirements

  • NotificationService hosts a SignalR hub at /hubs/notifications.
  • NotificationService consumes at least these events and broadcasts updates:
    • TicketPurchased
    • EntryGranted / EntryDenied
    • OccupancyUpdated
    • (If implemented) CapacityWarningIssued
  • WebApp connects to the hub and shows a live event feed:
    • timestamp
    • event type
    • short description (human-readable)
  • Connect the WebApp to the hub in one of two ways (document which you chose and why):
    • The frontend connects directly to NotificationService (bypassing the YARP API Gateway)
    • or the Gateway routes SignalR traffic (WebSockets) to NotificationService.

MS8 — AddDatabase(s) For Storage-First APIs

Tag: MS8-AddDatabasesForStorageAPIs

Goal (why): Introduce persistence and independent data ownership per service.

Requirements

  • Add a database for at least:
    • TicketService (tickets)
    • AccessControlService (scan log / inside status)
    • OrchestratorService (saga state): optional in this milestone; a database can be added later
  • You can add individual databases (or database containers) or use a single database instance with multiple schemas.
  • Each service must have its own database user that only has access to its own database instance / schema.
  • Define clear ownership:
    • Each service has its own schema/database (no shared tables)
  • APIs must read/write from the database (not in-memory).
  • Provide a minimal migration strategy (documented).

Keep schemas small and straightforward. The focus is distributed behavior, not data modeling perfection.


MS9 — AddTransactionalOutboxPattern

Tag: MS9-AddTransactionalOutbox

Goal (why): Ensure messages are not lost when a service writes to DB but fails before publishing.

Requirements

  • Implement an outbox table in at least one service (TicketService strongly recommended).
  • When the service changes state in its DB, the outgoing message must be recorded in the outbox within the same local DB transaction.
  • A background publisher (worker) must read the outbox and publish messages to RabbitMQ.
  • Outbox messages must be marked as sent (or deleted) only after successful publish.
  • Document:
    • how duplicates are prevented/handled
    • what happens if RabbitMQ is down
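
The mechanics described above can be sketched with an in-memory SQLite database. This is a language-neutral Python illustration, not the required .NET implementation: the table layout, the function names, and the publish callback (a stand-in for publishing to RabbitMQ) are assumptions.

```python
import json
import sqlite3


def create_schema(db):
    db.execute("CREATE TABLE tickets (id TEXT PRIMARY KEY, status TEXT NOT NULL)")
    db.execute("CREATE TABLE outbox (id INTEGER PRIMARY KEY AUTOINCREMENT, "
               "payload TEXT NOT NULL, sent INTEGER NOT NULL DEFAULT 0)")


def purchase_ticket(db, ticket_id: str):
    """Write the state change and the outgoing event in ONE local transaction."""
    with db:  # commits both rows together, or rolls both back on error
        db.execute("INSERT INTO tickets (id, status) VALUES (?, 'purchased')", (ticket_id,))
        event = {"type": "TicketPurchased", "ticketId": ticket_id}
        db.execute("INSERT INTO outbox (payload) VALUES (?)", (json.dumps(event),))


def publish_pending(db, publish) -> int:
    """Background worker step: publish unsent rows, mark each sent only on success."""
    rows = db.execute("SELECT id, payload FROM outbox WHERE sent = 0 ORDER BY id").fetchall()
    published = 0
    for row_id, payload in rows:
        try:
            publish(json.loads(payload))  # stand-in for the broker publish
        except ConnectionError:
            break                         # broker down: rows stay unsent, retry later
        with db:
            db.execute("UPDATE outbox SET sent = 1 WHERE id = ?", (row_id,))
        published += 1
    return published


db = sqlite3.connect(":memory:")
create_schema(db)
purchase_ticket(db, "t-1")
delivered = []
print(publish_pending(db, delivered.append))  # 1
print(publish_pending(db, delivered.append))  # 0
```

Note the remaining gap: if the worker crashes after publishing but before marking the row as sent, the event is published again on the next run. The outbox gives you at-least-once delivery, which is why consumers still need to handle duplicates.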

MS10 — AddOrchestratorServiceForSAGAImplementation

Tag: MS10-AddOrchestratorSaga

Goal (why): Coordinate a multi-step process across services with compensation for failures.

Information on SAGA: SAGA Pattern

Required SAGA workflow (choose ONE and implement fully)

Option A: Auto-refund on entry failure (recommended)

  • When a ticket is purchased, a saga instance is created.
  • The user attempts entry:
    • If EntryGranted → saga completes
    • If EntryDenied for a system reason (define a reason category) → orchestrator triggers refund via TicketService
  • TicketService publishes TicketRefunded, which completes the saga.

Option B: Capacity-based entry throttling

  • When CapacityCriticalIssued is received, the orchestrator sends a command to Access Control to deny further entries for a time window.

Requirements

  • Orchestrator must store saga state (in memory is acceptable initially, DB preferred).
  • Orchestrator must correlate messages to saga instances (define a correlation id strategy).
  • Must implement at least one compensating action (e.g., refund).
  • Must publish saga status events for NotificationService to show progress:
    • SagaStarted, SagaStepCompleted, SagaCompensated, SagaCompleted, SagaFailed (names can vary)
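
For Option A, the orchestrator is essentially a small state machine per ticket. The following language-neutral Python sketch shows correlation and the compensating step under stated assumptions: the ticket id doubles as correlation id, the state names, the RefundTicket command name, and the "system reason" codes are all hypothetical, and saga state is kept in a plain dictionary.

```python
from dataclasses import dataclass


@dataclass
class RefundSaga:
    ticket_id: str                 # doubles as the correlation id in this sketch
    state: str = "AwaitingEntry"


SYSTEM_REASONS = {"SCANNER_OFFLINE", "GATE_FAULT"}  # hypothetical reason category


def handle_event(sagas: dict, event: dict) -> list:
    """Advance the saga correlated by ticketId; return commands to publish."""
    ticket_id = event["ticketId"]
    if event["type"] == "TicketPurchased":
        sagas.setdefault(ticket_id, RefundSaga(ticket_id))  # duplicate-safe start
        return []
    saga = sagas.get(ticket_id)
    if saga is None:
        return []  # no saga to correlate with; a real system should log this
    if saga.state == "AwaitingEntry" and event["type"] == "EntryGranted":
        saga.state = "Completed"
    elif (saga.state == "AwaitingEntry" and event["type"] == "EntryDenied"
          and event.get("reason") in SYSTEM_REASONS):
        saga.state = "Refunding"
        return [{"type": "RefundTicket", "ticketId": ticket_id}]  # compensation
    elif saga.state == "Refunding" and event["type"] == "TicketRefunded":
        saga.state = "Compensated"  # terminal: completed via compensation
    return []


sagas = {}
handle_event(sagas, {"type": "TicketPurchased", "ticketId": "t-1"})
denied = {"type": "EntryDenied", "ticketId": "t-1", "reason": "SCANNER_OFFLINE"}
print(handle_event(sagas, denied))  # [{'type': 'RefundTicket', 'ticketId': 't-1'}]
print(handle_event(sagas, denied))  # [] because a duplicate must not refund twice
```

Because transitions depend on the current state, a redelivered EntryDenied does not trigger a second refund. Publishing the saga status events listed above would happen at each state change.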

MS11 — AddCallbackService (REST or gRPC)

Tag: MS11-AddCallbackService

Goal (why): Demonstrate safe synchronous calls between services for missing context.

Requirements

  • Implement CallbackService as a dedicated “facade” that performs at least one of these:
    • Provide TicketService ticket validity details to AccessControlService (fallback check)
    • Provide ScheduleService stage/capacity config details to CrowdMonitorService
    • Provide enriched info to Orchestrator (e.g., stage name, artist name) for notifications
  • Must use either REST or gRPC (your choice).
  • Must include:
    • timeouts
    • failure handling (what if the call fails?)
    • caching (optional; keep it minimal and document it if used)
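
The three concerns above (timeout, failure handling, caching) can be combined in one small wrapper around the synchronous call. This is a language-neutral Python sketch with hypothetical names: fetch stands in for your REST or gRPC client call and is expected to enforce the timeout itself, and the 30-second cache lifetime is an assumption.

```python
import time


def lookup_with_fallback(fetch, key, cache: dict, timeout_s: float = 0.5,
                         max_age_s: float = 30.0, now=time.monotonic):
    """Return (value, origin) where origin is 'live', 'cache', or 'unavailable'."""
    try:
        value = fetch(key, timeout_s)   # the client enforces the timeout
        cache[key] = (value, now())     # remember the last good answer
        return value, "live"
    except (TimeoutError, ConnectionError):
        cached = cache.get(key)
        if cached is not None and now() - cached[1] <= max_age_s:
            return cached[0], "cache"   # stale but recent enough
        return None, "unavailable"      # caller decides, e.g. deny with a reason


def ticket_service_up(key, timeout_s):
    return {"ticketCode": key, "valid": True}


def ticket_service_down(key, timeout_s):
    raise TimeoutError("no answer within timeout")


cache = {}
print(lookup_with_fallback(ticket_service_up, "T-1", cache)[1])    # live
print(lookup_with_fallback(ticket_service_down, "T-1", cache)[1])  # cache
print(lookup_with_fallback(ticket_service_down, "T-2", cache)[1])  # unavailable
```

What the caller does with "unavailable" matters as much as the call itself: for a gate scan, denying entry with a system reason is one option, and it connects naturally to the auto-refund saga.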

MS12 — Containerize Whole Application (Docker + Compose)

Tag: MS12-ContainerizeDockerCompose

Goal (why): Make the system runnable the same way on any machine.

Requirements

  • Each service and the web app must have a Dockerfile.
  • Provide docker-compose.yml that starts:
    • RabbitMQ
    • Databases (as needed)
    • All services
    • Gateway
    • Web app
  • Compose must expose:
    • Web app URL
    • RabbitMQ management UI (optional but helpful)
  • Provide README.md steps:
    • how to run with Docker Compose
    • how to verify it works (a small test scenario)

MS13 — Optional Extensions

Tag: MS13-OptionalExtensions

Choose one or more:

A) Distributed logging / tracing

  • Add structured logging with correlation ids
  • Optionally add OpenTelemetry tracing and a collector

B) Improved error handling

  • Retry policies (with backoff) for consumers
  • Better DLQ inspection endpoints or dashboards

C) Frontend styling

  • Make the UI look like a real festival app (simple but coherent)

D) Admin tools

  • Add admin pages to configure stage capacities or schedule items

8. Non-Functional Requirements (All Milestones)

Version control & discipline

  • Use Git from day one.
  • Commit messages must be meaningful.
  • Create each milestone tag on the final commit of that milestone.

Service boundaries

  • No “shared database”.
  • Avoid sharing domain models directly between services.
  • Shared project should contain only:
    • minimal event contracts (or event names)
    • shared serialization helpers
    • common small utilities (e.g., correlation id helpers)

Observability (minimum)

  • Each service logs:
    • when it publishes an event (type + id + correlation id)
    • when it consumes an event (type + id + outcome)
    • when it rejects a request (why)

Reliability mindset

  • Assume:
    • messages can be delivered more than once
    • messages can arrive out of order
    • a service can be down temporarily
    • a publish can fail
  • Design behaviors that make the system stable under these conditions.

9. Definition of Done (System Demo)

At the end (MS12 or MS13), you must be able to demonstrate this scenario:

  1. Open the WebApp
  2. Purchase a ticket
  3. Scan entry (granted)
  4. Observe live notifications and occupancy updates
  5. Trigger at least one failure path (entry denied / refund saga / capacity warning)
  6. Show that messages are still delivered reliably (e.g., via outbox or retry/DLQ behavior)
  7. Run the entire system via docker compose up

10. Deliverables Checklist

  • Working code with all milestones tagged
  • docs/ includes event definitions and correlation strategy
  • README.md includes run instructions (local + Docker Compose)
  • System demonstrates asynchronous messaging + reliability patterns

Quick note on naming

You may rename services, endpoints, and event names.
However, you must keep the same architectural responsibilities and milestone outcomes.