Distributed Systems Exercise: Smart Concert / Festival Management Platform
Course focus: distributed systems, message-oriented middleware, reliable messaging, error handling, and long-running workflows (SAGA).
Tech focus: .NET backend microservices, RabbitMQ, CloudEvents message format, REST APIs, YARP reverse proxy, SignalR notifications, databases, transactional outbox, orchestration, Docker Compose.
You will receive an empty starter repository via GitHub Classroom. Your job is to build the system in milestones.
You must commit regularly and create a Git tag at the end of each milestone:
- MS1-SetUp
- MS2-AddMessagingMiddleware
- MS3-DefineMessageFormat
- MS4-BuildBasicWebApp
- MS5-AddYARPReverseProxy
- MS6-BuildBusinessLogicAndMessaging
- MS7-AddSignalRNotifications
- MS8-AddDatabasesForStorageAPIs
- MS9-AddTransactionalOutbox
- MS10-AddOrchestratorSaga
- MS11-AddCallbackService
- MS12-ContainerizeDockerCompose
- MS13-OptionalExtensions
Important: This document tells you what to build and why (goals & constraints). It does not tell you how to implement it.
You are expected to research APIs and libraries yourself.
1. Scenario: “Festivo” — a Smart Festival Platform
Your team is building Festivo, a backend-heavy system for a medium-sized music festival.
The festival has:
- Ticket purchases (simple ticket types)
- Entry scanning at gates
- Stage schedules and artist lineup
- Live capacity tracking per area/stage
- Push notifications to visitors (status updates)
The system must be distributed: multiple services, each owning its data and exposing a REST API.
Services coordinate via asynchronous events using a message broker (RabbitMQ), and implement reliability patterns such as outbox and sagas.
Key qualities we want to practice
- Asynchronous communication via events
- Eventual consistency between services
- Reliable message delivery (publish/consume, retries, dead-letter handling)
- Long-running processes with SAGA orchestration
- Observability (optional extension: distributed logging/tracing)
2. High-Level Domain Model (Concepts)
Entities / concepts (minimum set):
- Ticket: created when purchased, can be validated and used for entry
- Visitor: optional (may be anonymous), linked to ticket
- Gate Scan: attempts to enter/exit (may repeat)
- Venue Area/Stage: has capacity and current occupancy
- Schedule Item: artist, stage, start/end time
- Alert/Notification: messages to users (e.g., “Stage A crowded”)
3. Services Overview
You will implement multiple services. Each service must be a separate project in the same solution (monorepo), with its own API, data store, and background workers where needed.
3.1 API Gateway (YARP)
A reverse proxy that exposes a single entry point to the frontend and routes requests to backend services.
Responsibilities
- Route frontend requests to backend REST APIs
- Provide a stable base URL for the web app
3.2 Ticket Service
Manages ticket purchasing and ticket state.
Owns
- Ticket records, ticket type, status
REST API (minimum)
- POST /tickets/purchase: purchase a ticket (returns ticket id + a “ticket code” string)
- GET /tickets/{ticketId}: read ticket state
- POST /tickets/{ticketId}/refund: request refund/cancel (used by SAGA compensation)
Publishes events
- TicketPurchased
- TicketRefunded (or TicketCancelled)
Consumes events
- (Optional) events that might affect ticket validity (e.g., EntryDenied)
3.3 Access Control Service (Gate Scanning)
Simulates scanning tickets at entry/exit gates and enforces rules like “no double entry”.
Owns
- Scan log and current “inside/outside” state per ticket
REST API (minimum)
- POST /gates/scan-entry: scan ticket code for entry attempt
- POST /gates/scan-exit: scan ticket code for exit attempt
- GET /gates/tickets/{ticketId}/status: whether ticket is inside/outside
Publishes events
- EntryRequested (entry attempt)
- EntryGranted / EntryDenied
- ExitGranted / ExitDenied
Consumes events
- TicketPurchased (so it can recognize a valid ticket code)
- TicketRefunded (so it can reject entry)
Why: Gate scanning is a great source of duplicate messages and out-of-order events.
You must design idempotent handling and clear rejection reasons.
3.4 Schedule Service
Stores the lineup and stage schedule.
Owns
- Schedule items (artist, stage, time range)
REST API (minimum)
- POST /schedule/items: create schedule item
- GET /schedule/stages/{stageId}: read schedule for a stage
- GET /schedule/items/{itemId}: read one schedule item
Publishes events
- ScheduleItemCreated (optional, can drive notifications)
Consumes events
- none required
Why: This service gives you stable “reference data” and a reason for cross-service lookups later.
3.5 Crowd Monitor Service
Tracks occupancy for each stage/area and raises alerts when near/over capacity.
Owns
- Occupancy counters per stage/area, plus alert state
REST API (minimum)
- GET /crowd/stages/{stageId}: occupancy + capacity status
- POST /crowd/stages/{stageId}/configure: set capacity thresholds (admin)
Publishes events
- OccupancyUpdated
- CapacityWarningIssued
- CapacityCriticalIssued
- CapacityBackToNormal (optional)
Consumes events
- EntryGranted / ExitGranted (to update occupancy)
- (Optional) schedule events (to focus on “active stage”)
Why: Occupancy is a classic eventually consistent value that is updated from other services’ events.
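As a language-neutral illustration of the occupancy logic described above (the course stack is .NET; Python is used here only to keep the sketch short), the following shows a counter that is driven by EntryGranted / ExitGranted and announces a capacity event only when the alert level changes. The class name, the threshold ratios, and the single-stage model are assumptions made for this example, not part of the assignment.

```python
from dataclasses import dataclass

# Event name to publish when the alert level changes to the given level.
LEVEL_EVENTS = {
    "warning": "CapacityWarningIssued",
    "critical": "CapacityCriticalIssued",
    "normal": "CapacityBackToNormal",
}


@dataclass
class StageOccupancy:
    """In-memory occupancy state for one stage (hypothetical model)."""
    capacity: int
    warning_ratio: float = 0.8   # assumed threshold, not from the assignment
    critical_ratio: float = 1.0  # assumed threshold, not from the assignment
    occupancy: int = 0
    level: str = "normal"        # last alert level that was announced

    def _level_for(self, count: int) -> str:
        ratio = count / self.capacity
        if ratio >= self.critical_ratio:
            return "critical"
        if ratio >= self.warning_ratio:
            return "warning"
        return "normal"

    def apply(self, event_type: str) -> list:
        """Apply EntryGranted/ExitGranted and return the event names to publish."""
        if event_type == "EntryGranted":
            new_count = self.occupancy + 1
        elif event_type == "ExitGranted":
            new_count = max(0, self.occupancy - 1)  # never go negative
        else:
            return []  # event is not relevant for occupancy

        if new_count == self.occupancy:
            return []  # nothing changed (exit scan on an empty stage)
        self.occupancy = new_count

        emitted = ["OccupancyUpdated"]
        new_level = self._level_for(new_count)
        if new_level != self.level:  # announce transitions only, not every update
            emitted.append(LEVEL_EVENTS[new_level])
            self.level = new_level
        return emitted


stage = StageOccupancy(capacity=5)
for _ in range(4):
    print(stage.apply("EntryGranted"))
```

Because the counter is derived from other services' events, it can lag behind or temporarily disagree with the gate state; that is the eventual consistency this service is meant to demonstrate.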
3.6 Notification Service
Sends status updates to clients (via SignalR) and can store a history of notifications.
Owns
- Notification history (optional but recommended)
REST API (minimum)
- GET /notifications/recent?limit=... (list recent notifications, for new clients)
SignalR Hub (minimum)
- /hubs/notifications: pushes status events to clients
Publishes events
- none required
Consumes events
- TicketPurchased, EntryGranted, EntryDenied, CapacityWarningIssued, etc.
Why: Real-time updates make event-driven systems visible and debuggable.
3.7 Orchestrator Service (SAGA)
Coordinates a long-running business process and implements compensating actions.
You will implement a SAGA that handles at least one of these workflows:
- Ticket Refund on Failed Entry
Example: user buys ticket, tries to enter, entry fails for a “system reason” → refund ticket automatically.
- “Crowd-safe routing”
Example: a granted entry triggers an occupancy update; if capacity becomes critical, the orchestrator triggers an alert and optionally blocks entry for a short time window.
- “VIP upgrade”
Example: visitor requests upgrade, payment confirmed, ticket updated, notification sent.
Owns
- Saga state instances (e.g., EntrySaga, RefundSaga)
REST API (minimum)
- POST /processes/start: start a process (or used indirectly by events)
Publishes events
- process commands/events to other services (you decide naming)
Consumes events
- depends on your chosen saga workflow
Why: SAGA is the standard way to handle distributed transactions without a shared database transaction.
3.8 Callback Service (REST or gRPC)
A small service that enables downstream services to request info from another service at runtime.
Goal: demonstrate that event-driven systems still sometimes need request/response for missing data.
Example use cases:
- Crowd Monitor needs stage capacity configuration from Schedule Service (or a dedicated Config service)
- Access Control needs to check ticket validity with Ticket Service if it misses an event
- Orchestrator fetches schedule information to include in user notification
Requirements
- Implement at least one cross-service synchronous call pattern:
- REST: simple HTTP call
- or gRPC: typed contract
Why: Pure event-driven architectures are rare. You need to understand when and how to do synchronous calls safely.
4. System Structure Diagram
flowchart LR
    UI[Blazor Web App] -->|HTTP via Gateway| GW[YARP API Gateway]
    GW --> TS[Ticket Service]
    GW --> ACS[Access Control Service]
    GW --> SS[Schedule Service]
    GW --> CMS[Crowd Monitor Service]
    GW --> NS[Notification Service]
    GW --> ORCH[Orchestrator Service]
    subgraph Broker[RabbitMQ]
        MQ[(Exchange/Queues)]
    end
    TS <-->|events| MQ
    ACS <-->|events| MQ
    SS <-->|events| MQ
    CMS <-->|events| MQ
    NS <-->|events| MQ
    ORCH <-->|events| MQ
    NS -->|SignalR| UI
    ORCH -. sync call .-> CBS[Callback Service]
    ACS -. sync call .-> CBS
    CMS -. sync call .-> CBS
    CBS -.-> TS
    CBS -.-> SS
5. Main Data Flow (Example)
Example: Ticket purchase → entry scan → occupancy update → warning notification
sequenceDiagram
    participant UI as Blazor UI
    participant GW as YARP Gateway
    participant TS as Ticket Service
    participant MQ as RabbitMQ
    participant ACS as Access Control
    participant CMS as Crowd Monitor
    participant NS as Notification Service
    UI->>GW: POST /tickets/purchase
    GW->>TS: POST /tickets/purchase
    TS-->>UI: ticketId + ticketCode
    TS->>MQ: TicketPurchased (CloudEvent)
    UI->>GW: POST /gates/scan-entry (ticketCode)
    GW->>ACS: POST /gates/scan-entry
    ACS->>MQ: EntryRequested (CloudEvent)
    ACS->>MQ: EntryGranted OR EntryDenied (CloudEvent)
    MQ->>CMS: EntryGranted
    CMS->>MQ: OccupancyUpdated
    CMS->>MQ: CapacityWarningIssued (if threshold exceeded)
    MQ->>NS: TicketPurchased / EntryGranted / CapacityWarningIssued
    NS-->>UI: SignalR push updates
6. Messaging Standard: CloudEvents
All messages that travel through RabbitMQ must use a uniform event envelope using the CloudEvents standard.
Why:
- Consistent metadata across services (event type, id, time, source)
- Easier troubleshooting and filtering
- Interoperable across languages and systems
What to include
- id (unique per event)
- type (event type name)
- source (service name)
- time
- subject (optional: entity id)
- datacontenttype
- data (your event payload)
Where to read
- CloudEvents specification (CNCF)
- CloudEvents .NET libraries (if you choose to use them)
You decide how your event data is shaped, but it must be versionable and documented.
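To make the envelope concrete, here is a minimal sketch of a structured-mode JSON CloudEvent carrying the attributes listed above. It is written in Python purely for illustration (your services are .NET), and the type name `festivo.ticket.purchased.v1`, the source `/festivo/ticket-service`, and the payload fields are hypothetical. The `specversion` attribute is not listed above but is required by the CloudEvents 1.0 specification.

```python
import json
import uuid
from datetime import datetime, timezone
from typing import Optional


def make_cloud_event(event_type: str, source: str, data: dict,
                     subject: Optional[str] = None) -> dict:
    """Wrap a payload in a CloudEvents 1.0 style envelope (structured JSON)."""
    event = {
        "specversion": "1.0",                            # required by the spec
        "id": str(uuid.uuid4()),                         # unique per event
        "type": event_type,                              # event type name
        "source": source,                                # producing service
        "time": datetime.now(timezone.utc).isoformat(),  # RFC 3339 timestamp
        "datacontenttype": "application/json",
        "data": data,                                    # your event payload
    }
    if subject is not None:
        event["subject"] = subject                       # optional: entity id
    return event


# Hypothetical event: names and fields are illustrative only.
event = make_cloud_event(
    "festivo.ticket.purchased.v1",
    "/festivo/ticket-service",
    {"ticketId": "t-1", "ticketCode": "ABC123"},
    subject="t-1",
)
print(json.dumps(event, indent=2))
```

Putting a version into the type name is only one possible versioning strategy; whichever you choose, document it alongside the event contracts.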
7. Milestones and Requirements
Each milestone must end with:
- all tests/build passing (if you have tests)
- working demo for the milestone’s scope
- a Git tag on the commit: git tag MSx-...
MS1 — SetUp
Tag: MS1-SetUp
Goal (why): Create a clean, repeatable starting point for a multi-service system.
Requirements
- Create a single .sln containing individual projects:
- ApiGateway (empty for now), Project-Type: Web API
- TicketService, Project-Type: Web API
- AccessControlService, Project-Type: Web API
- ScheduleService, Project-Type: Web API
- CrowdMonitorService, Project-Type: Web API
- NotificationService, Project-Type: Web API
- OrchestratorService, Project-Type: Web API
- CallbackService, Project-Type: Web API
- WebApp, Project-Type: Blazor WASM
- Shared (contracts/helpers; keep minimal and avoid tight coupling), Project-Type: Class Library
- Each service must:
- Run as an independent process
- Expose a GET /health endpoint returning a simple OK response
- Repository structure must include:
- docs/ folder (you may keep notes, event definitions, etc.)
- README.md with how to run services (basic)
- Git requirements:
- At least 5 commits showing incremental work
- Tag MS1-SetUp on the final milestone commit
MS2 — AddMessagingMiddleware (RabbitMQ)
Tag: MS2-AddMessagingMiddleware
Goal (why): Introduce asynchronous communication and decouple services.
Requirements
- Have RabbitMQ Docker container running and available during development.
- Add RabbitMQ connectivity to all backend services (not the WebApp).
- Define a standard configuration approach (e.g., environment variables / appsettings).
- Each service must be able to:
- Publish a test message on startup (or via a test endpoint)
- Consume a test message and log that it was received
- Define exchanges/queues in a consistent way:
- Either one exchange with routing keys, or per-service exchanges (your choice)
- Make sure to have necessary exchanges and queues available (e.g. each service creates own infrastructure on startup, or centralized initialization logic that needs to run before any other service runs, or …)
- Error handling requirement:
- If a consumer fails to process a message, the failure must be visible (log)
- Messages must not be silently lost: a message that cannot be processed should end up in a dead-letter queue.
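The error-handling requirement can be pictured with a small in-memory simulation: a failing message is logged, retried a limited number of times, and then parked in a dead-letter list instead of being dropped. This is a conceptual Python sketch only. With RabbitMQ you would normally rely on the broker's dead-letter exchange mechanism rather than application code like this, and `MAX_ATTEMPTS` and the message shape are assumptions made for the example.

```python
import logging
from collections import deque

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("consumer")

MAX_ATTEMPTS = 3  # assumed retry limit; choose your own policy


def drain(queue: deque, dead_letters: list, handler) -> None:
    """Process every queued message; failures are retried, then dead-lettered."""
    while queue:
        message = queue.popleft()
        try:
            handler(message["body"])
        except Exception as exc:
            message["attempts"] += 1
            # The failure must be visible, so it is logged on every attempt.
            log.warning("processing failed (attempt %d): %s",
                        message["attempts"], exc)
            if message["attempts"] >= MAX_ATTEMPTS:
                dead_letters.append(message)  # parked for inspection, not lost
            else:
                queue.append(message)         # requeued for another attempt


def handle(body: str) -> None:
    """Hypothetical handler that cannot process one particular message."""
    if body == "malformed":
        raise ValueError("cannot parse message")


queue = deque([{"body": "hello", "attempts": 0},
               {"body": "malformed", "attempts": 0}])
dead_letters = []
drain(queue, dead_letters, handle)
print(len(dead_letters), "message(s) dead-lettered")
```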
MS3 — DefineMessageFormat (CloudEvents + serialization)
Tag: MS3-DefineMessageFormat
Goal (why): Ensure a uniform event format and consistent serialization across the system.
Requirements
- All messages published to RabbitMQ must be wrapped as CloudEvents.
- Define:
- How you serialize CloudEvents (e.g., JSON)
- How you handle event type mapping to .NET classes
- How you version your event payloads (at least a documented strategy)
- Create a contracts documentation file in docs/:
- List each event type name
- Describe its data schema (fields + meaning)
- Define producer(s) and consumer(s)
- Provide a diagram showing emitted and consumed events for each service.
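One way to think about type mapping and versioning is a registry that maps the CloudEvents type string to a payload class, with the version carried in the type name. The sketch below uses Python dataclasses purely for illustration; in your solution these would be .NET classes. All type names and fields are hypothetical, and versioned type names are just one of several workable strategies.

```python
from dataclasses import dataclass


@dataclass
class TicketPurchasedV1:
    ticketId: str
    ticketCode: str


@dataclass
class TicketPurchasedV2:
    ticketId: str
    ticketCode: str
    ticketType: str = "standard"  # added in v2; the default keeps old data valid


# Hypothetical registry: the version is part of the CloudEvents "type".
EVENT_TYPES = {
    "festivo.ticket.purchased.v1": TicketPurchasedV1,
    "festivo.ticket.purchased.v2": TicketPurchasedV2,
}


def decode(event: dict):
    """Map a CloudEvents envelope to a typed payload, or fail loudly."""
    payload_class = EVENT_TYPES.get(event["type"])
    if payload_class is None:
        # Unknown types should be rejected visibly (e.g. dead-lettered).
        raise LookupError(f"unknown event type: {event['type']}")
    return payload_class(**event["data"])


payload = decode({
    "type": "festivo.ticket.purchased.v2",
    "data": {"ticketId": "t-1", "ticketCode": "ABC123"},
})
print(payload)
```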
MS4 — BuildBasicWebApp
Tag: MS4-BuildBasicWebApp
Goal (why): Provide a simple user entry point and a way to observe system behavior later.
Requirements
- Create a minimal Blazor WASM Web App with pages:
- Purchase Ticket
- Scan Entry/Exit
- Live Status (placeholder for now)
- The web app must call backend APIs directly (temporary) OR show placeholders.
- UI requirements:
- Keep it simple; functionality > styling
- Must display returned IDs/codes clearly for testing
MS5 — AddYARPReverseProxy
Tag: MS5-AddYARPReverseProxy
Goal (why): Centralize access and avoid the frontend needing to know service URLs.
Requirements
- Implement YARP gateway project.
- WebApp must call only the gateway, not services directly.
- Gateway must route to:
- TicketService endpoints
- AccessControlService endpoints
- ScheduleService endpoints
- CrowdMonitorService endpoints
- NotificationService endpoints (REST endpoints; SignalR later)
- Gateway must provide SSL/TLS termination: frontend-to-YARP communication is encrypted (HTTPS), while YARP forwards requests to the services unencrypted (HTTP).
- Add documentation in README.md:
- which routes exist
- how to run gateway + services locally
MS6 — BuildBasicBusinessLogicAndMessaging
Tag: MS6-BuildBusinessLogicAndMessaging
Goal (why): Build the first real event-driven workflow with clear states and outcomes.
Functional workflow (minimum)
- Purchase ticket in Ticket Service
- Ticket Service publishes TicketPurchased
- Access Control consumes TicketPurchased and registers the ticket code
- Entry scan triggers EntryRequested and results in EntryGranted or EntryDenied
- Crowd Monitor consumes EntryGranted / ExitGranted and updates occupancy
Requirements
- TicketService:
- Must generate a ticket code (string) returned to client
- Must store ticket state in memory for now (DB later)
- AccessControlService:
- Must reject unknown/invalid/refunded tickets
- Must enforce “no double entry” (enter twice without exit → denied)
- Must write scan decisions with a reason
- CrowdMonitorService:
- Must track occupancy per stage/area (choose a simple model)
- Must publish OccupancyUpdated when occupancy changes
- All inter-service updates must happen via RabbitMQ events (not direct calls)
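The Access Control rules above amount to a small decision function that always returns an outcome together with a reason. A minimal in-memory sketch follows (Python for illustration only; the reason strings and the class shape are assumptions, not prescribed by the assignment).

```python
from dataclasses import dataclass, field


@dataclass
class GateState:
    """In-memory gate state, populated from consumed events (hypothetical)."""
    known: set = field(default_factory=set)     # codes seen via TicketPurchased
    refunded: set = field(default_factory=set)  # codes seen via TicketRefunded
    inside: set = field(default_factory=set)    # tickets currently inside

    def scan_entry(self, code: str) -> tuple:
        """Return (event name, reason) for an entry attempt."""
        if code not in self.known:
            return ("EntryDenied", "unknown_ticket")
        if code in self.refunded:
            return ("EntryDenied", "ticket_refunded")
        if code in self.inside:
            return ("EntryDenied", "already_inside")  # no double entry
        self.inside.add(code)
        return ("EntryGranted", "ok")

    def scan_exit(self, code: str) -> tuple:
        """Return (event name, reason) for an exit attempt."""
        if code not in self.inside:
            return ("ExitDenied", "not_inside")
        self.inside.remove(code)
        return ("ExitGranted", "ok")


gate = GateState(known={"ABC123"})
print(gate.scan_entry("ABC123"))
print(gate.scan_entry("ABC123"))  # second entry without an exit in between
```

Keeping the reason as a small, documented set of values makes it easier to later distinguish business rejections from “system reasons” in the saga milestone.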
Reliability requirements
- Consumers must be idempotent for at least one event type (document which and how you ensure it).
- Add a “poison message” strategy:
- messages that repeatedly fail must end up somewhere observable (e.g., dead-letter queue). This should already be considered in MS2.
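A common way to meet the idempotency requirement is to remember the ids of events that were already processed and to skip repeats. The sketch below shows the idea in Python with an in-memory set; in a real service the processed ids would more likely be stored in the service's own database, and the class name is an assumption made for this example.

```python
class IdempotentConsumer:
    """Skips events whose CloudEvents id was already processed."""

    def __init__(self, handler):
        self._handler = handler
        self._seen_ids = set()  # in-memory only; a real service would persist this

    def handle(self, event: dict) -> bool:
        """Return True if the event was processed, False if it was a duplicate."""
        if event["id"] in self._seen_ids:
            return False
        self._handler(event)
        # Record the id only after the handler succeeded, so that a failed
        # attempt can be retried later.
        self._seen_ids.add(event["id"])
        return True


registered_codes = []
consumer = IdempotentConsumer(lambda e: registered_codes.append(e["data"]["ticketCode"]))

event = {"id": "evt-1", "type": "TicketPurchased", "data": {"ticketCode": "ABC123"}}
consumer.handle(event)
consumer.handle(event)  # redelivery of the same event is ignored
print(registered_codes)
```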
MS7 — AddSignalRForStatusUpdateNotifications
Tag: MS7-AddSignalRNotifications
Goal (why): Make the asynchronous system visible to users and developers in real time.
Requirements
- NotificationService hosts a SignalR hub at /hubs/notifications.
- NotificationService consumes at least these events and broadcasts updates:
- TicketPurchased
- EntryGranted / EntryDenied
- OccupancyUpdated
- (If implemented) CapacityWarningIssued
- WebApp connects to the hub and shows a live event feed:
- timestamp
- event type
- short description (human-readable)
- Implement network communication in one of two ways (document what you chose and why):
- Frontend directly connects to NotificationService (not using YARP API Gateway)
- or Gateway must support routing for SignalR (WebSockets) to NotificationService.
MS8 — AddDatabase(s) For Storage-First APIs
Tag: MS8-AddDatabasesForStorageAPIs
Goal (why): Introduce persistence and independent data ownership per service.
Requirements
- Add a database for at least:
- TicketService (tickets)
- AccessControlService (scan log / inside status)
- OrchestratorService (saga state) can be DB later; optional in this milestone
- You can add individual databases (or database containers) or use a single database instance with multiple schemas.
- Each service must have its own database user that only has access to its own database instance / schema.
- Define clear ownership:
- Each service has its own schema/database (no shared tables)
- APIs must read/write from the database (not in-memory).
- Provide a minimal migration strategy (documented).
Keep schemas small and straightforward. The focus is distributed behavior, not data modeling perfection.
MS9 — AddTransactionalOutboxPattern
Tag: MS9-AddTransactionalOutbox
Goal (why): Ensure messages are not lost when a service writes to DB but fails before publishing.
Requirements
- Implement an outbox table in at least one service (TicketService strongly recommended).
- When the service changes state in its DB, the outgoing message must be recorded in the outbox within the same local DB transaction.
- A background publisher (worker) must read the outbox and publish messages to RabbitMQ.
- Outbox messages must be marked as sent (or deleted) only after successful publish.
- Document:
- how duplicates are prevented/handled
- what happens if RabbitMQ is down
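The outbox requirements can be sketched end to end with an embedded database. The example below uses Python's built-in sqlite3 module purely for illustration; the table layout, function names, and the `publish` callback are assumptions, and your services would use their own database and a RabbitMQ client instead.

```python
import json
import sqlite3


def create_schema(db: sqlite3.Connection) -> None:
    db.executescript("""
        CREATE TABLE tickets (id TEXT PRIMARY KEY, status TEXT NOT NULL);
        CREATE TABLE outbox (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            event_type TEXT NOT NULL,
            payload TEXT NOT NULL,
            sent INTEGER NOT NULL DEFAULT 0
        );
    """)


def purchase_ticket(db: sqlite3.Connection, ticket_id: str) -> None:
    """The state change and the outgoing event share one local transaction."""
    with db:  # commits on success, rolls back if an exception is raised
        db.execute("INSERT INTO tickets (id, status) VALUES (?, 'purchased')",
                   (ticket_id,))
        db.execute("INSERT INTO outbox (event_type, payload) VALUES (?, ?)",
                   ("TicketPurchased", json.dumps({"ticketId": ticket_id})))


def publish_pending(db: sqlite3.Connection, publish) -> int:
    """One worker pass: publish unsent rows, mark each sent only afterwards."""
    rows = db.execute(
        "SELECT id, event_type, payload FROM outbox WHERE sent = 0 ORDER BY id"
    ).fetchall()
    published = 0
    for row_id, event_type, payload in rows:
        try:
            publish(event_type, json.loads(payload))
        except Exception:
            break  # broker unavailable: keep the row and retry on the next pass
        with db:
            db.execute("UPDATE outbox SET sent = 1 WHERE id = ?", (row_id,))
        published += 1
    return published


db = sqlite3.connect(":memory:")
create_schema(db)
purchase_ticket(db, "t-1")
publish_pending(db, lambda event_type, data: print("published", event_type, data))
```

If the worker crashes after publishing but before marking the row as sent, the message is published again on the next pass. The outbox therefore gives at-least-once delivery, which is why duplicates have to be handled on the consumer side.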
MS10 — AddOrchestratorServiceForSAGAImplementation
Tag: MS10-AddOrchestratorSaga
Goal (why): Coordinate a multi-step process across services with compensation for failures.
Information on SAGA: SAGA Pattern
Required SAGA workflow (choose ONE and implement fully)
Option A: Auto-refund on entry failure (recommended)
- When a ticket is purchased, a saga instance is created.
- The user attempts entry:
- If EntryGranted → saga completes
- If EntryDenied for a system reason (define a reason category) → orchestrator triggers refund via TicketService
- TicketService publishes TicketRefunded, which completes the saga.
Option B: Capacity-based entry throttling
- If CapacityCriticalIssued, orchestrator sends a command to Access Control to deny further entries for a time window.
Requirements
- Orchestrator must store saga state (in memory is acceptable initially, DB preferred).
- Orchestrator must correlate messages to saga instances (define a correlation id strategy).
- Must implement at least one compensating action (e.g., refund).
- Must publish saga status events for NotificationService to show progress:
- SagaStarted, SagaStepCompleted, SagaCompensated, SagaCompleted, SagaFailed (names can vary)
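For Option A, the orchestrator can be modelled as a small state machine per saga instance, looked up by a correlation id. The Python sketch below uses the ticket id as the correlation id and keeps state in memory. The state names, the `RefundTicket` command, and the set of “system reasons” are hypothetical choices made for this example.

```python
from dataclasses import dataclass, field

SYSTEM_REASONS = {"scanner_offline", "gate_error"}  # hypothetical reason category


@dataclass
class EntrySaga:
    ticket_id: str                 # doubles as the correlation id in this sketch
    state: str = "AwaitingEntry"
    outbox: list = field(default_factory=list)  # commands and status events to publish

    def handle(self, event_type: str, data: dict) -> None:
        if self.state == "AwaitingEntry" and event_type == "EntryGranted":
            self.state = "Completed"
            self.outbox.append(("SagaCompleted", self.ticket_id))
        elif self.state == "AwaitingEntry" and event_type == "EntryDenied":
            if data.get("reason") in SYSTEM_REASONS:
                self.state = "Refunding"
                # Compensating action: ask TicketService to refund the ticket.
                self.outbox.append(("RefundTicket", self.ticket_id))
        elif self.state == "Refunding" and event_type == "TicketRefunded":
            self.state = "Completed"
            self.outbox.append(("SagaCompensated", self.ticket_id))
            self.outbox.append(("SagaCompleted", self.ticket_id))
        # Any other combination is ignored, which also absorbs duplicates.


class Orchestrator:
    def __init__(self):
        self.sagas = {}  # correlation id -> saga instance

    def on_event(self, event_type: str, data: dict) -> None:
        ticket_id = data["ticketId"]
        if event_type == "TicketPurchased":
            if ticket_id not in self.sagas:  # a redelivered purchase starts nothing new
                saga = EntrySaga(ticket_id)
                saga.outbox.append(("SagaStarted", ticket_id))
                self.sagas[ticket_id] = saga
            return
        saga = self.sagas.get(ticket_id)
        if saga is not None:
            saga.handle(event_type, data)


orchestrator = Orchestrator()
orchestrator.on_event("TicketPurchased", {"ticketId": "t-1"})
orchestrator.on_event("EntryDenied", {"ticketId": "t-1", "reason": "scanner_offline"})
orchestrator.on_event("TicketRefunded", {"ticketId": "t-1"})
print(orchestrator.sagas["t-1"].outbox)
```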
MS11 — AddCallbackService (REST or gRPC)
Tag: MS11-AddCallbackService
Goal (why): Demonstrate safe synchronous calls between services for missing context.
Requirements
- Implement CallbackService as a dedicated “facade” that performs at least one of these:
- Provide TicketService ticket validity details to AccessControlService (fallback check)
- Provide ScheduleService stage/capacity config details to CrowdMonitorService
- Provide enriched info to Orchestrator (e.g., stage name, artist name) for notifications
- Must use either REST or gRPC (your choice).
- Must include:
- timeouts
- failure handling (what if the call fails?)
- minimal caching allowed but must be documented
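The three items above (timeouts, failure handling, caching) tend to be designed together. The following Python sketch wraps a remote lookup with a timeout parameter, a small time-to-live cache, and a fallback when the call fails. The `fetch` callable, the default values, and the fallback policy (serve a stale value if one exists, otherwise a default) are assumptions for illustration; whether a stale answer is acceptable depends on your use case and should be documented.

```python
import time


class CachedLookup:
    """Synchronous lookup with a timeout, a TTL cache, and a failure fallback."""

    def __init__(self, fetch, timeout_seconds=2.0, ttl_seconds=30.0,
                 clock=time.monotonic):
        self._fetch = fetch              # hypothetical callable(key, timeout) -> value
        self._timeout = timeout_seconds
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache = {}                 # key -> (expires_at, value)

    def get(self, key, default=None):
        now = self._clock()
        cached = self._cache.get(key)
        if cached is not None and cached[0] > now:
            return cached[1]             # fresh cache hit: no remote call
        try:
            value = self._fetch(key, self._timeout)
        except (TimeoutError, ConnectionError):
            # The call failed: serve a stale value if we have one, else the default.
            return cached[1] if cached is not None else default
        self._cache[key] = (now + self._ttl, value)
        return value


def fetch_stage_config(stage_id, timeout):
    """Stand-in for an HTTP or gRPC call that honours the given timeout."""
    return {"stageId": stage_id, "capacity": 500}


lookup = CachedLookup(fetch_stage_config)
print(lookup.get("stage-a"))
```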
MS12 — Containerize Whole Application (Docker + Compose)
Tag: MS12-ContainerizeDockerCompose
Goal (why): Make the system runnable the same way on any machine.
Requirements
- Each service and the web app must have a Dockerfile.
- Provide docker-compose.yml that starts:
- RabbitMQ
- Databases (as needed)
- All services
- Gateway
- Web app
- Compose must expose:
- Web app URL
- RabbitMQ management UI (optional but helpful)
- Provide README.md steps:
- how to run with Docker Compose
- how to verify it works (a small test scenario)
MS13 — Optional Extensions
Tag: MS13-OptionalExtensions
Choose one or more:
A) Distributed logging / tracing
- Add structured logging with correlation ids
- Optionally add OpenTelemetry tracing and a collector
B) Improved error handling
- Retry policies (with backoff) for consumers
- Better DLQ inspection endpoints or dashboards
C) Frontend styling
- Make the UI look like a real festival app (simple but coherent)
D) Admin tools
- Add admin pages to configure stage capacities or schedule items
8. Non-Functional Requirements (All Milestones)
Version control & discipline
- Use Git from day one.
- Commit messages must be meaningful.
- Tag exactly at the end of each milestone.
Service boundaries
- No “shared database”.
- Avoid sharing domain models directly between services.
- Shared project should contain only:
- minimal event contracts (or event names)
- shared serialization helpers
- common small utilities (e.g., correlation id helpers)
Observability (minimum)
- Each service logs:
- when it publishes an event (type + id + correlation id)
- when it consumes an event (type + id + outcome)
- when it rejects a request (why)
Reliability mindset
- Assume:
- messages can be delivered more than once
- messages can arrive out of order
- a service can be down temporarily
- a publish can fail
- Design behaviors that make the system stable under these conditions.
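As one example of designing for these conditions, a consumer that maintains a “latest known state” can ignore events that are duplicates or older than what it has already applied. The Python sketch below assumes the producer attaches a per-ticket sequence number to each event. That field is hypothetical and not required by the assignment; a version counter or timestamp could serve the same purpose.

```python
class LatestStateProjection:
    """Keeps the newest known status per ticket, ignoring stale or repeated events."""

    def __init__(self):
        self._state = {}  # ticket_id -> (sequence, status)

    def apply(self, ticket_id: str, sequence: int, status: str) -> bool:
        """Return True if the event changed the state, False if it was ignored."""
        current = self._state.get(ticket_id)
        if current is not None and sequence <= current[0]:
            return False  # duplicate, or older than what we already know
        self._state[ticket_id] = (sequence, status)
        return True

    def status(self, ticket_id: str):
        entry = self._state.get(ticket_id)
        return entry[1] if entry is not None else None


projection = LatestStateProjection()
projection.apply("t-1", 2, "refunded")   # arrives first, although it happened later
projection.apply("t-1", 1, "purchased")  # late arrival, ignored as stale
print(projection.status("t-1"))
```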
9. Definition of Done (System Demo)
At the end (MS12 or MS13), you must be able to demonstrate this scenario:
- Open the WebApp
- Purchase a ticket
- Scan entry (granted)
- Observe live notifications and occupancy updates
- Trigger at least one failure path (entry denied / refund saga / capacity warning)
- Show that messages are still delivered reliably (e.g., via outbox or retry/DLQ behavior)
- Run the entire system via docker compose up
10. Deliverables Checklist
- Working code with all milestones tagged
- docs/ includes event definitions and correlation strategy
- README.md includes run instructions (local + Docker Compose)
- System demonstrates asynchronous messaging + reliability patterns
Quick note on naming
You may rename services, endpoints, and event names.
However, you must keep the same architectural responsibilities and milestone outcomes.