Distributed Systems Design Patterns: Building Resilient Architecture at Scale
Explore proven distributed systems design patterns and learn how to implement Circuit Breaker, Saga, CQRS, and Event Sourcing patterns for building resilient, scalable backend systems.
Amr S.
Author & Developer

Distributed Systems Design Patterns: Building Resilient Architecture at Scale
Building distributed systems requires mastering proven design patterns that handle the complexities of network failures, data consistency, and system coordination. This complete guide explores essential patterns for creating resilient, scalable backend architectures.
Circuit Breaker Pattern: Preventing Cascade Failures
The Circuit Breaker pattern prevents cascade failures by monitoring service calls and temporarily blocking requests to failing services. This pattern is essential for maintaining system stability when dependencies become unreliable.
interface CircuitBreakerOptions {
failureThreshold: number;
recoveryTimeout: number;
monitoringPeriod: number;
expectedErrors?: (error: Error) => boolean;
}
enum CircuitState {
CLOSED = 'CLOSED', // Normal operation
OPEN = 'OPEN', // Blocking requests
HALF_OPEN = 'HALF_OPEN' // Testing recovery
}
class CircuitBreaker<T> {
private state: CircuitState = CircuitState.CLOSED;
private failureCount = 0;
private nextAttempt = 0;
private successCount = 0;
constructor(
private operation: (...args: any[]) => Promise<T>,
private options: CircuitBreakerOptions
) {}
async execute(...args: any[]): Promise<T> {
if (this.state === CircuitState.OPEN) {
if (Date.now() < this.nextAttempt) {
throw new Error('Circuit breaker is OPEN');
}
this.state = CircuitState.HALF_OPEN;
this.successCount = 0;
}
try {
const result = await this.operation(...args);
this.onSuccess();
return result;
} catch (error) {
this.onFailure(error as Error);
throw error;
}
}
private onSuccess(): void {
this.failureCount = 0;
if (this.state === CircuitState.HALF_OPEN) {
this.successCount++;
if (this.successCount >= 3) { // Require multiple successes
this.state = CircuitState.CLOSED;
}
} else {
this.state = CircuitState.CLOSED;
}
}
private onFailure(error: Error): void {
if (this.options.expectedErrors && this.options.expectedErrors(error)) {
return; // Don't count expected errors
}
this.failureCount++;
if (this.failureCount >= this.options.failureThreshold) {
this.state = CircuitState.OPEN;
this.nextAttempt = Date.now() + this.options.recoveryTimeout;
}
}
getState(): CircuitState {
return this.state;
}
getMetrics() {
return {
state: this.state,
failureCount: this.failureCount,
nextAttempt: this.nextAttempt,
successCount: this.successCount
};
}
}
// Usage example with HTTP client
class ResilientHttpClient {
private circuitBreakers = new Map<string, CircuitBreaker<any>>();
async get(url: string, options: RequestInit = {}): Promise<Response> {
const circuitBreaker = this.getOrCreateCircuitBreaker(url);
return circuitBreaker.execute(async () => {
const response = await fetch(url, {
...options,
timeout: 5000, // 5 second timeout
});
if (!response.ok) {
throw new Error(`HTTP ${response.status}: ${response.statusText}`);
}
return response;
});
}
private getOrCreateCircuitBreaker(url: string): CircuitBreaker<Response> {
const host = new URL(url).host;
if (!this.circuitBreakers.has(host)) {
this.circuitBreakers.set(host, new CircuitBreaker(
(fetchFn) => fetchFn(),
{
failureThreshold: 5,
recoveryTimeout: 30000, // 30 seconds
monitoringPeriod: 60000, // 1 minute
expectedErrors: (error) => {
// Don't break circuit for client errors (4xx)
return error.message.includes('HTTP 4');
}
}
));
}
return this.circuitBreakers.get(host)!;
}
}
💡 Configure different thresholds for different services based on their criticality. Use exponential backoff for recovery attempts and implement proper monitoring and alerting for circuit state changes.
Saga Pattern: Managing Distributed Transactions
The Saga pattern manages distributed transactions by breaking them into a series of local transactions, each with a compensating action. This ensures data consistency across microservices without requiring distributed locks.
interface SagaStep<T = any> {
id: string;
execute: (context: SagaContext) => Promise<T>;
compensate: (context: SagaContext, result?: T) => Promise<void>;
}
interface SagaContext {
sagaId: string;
data: Record<string, any>;
results: Record<string, any>;
}
enum SagaStatus {
PENDING = 'PENDING',
COMPLETED = 'COMPLETED',
FAILED = 'FAILED',
COMPENSATING = 'COMPENSATING',
COMPENSATED = 'COMPENSATED'
}
class SagaOrchestrator {
private sagas = new Map<string, SagaExecution>();
async executeSaga(sagaId: string, steps: SagaStep[], initialData: Record<string, any>): Promise<void> {
const context: SagaContext = {
sagaId,
data: initialData,
results: {}
};
const execution: SagaExecution = {
status: SagaStatus.PENDING,
context,
steps,
completedSteps: [],
currentStep: 0
};
this.sagas.set(sagaId, execution);
try {
await this.executeSteps(execution);
execution.status = SagaStatus.COMPLETED;
} catch (error) {
execution.status = SagaStatus.FAILED;
await this.compensate(execution);
}
}
private async executeSteps(execution: SagaExecution): Promise<void> {
for (let i = execution.currentStep; i < execution.steps.length; i++) {
const step = execution.steps[i];
execution.currentStep = i;
try {
const result = await step.execute(execution.context);
execution.context.results[step.id] = result;
execution.completedSteps.push({ step, result });
// Persist progress for crash recovery
await this.persistSagaState(execution);
} catch (error) {
throw new Error(`Step ${step.id} failed: ${error.message}`);
}
}
}
private async compensate(execution: SagaExecution): Promise<void> {
execution.status = SagaStatus.COMPENSATING;
// Compensate in reverse order
for (let i = execution.completedSteps.length - 1; i >= 0; i--) {
const { step, result } = execution.completedSteps[i];
try {
await step.compensate(execution.context, result);
} catch (error) {
// Log compensation failure but continue
console.error(`Compensation failed for step ${step.id}:`, error);
}
}
execution.status = SagaStatus.COMPENSATED;
await this.persistSagaState(execution);
}
private async persistSagaState(execution: SagaExecution): Promise<void> {
// Persist to database for crash recovery
// Implementation depends on your storage solution
}
}
// Example: Order processing saga
class OrderProcessingSaga {
constructor(
private paymentService: PaymentService,
private inventoryService: InventoryService,
private shippingService: ShippingService,
private sagaOrchestrator: SagaOrchestrator
) {}
async processOrder(orderId: string, orderData: OrderData): Promise<void> {
const steps: SagaStep[] = [
{
id: 'reserve-inventory',
execute: async (context) => {
return await this.inventoryService.reserveItems(orderData.items);
},
compensate: async (context, reservationId) => {
if (reservationId) {
await this.inventoryService.releaseReservation(reservationId);
}
}
},
{
id: 'process-payment',
execute: async (context) => {
return await this.paymentService.charge(
orderData.paymentInfo,
orderData.total
);
},
compensate: async (context, paymentId) => {
if (paymentId) {
await this.paymentService.refund(paymentId);
}
}
},
{
id: 'create-shipment',
execute: async (context) => {
return await this.shippingService.createShipment({
items: orderData.items,
address: orderData.shippingAddress
});
},
compensate: async (context, shipmentId) => {
if (shipmentId) {
await this.shippingService.cancelShipment(shipmentId);
}
}
}
];
await this.sagaOrchestrator.executeSaga(orderId, steps, orderData);
}
}
📝 Design compensating actions to be idempotent and ensure they can handle partial failures. Use event sourcing to track saga progress and implement proper timeout handling for long-running transactions.
CQRS and Event Sourcing: Scalable Data Architecture
Command Query Responsibility Segregation (CQRS) separates read and write operations, while Event Sourcing stores state changes as events. Together, they enable highly scalable and auditable systems.
// Event Sourcing Implementation
interface DomainEvent {
id: string;
aggregateId: string;
aggregateType: string;
eventType: string;
eventData: any;
version: number;
timestamp: Date;
metadata?: Record<string, any>;
}
abstract class AggregateRoot {
protected id: string;
protected version: number = 0;
private uncommittedEvents: DomainEvent[] = [];
constructor(id: string) {
this.id = id;
}
protected addEvent(eventType: string, eventData: any, metadata?: Record<string, any>): void {
const event: DomainEvent = {
id: generateEventId(),
aggregateId: this.id,
aggregateType: this.constructor.name,
eventType,
eventData,
version: this.version + 1,
timestamp: new Date(),
metadata
};
this.uncommittedEvents.push(event);
this.apply(event);
}
abstract apply(event: DomainEvent): void;
getUncommittedEvents(): DomainEvent[] {
return [...this.uncommittedEvents];
}
clearUncommittedEvents(): void {
this.uncommittedEvents = [];
}
static fromHistory<T extends AggregateRoot>(
events: DomainEvent[],
constructor: new (id: string) => T
): T {
if (events.length === 0) {
throw new Error('Cannot create aggregate from empty event history');
}
const aggregate = new constructor(events[0].aggregateId);
for (const event of events) {
aggregate.apply(event);
}
aggregate.clearUncommittedEvents();
return aggregate;
}
}
// Example: User aggregate with event sourcing
class User extends AggregateRoot {
private email: string = '';
private name: string = '';
private isActive: boolean = false;
static create(id: string, email: string, name: string): User {
const user = new User(id);
user.addEvent('UserCreated', { email, name });
return user;
}
changeEmail(newEmail: string): void {
if (!newEmail || newEmail === this.email) {
throw new Error('Invalid email');
}
this.addEvent('EmailChanged', {
oldEmail: this.email,
newEmail
});
}
deactivate(): void {
if (!this.isActive) {
throw new Error('User is already inactive');
}
this.addEvent('UserDeactivated', {});
}
apply(event: DomainEvent): void {
switch (event.eventType) {
case 'UserCreated':
this.email = event.eventData.email;
this.name = event.eventData.name;
this.isActive = true;
break;
case 'EmailChanged':
this.email = event.eventData.newEmail;
break;
case 'UserDeactivated':
this.isActive = false;
break;
}
this.version = event.version;
}
// Getters for read operations
getEmail(): string { return this.email; }
getName(): string { return this.name; }
getIsActive(): boolean { return this.isActive; }
}
// CQRS Command and Query handlers
interface Command {
type: string;
aggregateId: string;
payload: any;
}
interface Query {
type: string;
filters?: Record<string, any>;
pagination?: { offset: number; limit: number };
}
class UserCommandHandler {
constructor(
private eventStore: EventStore,
private eventBus: EventBus
) {}
async handle(command: Command): Promise<void> {
switch (command.type) {
case 'CreateUser':
await this.handleCreateUser(command);
break;
case 'ChangeUserEmail':
await this.handleChangeEmail(command);
break;
}
}
private async handleCreateUser(command: Command): Promise<void> {
const { email, name } = command.payload;
const user = User.create(command.aggregateId, email, name);
await this.eventStore.saveEvents(user.getUncommittedEvents());
// Publish events for read model updates
for (const event of user.getUncommittedEvents()) {
await this.eventBus.publish(event);
}
}
private async handleChangeEmail(command: Command): Promise<void> {
const events = await this.eventStore.getEvents(command.aggregateId);
const user = User.fromHistory(events, User);
user.changeEmail(command.payload.newEmail);
await this.eventStore.saveEvents(user.getUncommittedEvents());
for (const event of user.getUncommittedEvents()) {
await this.eventBus.publish(event);
}
}
}
Tags
Share this article
Enjoying the Content?
If this article helped you, consider buying me a coffee ☕
Your support helps me create more quality content for the community!
☕ Every coffee fuels more tutorials • 🚀 100% goes to creating better content • ❤️ Thank you for your support!
About Amr S.
Passionate about web development and sharing knowledge with the community. Writing about modern web technologies, best practices, and developer experiences.
More from Amr S.

Building Microservices with RabbitMQ: A Complete Guide
Learn how to implement message-driven microservices architecture using RabbitMQ with practical examples and best practices.

Modern TypeScript Development: Advanced Patterns and Best Practices
Explore cutting-edge TypeScript features, design patterns, and architectural approaches that will elevate your development skills and code quality to the next level.

Advanced React Performance Optimization: From Rendering to Memory Management
Dive deep into React performance optimization with concurrent features, advanced memoization strategies, and memory management techniques for building lightning-fast web applications.