
Distributed Systems Design Patterns: Building Resilient Architecture at Scale

Explore proven distributed systems design patterns and learn how to implement Circuit Breaker, Saga, CQRS, and Event Sourcing patterns for building resilient, scalable backend systems.

Amr S.

Author & Developer

August 15, 2025

Building distributed systems means dealing with network failures, data consistency, and coordination between services. This guide walks through four proven patterns for resilient, scalable backend architectures: Circuit Breaker, Saga, CQRS, and Event Sourcing.

Circuit Breaker Pattern: Preventing Cascade Failures

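The Circuit Breaker pattern prevents cascade failures by monitoring service calls and temporarily blocking requests to failing services. After a configured number of consecutive failures the breaker opens and rejects calls immediately; once a recovery timeout elapses it lets trial requests through (half-open) and closes again only if they succeed. This pattern is essential for maintaining system stability when dependencies become unreliable.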

typescript
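interface CircuitBreakerOptions {
  failureThreshold: number;  // Consecutive failures before the circuit opens
  recoveryTimeout: number;   // Milliseconds to stay OPEN before probing again
  monitoringPeriod: number;  // Milliseconds; declared for windowed counting but unused by this class
  expectedErrors?: (error: Error) => boolean; // Return true for errors that should not count
}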

enum CircuitState {
  CLOSED = 'CLOSED',     // Normal operation
  OPEN = 'OPEN',         // Blocking requests
  HALF_OPEN = 'HALF_OPEN' // Testing recovery
}

class CircuitBreaker<T> {
  private state: CircuitState = CircuitState.CLOSED;
  private failureCount = 0;
  private nextAttempt = 0;
  private successCount = 0;
  
  constructor(
    private operation: (...args: any[]) => Promise<T>,
    private options: CircuitBreakerOptions
  ) {}

  async execute(...args: any[]): Promise<T> {
    if (this.state === CircuitState.OPEN) {
      if (Date.now() < this.nextAttempt) {
        throw new Error('Circuit breaker is OPEN');
      }
      this.state = CircuitState.HALF_OPEN;
      this.successCount = 0;
    }

    try {
      const result = await this.operation(...args);
      this.onSuccess();
      return result;
    } catch (error) {
      this.onFailure(error as Error);
      throw error;
    }
  }

  private onSuccess(): void {
    this.failureCount = 0;
    
    if (this.state === CircuitState.HALF_OPEN) {
      this.successCount++;
      if (this.successCount >= 3) { // Require multiple successes
        this.state = CircuitState.CLOSED;
      }
    } else {
      this.state = CircuitState.CLOSED;
    }
  }

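  private onFailure(error: Error): void {
    if (this.options.expectedErrors && this.options.expectedErrors(error)) {
      return; // Don't count expected errors
    }

    this.failureCount++;

    // A failed probe in HALF_OPEN reopens the circuit immediately;
    // otherwise trip only once the failure threshold is reached.
    if (
      this.state === CircuitState.HALF_OPEN ||
      this.failureCount >= this.options.failureThreshold
    ) {
      this.state = CircuitState.OPEN;
      this.nextAttempt = Date.now() + this.options.recoveryTimeout;
    }
  }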

  getState(): CircuitState {
    return this.state;
  }

  getMetrics() {
    return {
      state: this.state,
      failureCount: this.failureCount,
      nextAttempt: this.nextAttempt,
      successCount: this.successCount
    };
  }
}

// Usage example with HTTP client
class ResilientHttpClient {
  private circuitBreakers = new Map<string, CircuitBreaker<any>>();

  async get(url: string, options: RequestInit = {}): Promise<Response> {
    const circuitBreaker = this.getOrCreateCircuitBreaker(url);
    
    return circuitBreaker.execute(async () => {
      const response = await fetch(url, {
        ...options,
        // fetch has no `timeout` option; abort the request after 5 seconds instead
        signal: AbortSignal.timeout(5000),
      });
      
      if (!response.ok) {
        throw new Error(`HTTP ${response.status}: ${response.statusText}`);
      }
      
      return response;
    });
  }

  private getOrCreateCircuitBreaker(url: string): CircuitBreaker<Response> {
    const host = new URL(url).host;
    
    if (!this.circuitBreakers.has(host)) {
      this.circuitBreakers.set(host, new CircuitBreaker(
        (fetchFn) => fetchFn(),
        {
          failureThreshold: 5,
          recoveryTimeout: 30000, // 30 seconds
          monitoringPeriod: 60000, // 1 minute
          expectedErrors: (error) => {
            // Don't break circuit for client errors (4xx)
            return error.message.includes('HTTP 4');
          }
        }
      ));
    }
    
    return this.circuitBreakers.get(host)!;
  }
}

💡 Configure different thresholds for different services based on their criticality. Use exponential backoff for recovery attempts and implement proper monitoring and alerting for circuit state changes.
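The exponential backoff mentioned in the tip can be sketched as a small helper that decides how long the circuit stays open after each consecutive trip. This is a minimal sketch, not part of the `CircuitBreaker` class above: `BackoffOptions`, `recoveryDelay`, and `BackoffTracker` are hypothetical names, and the doubling factor and cap are assumptions you would tune per service.

```typescript
// Hypothetical options for the sketch: both values are in milliseconds.
interface BackoffOptions {
  baseTimeoutMs: number; // Wait after the first trip
  maxTimeoutMs: number;  // Upper bound so the wait never grows unbounded
}

// Doubles the wait for every consecutive trip, capped at maxTimeoutMs.
function recoveryDelay(consecutiveTrips: number, opts: BackoffOptions): number {
  const exponent = Math.max(0, consecutiveTrips - 1);
  const delay = opts.baseTimeoutMs * 2 ** exponent;
  return Math.min(delay, opts.maxTimeoutMs);
}

// Tracks how many times in a row the breaker has tripped.
class BackoffTracker {
  private trips = 0;

  constructor(private opts: BackoffOptions) {}

  // Call when the breaker moves to OPEN; returns the next-attempt timestamp.
  onTrip(now: number = Date.now()): number {
    this.trips++;
    return now + recoveryDelay(this.trips, this.opts);
  }

  // Call when the breaker returns to CLOSED; resets the backoff.
  onClose(): void {
    this.trips = 0;
  }
}
```

Under these assumptions, a breaker would use the value returned by `onTrip()` wherever it currently adds a fixed `recoveryTimeout`, and call `onClose()` when it closes.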

Saga Pattern: Managing Distributed Transactions

The Saga pattern manages distributed transactions by breaking them into a series of local transactions, each with a compensating action. If a step fails, the compensations for the steps that already completed run in reverse order. This gives eventual consistency across microservices without requiring distributed locks.

typescript
interface SagaStep<T = any> {
  id: string;
  execute: (context: SagaContext) => Promise<T>;
  compensate: (context: SagaContext, result?: T) => Promise<void>;
}

interface SagaContext {
  sagaId: string;
  data: Record<string, any>;
  results: Record<string, any>;
}

enum SagaStatus {
  PENDING = 'PENDING',
  COMPLETED = 'COMPLETED',
  FAILED = 'FAILED',
  COMPENSATING = 'COMPENSATING',
  COMPENSATED = 'COMPENSATED'
}

// Runtime record of one saga run, tracked by the orchestrator
interface SagaExecution {
  status: SagaStatus;
  context: SagaContext;
  steps: SagaStep[];
  completedSteps: Array<{ step: SagaStep; result: any }>;
  currentStep: number;
}

class SagaOrchestrator {
  private sagas = new Map<string, SagaExecution>();

  async executeSaga(sagaId: string, steps: SagaStep[], initialData: Record<string, any>): Promise<void> {
    const context: SagaContext = {
      sagaId,
      data: initialData,
      results: {}
    };

    const execution: SagaExecution = {
      status: SagaStatus.PENDING,
      context,
      steps,
      completedSteps: [],
      currentStep: 0
    };

    this.sagas.set(sagaId, execution);

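    try {
      await this.executeSteps(execution);
      execution.status = SagaStatus.COMPLETED;
      await this.persistSagaState(execution);
    } catch (error) {
      execution.status = SagaStatus.FAILED;
      await this.compensate(execution);
      throw error; // Surface the failure so callers don't treat the saga as successful
    }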
  }

  private async executeSteps(execution: SagaExecution): Promise<void> {
    for (let i = execution.currentStep; i < execution.steps.length; i++) {
      const step = execution.steps[i];
      execution.currentStep = i;

      try {
        const result = await step.execute(execution.context);
        execution.context.results[step.id] = result;
        execution.completedSteps.push({ step, result });
        
        // Persist progress for crash recovery
        await this.persistSagaState(execution);
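      } catch (error) {
        // `error` is `unknown` under strict TypeScript, so narrow before reading `.message`
        const message = error instanceof Error ? error.message : String(error);
        throw new Error(`Step ${step.id} failed: ${message}`);
      }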
    }
  }

  private async compensate(execution: SagaExecution): Promise<void> {
    execution.status = SagaStatus.COMPENSATING;
    
    // Compensate in reverse order
    for (let i = execution.completedSteps.length - 1; i >= 0; i--) {
      const { step, result } = execution.completedSteps[i];
      
      try {
        await step.compensate(execution.context, result);
      } catch (error) {
        // Log compensation failure but continue
        console.error(`Compensation failed for step ${step.id}:`, error);
      }
    }
    
    execution.status = SagaStatus.COMPENSATED;
    await this.persistSagaState(execution);
  }

  private async persistSagaState(execution: SagaExecution): Promise<void> {
    // Persist to database for crash recovery
    // Implementation depends on your storage solution
  }
}

// Example: Order processing saga
class OrderProcessingSaga {
  constructor(
    private paymentService: PaymentService,
    private inventoryService: InventoryService,
    private shippingService: ShippingService,
    private sagaOrchestrator: SagaOrchestrator
  ) {}

  async processOrder(orderId: string, orderData: OrderData): Promise<void> {
    const steps: SagaStep[] = [
      {
        id: 'reserve-inventory',
        execute: async (context) => {
          return await this.inventoryService.reserveItems(orderData.items);
        },
        compensate: async (context, reservationId) => {
          if (reservationId) {
            await this.inventoryService.releaseReservation(reservationId);
          }
        }
      },
      {
        id: 'process-payment',
        execute: async (context) => {
          return await this.paymentService.charge(
            orderData.paymentInfo,
            orderData.total
          );
        },
        compensate: async (context, paymentId) => {
          if (paymentId) {
            await this.paymentService.refund(paymentId);
          }
        }
      },
      {
        id: 'create-shipment',
        execute: async (context) => {
          return await this.shippingService.createShipment({
            items: orderData.items,
            address: orderData.shippingAddress
          });
        },
        compensate: async (context, shipmentId) => {
          if (shipmentId) {
            await this.shippingService.cancelShipment(shipmentId);
          }
        }
      }
    ];

    await this.sagaOrchestrator.executeSaga(orderId, steps, orderData);
  }
}

📝 Design compensating actions to be idempotent and ensure they can handle partial failures. Use event sourcing to track saga progress and implement proper timeout handling for long-running transactions.
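Two of the points in the note, idempotent compensations and timeout handling, can be sketched with small wrappers. This is a minimal sketch under stated assumptions: `withTimeout` and `IdempotentCompensator` are hypothetical helpers, and the in-memory `Set` stands in for whatever durable store a real system would use to remember completed compensations.

```typescript
// Rejects if `work` does not settle within `ms` milliseconds.
// Note: this only stops waiting; it does not cancel the underlying work.
function withTimeout<T>(work: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms}ms`)),
      ms
    );
  });
  // Whichever settles first wins; always clear the timer afterwards.
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer));
}

// Runs each compensation at most once per key, so retries are safe.
class IdempotentCompensator {
  // Assumption: in-memory only. A real system would persist these keys.
  private done = new Set<string>();

  // Returns true if the action ran, false if it was already applied.
  async run(key: string, action: () => Promise<void>): Promise<boolean> {
    if (this.done.has(key)) {
      return false;
    }
    await action();
    // Record only after success so a failed compensation can be retried.
    this.done.add(key);
    return true;
  }
}
```

A saga step could then call something like `compensator.run(sagaId + ':process-payment', () => withTimeout(refund(), 5000, 'refund'))`, where `refund` is whatever compensating call the step defines.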

CQRS and Event Sourcing: Scalable Data Architecture

Command Query Responsibility Segregation (CQRS) separates read and write operations, while Event Sourcing stores state changes as events. Together, they enable highly scalable and auditable systems.
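To make "stores state changes as events" concrete, here is a minimal in-memory sketch of an append-only event store. It is an illustration, not the article's `EventStore`: the names `StoredEvent`, `NewEvent`, and `InMemoryEventStore` are hypothetical, and the expected-version check (optimistic concurrency) is an assumption about how conflicting writers would be detected.

```typescript
// Event as stored: the store assigns the stream position (version).
interface StoredEvent {
  aggregateId: string;
  eventType: string;
  eventData: unknown;
  version: number; // 1-based position within the aggregate's stream
}

// Event as submitted by the caller, before a version is assigned.
interface NewEvent {
  eventType: string;
  eventData: unknown;
}

class InMemoryEventStore {
  // One append-only array per aggregate; a stand-in for a real database.
  private streams = new Map<string, StoredEvent[]>();

  // Appends only if the stream is still at `expectedVersion`,
  // so two writers that loaded the same history cannot both succeed.
  append(aggregateId: string, expectedVersion: number, events: NewEvent[]): StoredEvent[] {
    const stream = this.streams.get(aggregateId) ?? [];
    if (stream.length !== expectedVersion) {
      throw new Error(
        `Concurrency conflict: expected version ${expectedVersion}, found ${stream.length}`
      );
    }
    const stored = events.map((event, index) => ({
      ...event,
      aggregateId,
      version: expectedVersion + index + 1,
    }));
    this.streams.set(aggregateId, [...stream, ...stored]);
    return stored;
  }

  // Returns a copy of the full history, oldest first, for replay.
  getEvents(aggregateId: string): StoredEvent[] {
    return [...(this.streams.get(aggregateId) ?? [])];
  }
}
```

Current state is never updated in place here; it is rebuilt by replaying `getEvents()` through the aggregate.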

typescript
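// Event Sourcing Implementation
import { randomUUID } from 'node:crypto';

// Assumed helper (not defined in the original listing): returns a unique event ID
const generateEventId = (): string => randomUUID();
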
interface DomainEvent {
  id: string;
  aggregateId: string;
  aggregateType: string;
  eventType: string;
  eventData: any;
  version: number;
  timestamp: Date;
  metadata?: Record<string, any>;
}

abstract class AggregateRoot {
  protected id: string;
  protected version: number = 0;
  private uncommittedEvents: DomainEvent[] = [];

  constructor(id: string) {
    this.id = id;
  }

  protected addEvent(eventType: string, eventData: any, metadata?: Record<string, any>): void {
    const event: DomainEvent = {
      id: generateEventId(),
      aggregateId: this.id,
      aggregateType: this.constructor.name,
      eventType,
      eventData,
      version: this.version + 1,
      timestamp: new Date(),
      metadata
    };

    this.uncommittedEvents.push(event);
    this.apply(event);
  }

  abstract apply(event: DomainEvent): void;

  getUncommittedEvents(): DomainEvent[] {
    return [...this.uncommittedEvents];
  }

  clearUncommittedEvents(): void {
    this.uncommittedEvents = [];
  }

  static fromHistory<T extends AggregateRoot>(
    events: DomainEvent[],
    constructor: new (id: string) => T
  ): T {
    if (events.length === 0) {
      throw new Error('Cannot create aggregate from empty event history');
    }

    const aggregate = new constructor(events[0].aggregateId);
    
    for (const event of events) {
      aggregate.apply(event);
    }

    aggregate.clearUncommittedEvents();
    return aggregate;
  }
}

// Example: User aggregate with event sourcing
class User extends AggregateRoot {
  private email: string = '';
  private name: string = '';
  private isActive: boolean = false;

  static create(id: string, email: string, name: string): User {
    const user = new User(id);
    user.addEvent('UserCreated', { email, name });
    return user;
  }

  changeEmail(newEmail: string): void {
    if (!newEmail || newEmail === this.email) {
      throw new Error('Invalid email');
    }

    this.addEvent('EmailChanged', { 
      oldEmail: this.email, 
      newEmail 
    });
  }

  deactivate(): void {
    if (!this.isActive) {
      throw new Error('User is already inactive');
    }

    this.addEvent('UserDeactivated', {});
  }

  apply(event: DomainEvent): void {
    switch (event.eventType) {
      case 'UserCreated':
        this.email = event.eventData.email;
        this.name = event.eventData.name;
        this.isActive = true;
        break;
      
      case 'EmailChanged':
        this.email = event.eventData.newEmail;
        break;
      
      case 'UserDeactivated':
        this.isActive = false;
        break;
    }

    this.version = event.version;
  }

  // Getters for read operations
  getEmail(): string { return this.email; }
  getName(): string { return this.name; }
  getIsActive(): boolean { return this.isActive; }
}

// CQRS Command and Query handlers
interface Command {
  type: string;
  aggregateId: string;
  payload: any;
}

interface Query {
  type: string;
  filters?: Record<string, any>;
  pagination?: { offset: number; limit: number };
}

// Minimal contracts inferred from how the handler below uses them
interface EventStore {
  saveEvents(events: DomainEvent[]): Promise<void>;
  getEvents(aggregateId: string): Promise<DomainEvent[]>;
}

interface EventBus {
  publish(event: DomainEvent): Promise<void>;
}

class UserCommandHandler {
  constructor(
    private eventStore: EventStore,
    private eventBus: EventBus
  ) {}

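  async handle(command: Command): Promise<void> {
    switch (command.type) {
      case 'CreateUser':
        await this.handleCreateUser(command);
        break;
      case 'ChangeUserEmail':
        await this.handleChangeEmail(command);
        break;
      default:
        // Fail loudly instead of silently ignoring unsupported commands
        throw new Error(`Unknown command type: ${command.type}`);
    }
  }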

  private async handleCreateUser(command: Command): Promise<void> {
    const { email, name } = command.payload;
    const user = User.create(command.aggregateId, email, name);
    
    await this.eventStore.saveEvents(user.getUncommittedEvents());
    
    // Publish events for read model updates
    for (const event of user.getUncommittedEvents()) {
      await this.eventBus.publish(event);
    }
  }

  private async handleChangeEmail(command: Command): Promise<void> {
    const events = await this.eventStore.getEvents(command.aggregateId);
    const user = User.fromHistory(events, User);
    
    user.changeEmail(command.payload.newEmail);
    
    await this.eventStore.saveEvents(user.getUncommittedEvents());
    
    for (const event of user.getUncommittedEvents()) {
      await this.eventBus.publish(event);
    }
  }
}
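
The listing above covers only the command (write) side. The query side could look roughly like the following sketch: a projection that consumes published events and maintains a denormalized read model. `UserEvent`, `UserView`, and `UserProjection` are hypothetical names, the `Map` stands in for a read database, and the event types match the ones the `User` aggregate emits.

```typescript
// Only the event fields this projection needs; mirrors DomainEvent.
interface UserEvent {
  aggregateId: string;
  eventType: string;
  eventData: any;
  version: number;
}

// Denormalized row shaped for queries rather than for business rules.
interface UserView {
  id: string;
  email: string;
  name: string;
  isActive: boolean;
  version: number; // Last event version applied to this row
}

class UserProjection {
  // Assumption: in-memory map as a stand-in for a read database.
  private views = new Map<string, UserView>();

  // Intended to be called for every event published on the event bus.
  handle(event: UserEvent): void {
    const current = this.views.get(event.aggregateId);
    // Ignore duplicate or stale deliveries so redelivery is harmless.
    if (current && event.version <= current.version) {
      return;
    }

    switch (event.eventType) {
      case 'UserCreated':
        this.views.set(event.aggregateId, {
          id: event.aggregateId,
          email: event.eventData.email,
          name: event.eventData.name,
          isActive: true,
          version: event.version,
        });
        break;
      case 'EmailChanged':
        if (current) {
          this.views.set(event.aggregateId, {
            ...current,
            email: event.eventData.newEmail,
            version: event.version,
          });
        }
        break;
      case 'UserDeactivated':
        if (current) {
          this.views.set(event.aggregateId, {
            ...current,
            isActive: false,
            version: event.version,
          });
        }
        break;
    }
  }

  // Query methods read the projection only; they never touch the event store.
  findById(id: string): UserView | undefined {
    return this.views.get(id);
  }

  findActive(offset = 0, limit = 20): UserView[] {
    return Array.from(this.views.values())
      .filter((view) => view.isActive)
      .slice(offset, offset + limit);
  }
}
```

Because the read model is updated from published events, it lags the write side slightly; queries are eventually consistent with the commands that preceded them.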

Tags

#Backend #Architecture #Microservices
