distributed-transactions · saga-pattern · 2pc · consistency

Two-Phase Commit vs Saga: Choosing the Right Transaction Model

June 27, 2024·14 min read·by Bishwambhar Sen
A layout contrasting the coordinate-lock-commit lifecycle of a 2PC with the forward-compensating rollback pipeline of a Saga.

Concept

In monolithic applications, maintaining data consistency is a solved problem. We rely on the ACID properties of a single relational database, where local transactions lock rows and commit atomic updates. In microservices and distributed systems, however, business workflows span multiple database instances. Maintaining consistency across these boundaries is one of the most complex challenges in software architecture.

Architects generally choose between two primary transaction models to solve this: Two-Phase Commit (2PC) and the Saga Pattern. This post compares these two paradigms, mapping directly to Module 11: Distributed Transactions & The Saga Pattern.

Two-Phase Commit (Synchronous Locking):
Client ──> Coordinator ──(1. Prepare & Lock)──> [DB A] & [DB B]
Client ──> Coordinator ──(2. Commit & Unlock)──> [DB A] & [DB B]

Saga Pattern (Asynchronous Compensations):
Client ──> Saga Orchestrator ──(Step 1: Commit)──> [Service A]
                               ──(Step 2: Commit)──> [Service B]
                               ──(Step 3: Fails!)──> [Service C]
          Saga Orchestrator ──(Compensate 2) ──> [Service B (Rollback)]
                            ──(Compensate 1) ──> [Service A (Rollback)]

Two-Phase Commit (2PC): Distributed ACID

Two-Phase Commit is a synchronous atomic commitment protocol: either every participant commits the distributed transaction or none does. A central coordinator manages the transaction lifecycle across multiple participant nodes:

  1. Phase 1 (Prepare): The coordinator sends a Prepare command to all database nodes. Each node executes the local transaction up to the point of committing, acquires the necessary locks on database rows, and writes to its local transaction log. If successful, the node votes "Yes" to commit.
  2. Phase 2 (Commit/Abort): If all nodes vote "Yes", the coordinator broadcasts a Commit command. The nodes apply the changes and release their locks. If any node votes "No" (or times out), the coordinator broadcasts an Abort command, and all nodes roll back their local changes.
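The two phases can be sketched as a small coordinator loop. This is only an illustration under simplifying assumptions: `IParticipant`, `TwoPhaseCoordinator`, and `InMemoryParticipant` are hypothetical names invented for this post, there is no durable coordinator log, and Phase 2 delivery is assumed to succeed. Real resource managers expose this handshake through protocols such as XA.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical participant contract (not a real library API).
public interface IParticipant
{
    Task<bool> PrepareAsync(Guid txId); // lock rows, write the log, return the vote
    Task CommitAsync(Guid txId);        // apply changes, release locks
    Task AbortAsync(Guid txId);         // roll back, release locks
}

public sealed class TwoPhaseCoordinator
{
    private readonly TimeSpan _voteTimeout;

    public TwoPhaseCoordinator(TimeSpan voteTimeout) => _voteTimeout = voteTimeout;

    public async Task<bool> RunAsync(Guid txId, IReadOnlyList<IParticipant> participants)
    {
        // Phase 1: collect votes. A timeout or an exception counts as "No".
        bool[] votes = await Task.WhenAll(participants.Select(p => VoteAsync(p, txId)));
        bool commit = votes.All(v => v);

        // A production coordinator would durably log the decision here.

        // Phase 2: broadcast the decision to every participant.
        foreach (var participant in participants)
        {
            if (commit) await participant.CommitAsync(txId);
            else await participant.AbortAsync(txId);
        }
        return commit;
    }

    private async Task<bool> VoteAsync(IParticipant participant, Guid txId)
    {
        try
        {
            Task<bool> vote = participant.PrepareAsync(txId);
            Task finished = await Task.WhenAny(vote, Task.Delay(_voteTimeout));
            return finished == vote && await vote;
        }
        catch (Exception)
        {
            return false;
        }
    }
}

// In-memory stand-in for a database node, used only to exercise the sketch.
public sealed class InMemoryParticipant : IParticipant
{
    private readonly bool _voteYes;
    public string State { get; private set; } = "Idle";

    public InMemoryParticipant(bool voteYes) => _voteYes = voteYes;

    public Task<bool> PrepareAsync(Guid txId)
    {
        State = _voteYes ? "Prepared" : "Refused";
        return Task.FromResult(_voteYes);
    }

    public Task CommitAsync(Guid txId) { State = "Committed"; return Task.CompletedTask; }
    public Task AbortAsync(Guid txId) { State = "Aborted"; return Task.CompletedTask; }
}
```

With one participant voting "No", `RunAsync` returns `false` and every participant ends in the `Aborted` state, including the one that had already prepared.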

The Saga Pattern: Eventual Consistency

The Saga pattern abandons distributed locks in favor of eventual consistency. A Saga is a sequence of independent local transactions. Each local transaction updates the database of a single service and publishes an event or message. Subsequent services consume this event and trigger their own local transactions.

If a local transaction fails (e.g., due to business logic violation like insufficient funds), the Saga must execute Compensating Transactions in reverse order to undo the changes made by preceding local transactions.

Sagas can be structured in two ways:

  • Choreography: Decentralized execution. Each service consumes events, performs its local transaction, and emits new events. There is no central point of control.
  • Orchestration: Centralized execution. A dedicated coordinator service (Saga Orchestrator) manages the state machine, tells participants which local transactions to execute, and coordinates rollbacks if failures occur.
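To make the choreography style concrete, here is a minimal sketch in which services only react to events. Everything in it is an assumption made for illustration: the `EventBus` class is a synchronous in-memory stand-in for a real message broker, and the event names (`OrderCreated`, `PaymentCharged`, and so on) are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory bus standing in for a real message broker.
public sealed class EventBus
{
    private readonly Dictionary<string, List<Action<string>>> _handlers =
        new Dictionary<string, List<Action<string>>>();

    public void Subscribe(string eventName, Action<string> handler)
    {
        if (!_handlers.TryGetValue(eventName, out var list))
        {
            list = new List<Action<string>>();
            _handlers[eventName] = list;
        }
        list.Add(handler);
    }

    public void Publish(string eventName, string orderId)
    {
        if (!_handlers.TryGetValue(eventName, out var list)) return;
        foreach (var handler in list.ToArray()) handler(orderId);
    }
}

public static class ChoreographyDemo
{
    // Wires three hypothetical services together and returns the
    // sequence of local transactions that actually ran.
    public static List<string> Run(bool inventoryAvailable)
    {
        var bus = new EventBus();
        var log = new List<string>();

        // Payment service: charges on OrderCreated, compensates on InventoryRejected.
        bus.Subscribe("OrderCreated", id => { log.Add("payment:charged"); bus.Publish("PaymentCharged", id); });
        bus.Subscribe("InventoryRejected", id => { log.Add("payment:refunded"); bus.Publish("PaymentRefunded", id); });

        // Inventory service: reserves stock or rejects the order.
        bus.Subscribe("PaymentCharged", id =>
        {
            if (inventoryAvailable) { log.Add("inventory:reserved"); bus.Publish("InventoryReserved", id); }
            else { log.Add("inventory:rejected"); bus.Publish("InventoryRejected", id); }
        });

        // Order service: confirms or cancels its own record.
        bus.Subscribe("InventoryReserved", id => log.Add("order:confirmed"));
        bus.Subscribe("PaymentRefunded", id => log.Add("order:cancelled"));

        log.Add("order:created");
        bus.Publish("OrderCreated", "order-1");
        return log;
    }
}
```

Note that no component knows the whole workflow: the rollback path exists only as the chain `InventoryRejected` → `PaymentRefunded` → order cancelled. That is the main reason choreographed Sagas become hard to trace as the number of steps grows.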

Constraints

The physics of networking and database internals place strict constraints on these transaction models.

1. The Block-and-Wait Bottleneck (2PC)

In 2PC, locks on database rows are held continuously from the start of Phase 1 until Phase 2 completes.

  • Network Partition Constraint: If the coordinator crashes or is partitioned from the network after participants have voted "Yes" but before they receive the "Commit" instruction, the participants are left in doubt. They can neither release their locks nor decide the outcome on their own. Every other transaction that needs those rows queues up behind them, and the delays cascade through the system until database connection pools are exhausted.
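The blocking behavior follows from the participant's state machine, which can be sketched roughly as follows. The `TxState` and `ParticipantTx` types are hypothetical and model a single transaction on a single node, with no real locks or logging.

```csharp
using System;

public enum TxState { Working, Prepared, Committed, Aborted }

// Hypothetical model of one participant's view of one distributed transaction.
public sealed class ParticipantTx
{
    public TxState State { get; private set; } = TxState.Working;

    // Row locks are held until the final outcome is known.
    public bool HoldsLocks => State == TxState.Working || State == TxState.Prepared;

    // Phase 1: vote "Yes". From here on the participant is in doubt.
    public void Prepare()
    {
        if (State != TxState.Working)
            throw new InvalidOperationException("Prepare is only valid while working.");
        State = TxState.Prepared;
    }

    // Phase 2: apply the coordinator's decision.
    public void OnCoordinatorDecision(bool commit)
    {
        if (State == TxState.Committed || State == TxState.Aborted) return; // duplicate message
        if (commit && State != TxState.Prepared)
            throw new InvalidOperationException("Cannot commit a transaction that never voted Yes.");
        State = commit ? TxState.Committed : TxState.Aborted;
    }

    // Called when nothing has been heard from the coordinator for a while.
    public void OnTimeout()
    {
        // Before voting, the participant may still abort unilaterally.
        if (State == TxState.Working) State = TxState.Aborted;

        // After voting "Yes" it must keep waiting: the coordinator may already
        // have told other participants to commit, so aborting here could
        // break atomicity. The locks stay held.
    }
}
```

A timeout in the `Working` state frees the locks immediately, while a timeout in the `Prepared` state changes nothing. That asymmetry is the block-and-wait bottleneck.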

2. The Lack of Isolation (ACID vs. ACD in Saga)

Sagas do not possess the Isolation property of ACID because each local transaction commits its changes immediately. This exposes the system to isolation anomalies:

  • Dirty Reads: Service B reads data committed by Service A's local transaction. A later step in the same Saga then fails, triggering a compensation that reverts Service A's change. Service B has now acted on state that, from the business point of view, never existed.
  • Lost Updates: Two concurrent Sagas attempt to modify the same record, and the second Saga overwrites the first Saga's changes without taking the initial write into account.

To mitigate isolation constraints, developers must implement architectural countermeasures, such as Semantic Locks (marking a status as Pending or LockedForSaga to block concurrent reads/writes) or Commutative Updates (designing operations that can be applied in any order, such as balance additions/subtractions).
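Both countermeasures can be sketched on a single in-memory record. The `Account` class and its members are hypothetical; in a real service the status flag and the owning Saga id would be columns on the row, updated inside the local database transaction rather than under an in-process `lock`.

```csharp
using System;

public enum AccountStatus { Available, LockedForSaga }

// Hypothetical aggregate illustrating a semantic lock and a commutative update.
public sealed class Account
{
    private readonly object _gate = new object();
    private Guid? _owner;

    public decimal Balance { get; private set; }
    public AccountStatus Status { get; private set; } = AccountStatus.Available;

    public Account(decimal balance) => Balance = balance;

    // Semantic lock: an application-level flag that tells other Sagas
    // the record is mid-flight. Re-entrant for the Saga that owns it.
    public bool TryLock(Guid sagaId)
    {
        lock (_gate)
        {
            if (Status == AccountStatus.LockedForSaga && _owner != sagaId) return false;
            Status = AccountStatus.LockedForSaga;
            _owner = sagaId;
            return true;
        }
    }

    // Released by the owning Saga when it completes or finishes compensating.
    public void Unlock(Guid sagaId)
    {
        lock (_gate)
        {
            if (_owner != sagaId) return;
            Status = AccountStatus.Available;
            _owner = null;
        }
    }

    // Commutative update: apply a signed delta instead of writing an
    // absolute value, so concurrent Sagas cannot overwrite each other.
    public void ApplyDelta(decimal delta)
    {
        lock (_gate) { Balance += delta; }
    }
}
```

The semantic lock addresses dirty reads by making the pending state visible to other Sagas, while the delta-based update addresses lost updates because `-30` followed by `+50` yields the same balance as `+50` followed by `-30`.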

Trade-offs

Choosing between 2PC and Saga involves trading consistency guarantees for availability and low latency.

| Architectural Dimension | Two-Phase Commit (2PC) | Saga Pattern |
| --- | --- | --- |
| Consistency Model | Immediate / Strong Consistency | Eventual Consistency |
| Concurrency & Latency | Low throughput, high latency (due to locking) | High throughput, low latency (asynchronous) |
| System Coupling | Tight runtime and temporal coupling | Loose spatial and temporal coupling |
| Rollback Complexity | Automatic (handled by DB transaction engine) | Manual (must write custom compensating logic) |
| Scalability | Poor (typically bound to a single LAN or datacenter) | High (natively fits cloud-native microservices) |

1. Synchronous Certainty vs. Asynchronous Complexity

  • Two-Phase Commit: Provides a simple programming model. Developers do not need to write compensation code because the database takes care of transaction rollbacks.
    • Trade-off: Reduces availability. Under the CAP theorem, a system that insists on consistency during a network partition has to give up availability. If any participant is down or unreachable, the entire transaction aborts or blocks.
  • Saga Pattern: Maximizes availability and scalability by utilizing asynchronous event processing.
    • Trade-off: Increases operational complexity. Compensating transactions can themselves fail (what if a rollback transaction fails?), so they have to be retried safely. This requires building idempotent endpoints and manual reconciliation queues.
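An idempotent compensation endpoint can be approximated with an idempotency key, as in the sketch below. The `RefundHandler` class and the key format are hypothetical, and the in-memory `HashSet` stands in for a durable processed-message table that would normally be written in the same local transaction as the refund.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical compensation handler that tolerates duplicate deliveries.
public sealed class RefundHandler
{
    private readonly object _gate = new object();
    private readonly HashSet<string> _processedKeys = new HashSet<string>();

    public decimal TotalRefunded { get; private set; }

    // Returns true if the refund was applied, false if this key was
    // already processed and the call was a harmless duplicate.
    public bool Refund(string idempotencyKey, decimal amount)
    {
        lock (_gate)
        {
            if (!_processedKeys.Add(idempotencyKey)) return false;
            TotalRefunded += amount;
            return true;
        }
    }
}
```

Because a retried call with the same key is a no-op, the orchestrator can retry a compensation after a timeout without knowing whether the first attempt actually reached the service.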

Implementation: Orchestration Saga Engine in C#

The following C# code demonstrates a simple, resilient, and unit-testable Saga Orchestrator pattern executing a three-step payment-and-inventory booking workflow. It highlights how the orchestrator catches failures and executes compensating actions in reverse chronological order.

using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Logging;

namespace Mpc.Transactions.Saga
{
    public interface ISagaStep
    {
        string Name { get; }
        Task ExecuteAsync(CancellationToken ct);
        Task CompensateAsync(CancellationToken ct);
    }

    public class OrderSagaOrchestrator
    {
        private readonly ILogger<OrderSagaOrchestrator> _logger;

        public OrderSagaOrchestrator(ILogger<OrderSagaOrchestrator> logger)
        {
            _logger = logger ?? throw new ArgumentNullException(nameof(logger));
        }

        public async Task<bool> ExecuteSagaAsync(List<ISagaStep> steps, CancellationToken cancellationToken)
        {
            var executedSteps = new Stack<ISagaStep>();
            _logger.LogInformation("Starting Saga Execution consisting of {Count} steps.", steps.Count);

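            foreach (var step in steps)
            {
                try
                {
                    // Check inside the try block so that a cancellation between steps
                    // still compensates the steps that have already committed.
                    cancellationToken.ThrowIfCancellationRequested();

                    _logger.LogInformation("Executing Saga Step: {StepName}", step.Name);
                    await step.ExecuteAsync(cancellationToken);

                    // Track successfully executed steps for potential compensation
                    executedSteps.Push(step);
                }
                catch (Exception ex)
                {
                    _logger.LogError(ex, "Saga Step Failed: {StepName}. Triggering compensating transactions.", step.Name);

                    // Trigger rollback pipeline. Compensation must not observe the caller's
                    // token: if it is already cancelled, every CompensateAsync call would
                    // abort immediately and leave committed steps in place.
                    await RollbackSagaAsync(executedSteps, CancellationToken.None);
                    return false;
                }
            }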

            _logger.LogInformation("Saga execution completed successfully.");
            return true;
        }

        private async Task RollbackSagaAsync(Stack<ISagaStep> executedSteps, CancellationToken cancellationToken)
        {
            _logger.LogWarning("Initiating Saga Rollback. Reversing {Count} steps.", executedSteps.Count);

            while (executedSteps.Count > 0)
            {
                var step = executedSteps.Pop();
                _logger.LogWarning("Compensating step: {StepName}", step.Name);
                
                int retryCount = 3;
                bool success = false;

                while (retryCount > 0 && !success)
                {
                    try
                    {
                        await step.CompensateAsync(cancellationToken);
                        success = true;
                    }
                    catch (Exception ex)
                    {
                        retryCount--;

                        if (retryCount == 0)
                        {
                            // In production, this must route to an active DLQ (Dead Letter Queue) 
                            // or alerting system for manual operations review.
                            _logger.LogCritical(ex, "FATAL: Saga compensation failed for step: {StepName}. Manual intervention required.", step.Name);
                        }
                        else
                        {
                            _logger.LogWarning(ex, "Compensation step failed: {StepName}. Retries remaining: {Retries}",
                                step.Name, retryCount);

                            // Exponential backoff before retry: 1s after the first failure,
                            // 2s after the second (the delay doubles with each failed attempt).
                            int failedAttempts = 3 - retryCount;
                            await Task.Delay(1000 * (1 << (failedAttempts - 1)), cancellationToken);
                        }
                    }
                }
            }
        }
    }

    // Concrete Step Implementation Example
    public class InventoryReservationStep : ISagaStep
    {
        private readonly string _productId;
        private readonly int _quantity;
        private readonly ILogger _logger;

        public string Name => "InventoryReservation";

        public InventoryReservationStep(string productId, int quantity, ILogger logger)
        {
            _productId = productId;
            _quantity = quantity;
            _logger = logger;
        }

        public async Task ExecuteAsync(CancellationToken ct)
        {
            // Simulate reserving inventory in database
            _logger.LogInformation("Reserving {Quantity} units of {ProductId}.", _quantity, _productId);
            await Task.Delay(50, ct); 
        }

        public async Task CompensateAsync(CancellationToken ct)
        {
            // Simulate releasing reserved inventory
            _logger.LogWarning("Releasing {Quantity} units reservation of {ProductId}.", _quantity, _productId);
            await Task.Delay(50, ct);
        }
    }
}