Preventing Cache Stampedes with Singleflight and Distributed Mutexes

May 16, 2024·12 min read·by Bishwambhar Sen

A sequence diagram showing multiple concurrent requests being coalesced into a single database read with subsequent cache update.

Concept

In high-throughput distributed systems, caching is the primary shield protecting databases and downstream microservices from traffic surges. However, when caching strategy fails, it can manifest as a cache stampede (also known as the thundering herd or dog-piling).

A cache stampede occurs when a highly popular ("hot") cache key expires under heavy concurrent load.

Normal Operation:
[Clients] ---> [Cache (Hit)] ---> Return Data

Stampede Condition (Cache Key Expires):
[Client 1] ---\
[Client 2] ----+--> [Cache (Miss)] ---> [Database Query 1] \  Database is overwhelmed,
[Client 3] ----/                        [Database Query 2] -+-> connection pools exhaust,
[Client N] ---------------------------> [Database Query N] /  and latency spikes.

Because the database queries are slow, subsequent client requests also observe a cache miss and trigger their own database fetches. The resulting spike in database CPU usage, thread exhaustion, and memory consumption can trigger a cascading failure across the entire application stack.

To mitigate this, three primary architectural patterns are deployed:

1. Singleflight (Request Coalescing)

Singleflight is an in-memory execution deduplicator. When a cache miss occurs, the application does not immediately call the database. Instead, it checks a local directory of active "in-flight" requests. If a request for the same key is already in progress, the calling thread blocks and waits for the existing request to complete, sharing its result once it arrives. This coalesces $N$ concurrent calls into a single database query.

2. Distributed Locks (Mutexes)

In a multi-node cluster, Singleflight only coalesces requests on a per-node basis. If 10 instances of a service experience a cache miss simultaneously, 10 database queries will still be executed. A distributed lock (e.g., Redlock using Redis) enforces a cluster-wide mutual exclusion. Only the node that acquires the lock is permitted to query the database and update the cache; other nodes block and poll or wait for the cache key to be populated.

3. Probabilistic Early Expiration (XFetch)

Pioneered in the paper "Optimal Probabilistic Cache Refreshment," the XFetch algorithm uses a probability distribution to decide whether to asynchronously refresh a cache key before it officially expires. The decision is calculated as: $$-\beta \cdot \delta \cdot \ln(rand()) > \text{TTL} - t$$

Where:

$\beta$ is a configuration parameter ($>0$) to tune aggressiveness.
$\delta$ is the time taken to compute the value from the database.
$rand()$ is a random number between 0 and 1.
$\text{TTL}$ is the total lifespan of the cache item.
$t$ is the elapsed time since creation.

As the remaining lifespan approaches zero, the probability of a background thread triggering a refresh increases, ensuring hot items are refreshed asynchronously and never hit an hard expiration path.

Constraints

When designing a cache stampede protection layer, architects must weigh several low-level runtime constraints:

Synchronization Overhead

Singleflight relies on fine-grained concurrency primitives (SemaphoreSlim, ConcurrentDictionary, or locks). In high-frequency pathways, lock contention on the dictionary containing the active tasks can introduce latency. Memory allocations for task synchronization objects (TaskCompletionSource) must be minimized to avoid Garbage Collection (GC) pressure.

Distributed Lock Latency

Distributed locking introduces network round-trips to the lock manager (e.g., Redis, Consul). The time taken to acquire and release the lock ($L_{lock}$) must be significantly less than the database query execution time ($T_{db}$). If $L_{lock} \approx T_{db}$, the lock overhead negates the performance benefits of coalescing.

Error Propagation and Blast Radius

In a request-coalescing setup, if the database query fails or times out, the failure is propagated to all waiting client requests. A robust implementation must support stale-while-revalidate semantics, allowing the singleflight thread to temporarily serve expired cache data if the database query fails.

Lock TTL and Database Execution Bounds

If using distributed locks, you must assign a Time-To-Live (TTL) to the lock to prevent deadlocks if the node holding the lock crashes. However, if the database query takes longer than the lock TTL, the lock will be released prematurely, allowing another node to initiate a query and causing a stampede anyway. The lock holder must run a background "lock renewal" heartbeat or enforce a strict database query timeout.

Trade-offs

Choosing the appropriate mitigation pattern requires analyzing system scale, architecture complexity, and consistency SLA:

Pattern	Local Latency Overhead	Network/Storage Overhead	Code Complexity	Scope of Protection
Naive Caching	Negligible	None	Extremely Low	None
Singleflight	Low (dictionary lookup + Task await)	None	Medium	Single Node only (Cluster may still duplicate fetches)
Distributed Lock	High (Network round-trip to Redis)	Moderate (Redis traffic)	High (Requires fallback and heartbeat)	Cluster-wide (Strict single fetch across nodes)
XFetch	Low	Low	Medium	Hybrid (Asynchronous background refresh)

graph TD
    A[Cache Miss] --> B{Scale of System?}
    B -- "Single Instance" --> C[Apply Singleflight / Local Lock]
    B -- "Distributed Cluster" --> D{DB Load Sensitivity?}
    D -- "Extremely High (Strict SLA)" --> E[Distributed Mutex / Lock]
    D -- "Moderate (Can tolerate N-queries)" --> F[Singleflight + Cache Revalidation]
    D -- "Read Latency Critical" --> G[Probabilistic Early Expiration - XFetch]

Code

The following is a production-ready C# implementation of the Singleflight pattern. It uses ConcurrentDictionary and SemaphoreSlim to coalesce concurrent async calls for the same key, ensuring only one fetch is executed. It also features a timeout safety mechanism to prevent stuck tasks from blocking callers indefinitely.

using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Caching.Distributed;
using Microsoft.Extensions.Logging;

namespace CacheResiliency
{
    public interface ICoalescedCache
    {
        Task<T> GetOrAddAsync<T>(string key, Func<CancellationToken, Task<T>> factory, TimeSpan ttl, CancellationToken cancellationToken);
    }

    public class SingleflightCache : ICoalescedCache
    {
        private readonly IDistributedCache _cache;
        private readonly ILogger<SingleflightCache> _logger;
        
        // Maps in-flight keys to their active computation tasks
        private readonly ConcurrentDictionary<string, Task<object?>> _inFlightCalls = new(StringComparer.Ordinal);
        
        // Serializes access to individual keys to prevent race conditions during dictionary mutation
        private readonly ConcurrentDictionary<string, SemaphoreSlim> _keyLocks = new(StringComparer.Ordinal);

        public SingleflightCache(IDistributedCache cache, ILogger<SingleflightCache> logger)
        {
            _cache = cache ?? throw new ArgumentNullException(nameof(cache));
            _logger = logger ?? throw new ArgumentNullException(nameof(logger));
        }

        public async Task<T> GetOrAddAsync<T>(
            string key, 
            Func<CancellationToken, Task<T>> factory, 
            TimeSpan ttl, 
            CancellationToken cancellationToken)
        {
            // 1. Attempt fast path: Read from Cache
            var cachedValue = await TryGetFromCacheAsync<T>(key, cancellationToken);
            if (cachedValue != null)
            {
                return cachedValue;
            }

            // 2. Cache Miss: Setup Singleflight Coalescing
            var keyLock = _keyLocks.GetOrAdd(key, _ => new SemaphoreSlim(1, 1));
            await keyLock.WaitAsync(cancellationToken);

            Task<object?> fetchTask;
            bool isOriginator = false;

            try
            {
                // Double-check cache inside lock boundary
                cachedValue = await TryGetFromCacheAsync<T>(key, cancellationToken);
                if (cachedValue != null)
                {
                    return cachedValue;
                }

                // Retrieve or register the computation task in the global directory
                fetchTask = _inFlightCalls.GetOrAdd(key, k =>
                {
                    isOriginator = true;
                    return ExecuteFetchAndCacheAsync(k, factory, ttl, cancellationToken);
                });
            }
            finally
            {
                keyLock.Release();
            }

            if (isOriginator)
            {
                _logger.LogInformation("Thread {ThreadId} is the originator for key '{Key}'. Querying database...", Thread.CurrentThread.ManagedThreadId, key);
            }
            else
            {
                _logger.LogInformation("Thread {ThreadId} coalesced onto active fetch for key '{Key}'. Awaiting result...", Thread.CurrentThread.ManagedThreadId, key);
            }

            try
            {
                // Await the shared computation task
                var result = await fetchTask.WaitAsync(cancellationToken);
                return (T)result!;
            }
            catch (Exception ex)
            {
                _logger.LogError(ex, "Failed executing database fetch for key '{Key}' during coalesced operation.", key);
                throw;
            }
        }

        private async Task<object?> ExecuteFetchAndCacheAsync<T>(
            string key, 
            Func<CancellationToken, Task<T>> factory, 
            TimeSpan ttl, 
            CancellationToken cancellationToken)
        {
            try
            {
                // Enforce a strict timeout on the database fetch operation
                using var cts = CancellationTokenSource.CreateLinkedTokenSource(cancellationToken);
                cts.CancelAfter(TimeSpan.FromSeconds(30)); // 30s timeout guard

                T freshData = await factory(cts.Token);

                // Write back to the cache
                await WriteToCacheAsync(key, freshData, ttl, cancellationToken);
                
                return freshData;
            }
            finally
            {
                // Clean up the in-flight registration so future requests can query again
                _inFlightCalls.TryRemove(key, out _);
                
                // Cleanup key lock semaphore if no thread is waiting on it
                if (_keyLocks.TryRemove(key, out var sem))
                {
                    sem.Dispose();
                }
            }
        }

        private async Task<T?> TryGetFromCacheAsync<T>(string key, CancellationToken token)
        {
            try
            {
                var bytes = await _cache.GetAsync(key, token);
                if (bytes == null || bytes.Length == 0) return default;
                
                // Perform fast deserialization (e.g., System.Text.Json)
                return System.Text.Json.JsonSerializer.Deserialize<T>(bytes);
            }
            catch (Exception ex)
            {
                _logger.LogWarning(ex, "Failed reading cache for key '{Key}'. Defaulting to miss.", key);
                return default;
            }
        }

        private async Task WriteToCacheAsync<T>(string key, T value, TimeSpan ttl, CancellationToken token)
        {
            try
            {
                var bytes = System.Text.Json.JsonSerializer.SerializeToUtf8Bytes(value);
                var options = new DistributedCacheEntryOptions { AbsoluteExpirationRelativeToNow = ttl };
                await _cache.SetAsync(key, bytes, options, token);
            }
            catch (Exception ex)
            {
                _logger.LogError(ex, "Failed writing fresh data to cache for key '{Key}'", key);
            }
        }
    }
}

← Back to all articles