The evolution of data caching in a high-traffic microservice architecture

5 min read • June 18, 2025

This article was written by Buin-Dylgyrzhap Dylgyrzhapov, Senior Software Engineer for Marketplace Pricing, Rides.

Introduction

Consider a microservice architecture where Service A requires rapid access to a configuration table (~10 MB) stored in a database (DB) directly accessed only by Service B. This configuration is checked by Service A during every order creation process. Since the data is relatively small and infrequently updated, an initial performance optimisation involved implementing a simple caching mechanism within Service A. Each instance of Service A caches the entire configuration table and refreshes it every minute by querying Service B. Given the small table size, the impact on Service A's RAM consumption was negligible.
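A minimal sketch of this initial setup, in TypeScript for a Node.js service (the endpoint name and data shape are assumptions, not the actual implementation):

```typescript
// Naive per-instance cache in Service A (initial approach).
// Assumes a hypothetical GET /configuration endpoint on Service B
// that returns the full ~10 MB table as JSON.

type ConfigTable = Record<string, unknown>;

let cachedConfig: ConfigTable = {};

async function refreshConfig(): Promise<void> {
  const response = await fetch('http://service-b/configuration');
  // Every refresh transfers and deserialises the full table,
  // even if nothing has changed since the last minute.
  cachedConfig = (await response.json()) as ConfigTable;
}

// Every instance of Service A refreshes the whole table once a minute.
refreshConfig().catch((err) => console.error('initial config load failed', err));
setInterval(() => {
  refreshConfig().catch((err) => console.error('config refresh failed', err));
}, 60_000);

// Order creation reads from the local in-memory cache.
export function getConfig(): ConfigTable {
  return cachedConfig;
}
```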

Problem

As traffic to Service A grew, it required scaling out (deploying more instances). This horizontal scaling directly increased the number of requests to Service B, consequently amplifying the query load on the database. This specific configuration query became the heaviest load on the DB (via Service B) and generated significant network traffic between Service A and Service B.

For example, with 100 instances of Service A refreshing every minute, roughly 1 GB of configuration data was transferred across the network per minute (100 instances * 10 MB/instance). Furthermore, each instance of Service A and Service B consumed CPU resources for serialising and deserialising this data during cache updates. These combined factors — high DB load, excessive network traffic, and increased CPU usage — necessitated a better approach.

Improving the caching strategy

Phase 1: hashing implementation

Recognising that the data updates infrequently but needs to be disseminated promptly, we implemented content hashing. Service B calculates a hash (we use the MD5 hashing algorithm) of the current configuration data snapshot it retrieves from the DB. Service B saves both the data snapshot and its hash, updating this information regularly. The update period should be equal to or shorter than Service A's cache refresh interval, so that when Service A requests new data on its next refresh, it doesn't receive outdated information.
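On the Service B side, the periodic snapshot refresh could look roughly like this (a sketch; the DB accessor is a hypothetical stand-in):

```typescript
import { createHash } from 'node:crypto';

interface Snapshot {
  hash: string;    // MD5 of the serialised configuration
  payload: string; // configuration table, already serialised as JSON
}

let currentSnapshot: Snapshot | undefined;

// Runs at least as often as Service A's cache refresh interval.
async function refreshSnapshot(): Promise<void> {
  const configTable = await loadConfigurationFromDb(); // hypothetical DB accessor
  const payload = JSON.stringify(configTable);
  currentSnapshot = {
    payload,
    // MD5 of the serialised snapshot; Service A echoes this hash back later.
    hash: createHash('md5').update(payload).digest('hex'),
  };
}

declare function loadConfigurationFromDb(): Promise<unknown>;
```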

Service A instances now include their currently cached data's hash in their refresh requests to Service B. If the hash sent by Service A matches the current hash held by Service B, Service B responds with a simple "no change" status, avoiding data transfer. Only if the hashes differ, or if Service A has no hash (initial request), does Service B send the full dataset. This drastically reduces network traffic and the associated CPU overhead for serialisation/deserialisation on both services. Caching the snapshot in Service B also improves its memory and CPU efficiency, as the same serialised in-memory snapshot is reused across multiple Service A requests. However, the DB load now scales linearly with the number of Service B instances.
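A sketch of the conditional exchange, with assumed endpoint and field names:

```typescript
// --- Service B: refresh handler (framework-agnostic sketch) ---
interface RefreshRequest { knownHash?: string }
interface RefreshResponse {
  changed: boolean;
  hash?: string;
  payload?: string; // serialised configuration, present only when changed
}

function handleRefresh(
  req: RefreshRequest,
  snapshot: { hash: string; payload: string },
): RefreshResponse {
  if (req.knownHash && req.knownHash === snapshot.hash) {
    // Hashes match: no data transfer, no (de)serialisation work.
    return { changed: false };
  }
  // Initial request or stale cache: send the full dataset.
  return { changed: true, hash: snapshot.hash, payload: snapshot.payload };
}

// --- Service A: the refresh loop now sends its current hash ---
let cachedHash: string | undefined;
let cachedConfig: unknown;

async function refreshConfig(): Promise<void> {
  const res = await fetch(
    `http://service-b/configuration?knownHash=${cachedHash ?? ''}`,
  );
  const body = (await res.json()) as RefreshResponse;
  if (body.changed && body.payload) {
    cachedConfig = JSON.parse(body.payload);
    cachedHash = body.hash;
  }
  // Otherwise the existing cache is kept untouched.
}
```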

This hash-based approach still has limitations. Firstly, scaling out Service B still increases the DB load. However, with the new approach, the load on Service B is quite small, and there is no need to run many instances of it. Secondly, if the data were updated frequently (e.g., every minute), the benefits would diminish, as full data transfers would occur often.

Phase 2: decoupling DB load from the number of Service B instances with leader election and Redis

To address the DB scaling issue, we introduced leader election among Service B instances, using etcd as the coordination mechanism. Only the elected leader queries the DB at the refresh interval (e.g., once per minute).
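A rough sketch of the leader-only refresh loop, using the etcd3 Node.js client (treat the exact Election API shown here as an assumption and check the library's documentation):

```typescript
import { Etcd3 } from 'etcd3';
import { hostname } from 'node:os';

const etcd = new Etcd3({ hosts: 'http://etcd:2379' });

let isLeader = false;

// Each Service B instance campaigns; only one holds leadership at a time.
// If the leader dies, its lease expires and another instance is elected.
const election = etcd.election('service-b-config-refresh');
const campaign = election.campaign(hostname());

campaign.on('elected', () => {
  isLeader = true;
});
campaign.on('error', (err) => {
  isLeader = false;
  console.error('campaign failed or leadership lost', err);
  // A real implementation would re-campaign after a short delay.
});

// The periodic loop runs on every instance, but only the leader queries the DB.
setInterval(() => {
  if (!isLeader) return;
  refreshSharedConfigSnapshot().catch((err) =>
    console.error('leader refresh failed', err),
  );
}, 60_000);

// Queries the DB and publishes the snapshot; sketched after the next paragraph.
declare function refreshSharedConfigSnapshot(): Promise<void>;
```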

The leader stores the latest data snapshot and its hash in Redis, a shared, high-performance distributed cache chosen for its high throughput and low latency. All Service B instances (including the leader) now rely on Redis as the source for this data. To optimise further, each Service B instance maintains its own internal cache. Periodically, an instance checks the latest hash in Redis against the hash of its cached data. If the hashes match, the internal cache is up to date. If they differ, the instance retrieves the full data snapshot from Redis (not the DB) to update its internal cache.
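Both sides of the Redis exchange might look roughly like this, using the ioredis client (key names and the DB accessor are assumptions):

```typescript
import Redis from 'ioredis';
import { createHash } from 'node:crypto';

const redis = new Redis({ host: 'redis', port: 6379 });

// Assumed key names for the shared snapshot.
const HASH_KEY = 'config:hash';
const SNAPSHOT_KEY = 'config:snapshot';

// --- Leader only: query the DB and publish the snapshot to Redis ---
async function refreshSharedConfigSnapshot(): Promise<void> {
  const configTable = await loadConfigurationFromDb(); // hypothetical DB accessor
  const payload = JSON.stringify(configTable);
  const hash = createHash('md5').update(payload).digest('hex');
  // MSET writes both keys in a single command.
  await redis.mset(HASH_KEY, hash, SNAPSHOT_KEY, payload);
}

// --- Every Service B instance: keep the internal cache in sync with Redis ---
let localHash: string | undefined;
let localPayload: string | undefined;

async function syncInternalCache(): Promise<void> {
  const latestHash = await redis.get(HASH_KEY);
  if (!latestHash || latestHash === localHash) return; // already up to date
  // Hash changed: pull the full snapshot from Redis, not from the DB.
  const [hash, payload] = await redis.mget(HASH_KEY, SNAPSHOT_KEY);
  if (hash && payload) {
    localHash = hash;
    localPayload = payload;
  }
}

declare function loadConfigurationFromDb(): Promise<unknown>;
```

Each instance would run syncInternalCache on its own timer, at or below the refresh interval.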

Processing requests from Service A stays the same as in the previous step. This architecture ensures the DB is queried only once per refresh interval, regardless of the number of Service B instances. Redis becomes the source of truth between DB updates.

Practical implementation and results

Our team successfully transitioned from the initial Service A caching to the Redis and etcd-based solution (Phase 2). We observed significant benefits.

Max event loop lag 

Reduced by ~3x in both Service A (orange) and Service B (purple), largely due to decreased serialisation/deserialisation overhead.

Average event loop lag

Reduced by 6x in Service B (yellow) and unchanged in Service A (green): because each process refreshes the cache only once per minute, there is little chance that the average will be affected.

Database load

Drastically reduced, as queries became independent of the number of Service A or Service B instances.

Before:

After:

CPU load

Noticeably decreased in Service B instances. The leader instance showed slightly higher CPU usage due to its responsibilities (DB queries, Redis writes), which was expected. No significant change was observed in Service A's CPU load, likely because caching overhead was already minor compared to its primary business logic.

Total CPU load across all instances

If we sum the CPU usage across all instances, the difference becomes even more pronounced. As a result, we can scale the service down from the current 12 instances to a minimal configuration of only 6, which will save us some money on infrastructure.

Network and latency

Significant improvements in average request processing duration (lower latency) and a marked decrease in bytes sent per second between services.

We observed periodic spikes in traffic corresponding to times when the configuration data actually changed, triggering full data transfers from Service B to Service A instances upon hash mismatch. 

Previously, daily load fluctuations on Service A caused it to scale up and down. In the original architecture, scaling up Service A directly increased DB load, exacerbating performance issues during peak times. The new architecture decoupled this, mitigating the performance impact of Service A scaling related to configuration fetching.

RAM usage

Remained relatively stable, as anticipated.

Next steps

To address the issue of frequent data updates (close to the cache refresh interval) that result in repeated full data transfers even when hashing is applied, and to mitigate the potential network traffic spikes illustrated in the Network and latency section, we can utilise differential updates (also known as diff patching).

Using the jsondiffpatch library, we can calculate the precise changes (the "diff") between two versions of the JSON data. To enable this, the leader Service B instance would need to store not just the latest snapshot in Redis, but also a few recent historical snapshots. When a refresh request arrives with a hash that matches one of these stored versions, Service B can respond with only the diff from that version to the latest one instead of the full dataset. This would significantly reduce network traffic during updates and lower peak RAM usage in Service A (as it only needs to store the current snapshot and apply the patch, rather than holding old and new full snapshots simultaneously during processing).
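A rough sketch of how this could work with jsondiffpatch (the version bookkeeping around the library calls is an assumption):

```typescript
import * as jsondiffpatch from 'jsondiffpatch';

const diffpatcher = jsondiffpatch.create();

// --- Service B (leader): keep a few recent versions keyed by their hash,
// so a diff can be computed against whatever version Service A still holds.
const recentVersions = new Map<string, unknown>(); // hash -> parsed snapshot

function buildResponse(clientHash: string | undefined, latest: unknown, latestHash: string) {
  const clientVersion = clientHash ? recentVersions.get(clientHash) : undefined;
  if (!clientVersion) {
    // Unknown or missing version: fall back to a full transfer.
    return { type: 'full' as const, hash: latestHash, payload: latest };
  }
  const delta = diffpatcher.diff(clientVersion, latest);
  if (!delta) {
    // Nothing changed between the two versions.
    return { type: 'unchanged' as const, hash: latestHash };
  }
  // Send only the changes between the client's version and the latest one.
  return { type: 'diff' as const, hash: latestHash, delta };
}

// --- Service A: apply the patch to its current snapshot ---
function applyResponse(current: unknown, res: ReturnType<typeof buildResponse>): unknown {
  if (res.type === 'unchanged') return current;
  if (res.type === 'full') return res.payload;
  // patch() applies the delta without Service A holding two full snapshots.
  return diffpatcher.patch(current, res.delta);
}
```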

We can optionally transition to a dedicated leader process deployment, ensuring the CPU load remains consistent across the regular instances. This may save us from unnecessarily scaling up Service B instances.

Conclusion

The evolution from instance-specific caches to a centralised Redis store managed via etcd leader election demonstrates a robust pattern for optimising shared data access in high-traffic microservices.

Addressing the bottlenecks of the initial approach required tackling both network traffic (via hashing) and database contention (via leader election and Redis). The practical outcomes, including significantly reduced database load regardless of service instance count, lower network bandwidth usage, decreased CPU usage for serialisation, and enhanced latency, confirm this architectural advancement.

While diff patching offers a theoretical next step for minimising update payloads, the current Redis-based solution provides a scalable and performant mechanism for handling frequently accessed, semi-static configuration data.

Join us!

If you’re driven by purpose and excited by big challenges, you’ll feel right at home at Bolt.

Ready to thrive in a fast-paced, ever-evolving industry? Explore our open roles and come build something meaningful with us!