by Chandrasekhar Panda and Henning Soller
More microservices are better. More releases are better. Active-active is the ultimate solution to resiliency for every application.
This is the prevailing wisdom in the world of modern digital technology infrastructure. While this approach has led to success for many companies—pushing them to further digitalize, automate, and modularize their environment—these principles are not universal truths. Indeed, some companies have seen their performance decline because of adherence to them. For instance, extensively adopting microservices has, in some cases, resulted in companies creating highly distributed architectures that are difficult to scale. Some companies have struggled to manage intricate interdependencies and navigate the complexities introduced by interfaces between different applications. Similarly, some organizations, in pursuit of more-frequent release cycles, have inadvertently diminished the emphasis on rigorous testing, thereby introducing additional complexities and instabilities in the production environment.
The truth is that there’s no one-size-fits-all approach. Instead, companies can align the pursuit of innovation with the need to maintain robust and dependable technological infrastructures. This post explores how companies can take a more nuanced approach to application design, release frequency, and infrastructure configuration.
Microservices: Matching the tool to the task
Microservices have revolutionized software development by promoting agility and independent scaling, but they’re not always the best choice. When deciding if microservices are the right solution for a capability, companies can consider a few factors:
Reusability. Does the capability need to be readily integrated into other projects? If so, microservices can act as modular building blocks, facilitating independent updates and easy integration. However, for internal functionalities, a well-architected monolith may be more efficient and easier to manage.
Strategic importance. Is the capability the bedrock of your system, propelling core business functions? Prioritize stability and robustness here. A well-structured monolith might be more dependable than a fragmented microservices landscape, especially if individual scaling isn’t paramount.
Scaling needs. Does the capability need to adapt to rapidly changing demands? Microservices allow for granular scaling of individual components, which is ideal for unpredictable workloads. For predictable growth, however, a well-optimized monolith can efficiently handle scaling, thus minimizing overhead.
Consolidation. Companies will want to resist the “best of breed” trap of integrating multiple disparate systems, which can create a labyrinth of interfaces and scaling challenges. Consider consolidating core systems into large, cohesive building blocks. These act as pillars of the digital fortress, simplifying management and reducing vendor complexity.
Beyond strategically considering these factors, companies should keep a couple of guardrails for microservices in mind. Based on McKinsey analysis, having more than 500 to 1,000 microservices leads to increased complexity. Too many microservices can thwart scalability and often cause companies to miss opportunities for reuse, given that no one fully understands the landscape of interfaces. Similarly, the development of a microservice should typically not cost more than $10,000; higher costs can indicate the need for further consolidation.
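To make these considerations concrete, the sketch below encodes the decision factors and the portfolio guardrails as a rough heuristic. It is illustrative only: the field names, the example capability, and the decision logic are our assumptions, and the thresholds simply restate the figures above.

```python
from dataclasses import dataclass

# Illustrative sketch only: field names, the example capability, and the
# decision logic are assumptions; the thresholds restate the guardrails above.


@dataclass
class Capability:
    name: str
    needs_external_reuse: bool        # will other projects integrate it?
    is_core_business_function: bool   # strategic importance
    has_unpredictable_workload: bool  # needs granular, independent scaling?


def recommend_architecture(cap: Capability) -> str:
    """Rough heuristic reflecting the reusability, importance, and scaling factors."""
    if cap.needs_external_reuse or cap.has_unpredictable_workload:
        return "candidate for a microservice"
    if cap.is_core_business_function:
        return "prefer a well-structured monolith (stability first)"
    return "fold into an existing building block"


def portfolio_warnings(service_count: int, avg_build_cost_usd: float) -> list[str]:
    """Flags based on the guardrails cited in the text."""
    warnings = []
    if service_count > 500:
        warnings.append("500-1,000+ microservices: complexity rises and reuse gets missed")
    if avg_build_cost_usd > 10_000:
        warnings.append("build cost above $10,000 per service: consider consolidation")
    return warnings


if __name__ == "__main__":
    limits = Capability("limit-check", needs_external_reuse=True,
                        is_core_business_function=True, has_unpredictable_workload=False)
    print(recommend_architecture(limits))
    print(portfolio_warnings(service_count=730, avg_build_cost_usd=14_500))
```

A real assessment would weigh far more context than three booleans, but even a crude checklist like this forces teams to justify each new microservice against reuse, criticality, and scaling needs rather than defaulting to one.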
Finding the right release cadence
The mantra “release early, release often” has enthralled the software world, promising agility and innovation. However, chasing release frequency without investing in automation can create challenges. These include testing teams that are overburdened (and potentially less thorough) from repeated testing cycles, error-prone manual deployments, security slip-ups, a production environment fraught with bugs and regressions, and users who are overwhelmed with updates.
Organizations that have successfully adopted daily releases have done so by coupling this schedule with robust automation, carefully choreographing every step from testing to deployment. Doing so allows for rapid innovation, with new features reaching users faster, creating a competitive advantage; reduced risk due to smaller code changes per release; and an improved feedback loop, resulting in products that are more refined.
Specifically, companies have invested in the following:
Automated testing. From unit to integration and security tests, automation ensures comprehensive coverage without straining human resources.
Deployment orchestration. Automated deployment pipelines streamline release processes, minimizing manual intervention and errors.
Continuous monitoring. Constant vigilance through automated monitoring tools helps identify and address issues quickly, minimizing their impact on production.
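A minimal sketch of how these investments fit together is shown below. The specific commands (pytest, bandit, docker, a deploy.sh script) are placeholders for whatever tooling an organization actually uses; the point is that every gate runs automatically and a failure stops the release before it reaches production.

```python
import subprocess
import sys

# Minimal sketch of an automated release pipeline. The commands below are
# placeholders for an organization's actual test, build, and deployment tooling.

PIPELINE = [
    ("unit and integration tests", ["pytest", "--maxfail=1", "-q"]),
    ("security scan",              ["bandit", "-r", "src/"]),
    ("build image",                ["docker", "build", "-t", "app:candidate", "."]),
    ("deploy to staging",          ["./deploy.sh", "staging", "app:candidate"]),
    ("smoke tests",                ["pytest", "tests/smoke", "-q"]),
    ("promote to production",      ["./deploy.sh", "production", "app:candidate"]),
]


def run_pipeline() -> None:
    for step_name, command in PIPELINE:
        print(f"==> {step_name}")
        result = subprocess.run(command)
        if result.returncode != 0:
            # Automated gate: the release never reaches production if any step fails.
            sys.exit(f"Pipeline stopped at '{step_name}' (exit code {result.returncode}).")
    print("Release completed.")


if __name__ == "__main__":
    run_pipeline()
```

In practice this logic usually lives in a CI/CD system rather than a standalone script, with continuous monitoring hooked in after the production step so that issues surface immediately.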
Despite the benefits of rapid releases paired with automation, frequency isn’t everything. Releasing new functionalities multiple times a day may not offer significant additional value, potentially confusing users and disrupting workflows. For example, one financial-services company issued several new releases per day, which led to decreased efficiency in the operations department due to frequent and poorly communicated changes to front-end and back-end functionality. In another case, an insurance company suffered a major outage due to an improperly tested release.
Instead of prioritizing rapid releases, the focus should be on finding the optimal release cadence that balances speed with stability and user experience.
Rethinking resilience
The siren song of continuous uptime has lured many organizations, particularly those in the telecommunications and banking sectors, to adopt self-controlled active-active setups across the board. Such setups can be costly because organizations must set up and manage the infrastructure themselves, and for noncritical systems the expense can outweigh any reduction in risk.
The significant cost of such setups is typically driven by the need to redesign applications and by the fact that organizations have limited scale compared with cloud providers. Organizations may also struggle to conduct adequate risk assessments of individual applications. In practice, organizations need the most resilient setups only for the systems whose downtime would cripple operations. By distributing workloads across several availability zones, and possibly even regions, cloud-based setups have proved to be more resilient without requiring organizations to massively redesign applications.
Of course, cloud-based setups are not without challenges. Organizations often have to redesign critical and legacy applications to be cloud-ready. But by deploying across the three availability zones that cloud providers typically offer per region, organizations can achieve resilience without building massive additional infrastructure, while still relying on the resilience of the provider’s setup in that region. Typically, in our experience, setups must span regions to achieve resilience comparable to active-active-passive setups. Cross-region setups, however, require organizations to set up networks and manage response times carefully to ensure customers are only minimally affected by outages.
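To illustrate the response-time consideration, here is a minimal sketch of client-side failover across regional endpoints. The URLs are hypothetical, and most organizations would rely on the cloud provider’s DNS or global load-balancing features rather than application code, but the trade-off is the same: requests prefer the nearby region, fall back to others during an outage, and a strict timeout bounds the latency customers experience.

```python
import urllib.error
import urllib.request

# Hypothetical regional endpoints, ordered by proximity to the customer base.
REGIONAL_ENDPOINTS = [
    "https://api.eu-west-1.example.com",     # primary region
    "https://api.eu-central-1.example.com",  # secondary region
    "https://api.us-east-1.example.com",     # last resort
]


def call_with_failover(path: str, timeout_seconds: float = 2.0) -> bytes:
    """Try each region in order; a short timeout keeps the added latency bounded."""
    last_error: Exception | None = None
    for base_url in REGIONAL_ENDPOINTS:
        try:
            with urllib.request.urlopen(base_url + path, timeout=timeout_seconds) as response:
                return response.read()
        except (urllib.error.URLError, TimeoutError) as error:
            last_error = error  # region unreachable: customers see extra latency, not an outage
    raise RuntimeError(f"All regions unavailable; last error: {last_error}")


if __name__ == "__main__":
    print(call_with_failover("/status"))
```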
To find the right solution, organizations can analyze the cost–benefit equation for each application and avoid overspending on active-active when simpler and more-resilient solutions are available.
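A back-of-the-envelope version of that cost–benefit equation is sketched below. Every figure is a placeholder assumption for a single hypothetical application; the point is the comparison, not the numbers.

```python
# Back-of-the-envelope cost-benefit check for one application. Every figure is a
# placeholder assumption to illustrate the calculation, not a benchmark.

def expected_downtime_cost(availability: float, cost_per_hour: float,
                           hours_per_year: float = 8760.0) -> float:
    """Expected annual cost of downtime at a given availability level."""
    return (1.0 - availability) * hours_per_year * cost_per_hour


downtime_cost_per_hour = 20_000.0        # revenue/operations impact per hour (assumption)
baseline_availability = 0.999            # single-region, multi-AZ cloud setup (assumption)
active_active_availability = 0.9999      # self-managed active-active target (assumption)
active_active_extra_cost = 1_200_000.0   # annual cost of redesign, duplicate infra, operations (assumption)

risk_reduction = (expected_downtime_cost(baseline_availability, downtime_cost_per_hour)
                  - expected_downtime_cost(active_active_availability, downtime_cost_per_hour))

print(f"Expected downtime cost avoided per year: ${risk_reduction:,.0f}")
print(f"Extra cost of active-active per year:    ${active_active_extra_cost:,.0f}")
print("Active-active pays off" if risk_reduction > active_active_extra_cost
      else "A simpler multi-AZ setup is the better deal for this application")
```

In this example, the expected downtime cost avoided (roughly $160,000 a year) is far below the incremental cost of running active-active, so a multi-availability-zone setup would be the better choice; an application with a much higher cost per hour of downtime could tip the other way.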
Assumptions about application design, release frequency, and infrastructure configuration are proving to be incongruent with practical realities. It’s time to recalibrate. By tailoring their approach to each of the above themes, organizations can discover a more resilient and digitally adept operational framework that aligns technological choices with the genuine needs and challenges of the contemporary business landscape.
Chandrasekhar Panda is a partner in McKinsey’s Riyadh office, and Henning Soller is a partner in the Frankfurt office.