Bringing data platforms to cloud

By Srikanth Dola, Aziz Shaikh, Henning Soller, and Lisa Weiß

Executives broadly recognize the importance of data to their business and operations. That doesn’t mean most companies have made the necessary investments to unlock their data’s full value for their development teams. Common obstacles are legacy systems, distributed data sources, and the lack of an overarching data strategy.

A holistic data transformation that comprises strategy, a data operating model, and modern technology can enable companies to increase revenue and productivity while trimming technology costs. In our experience, companies can achieve EBITDA growth of 7 to 15 percent by unlocking novel data-driven business models, boosting the efficiency of production processes by using AI to increase uptime, and harnessing higher-quality data to improve business processes.

Shifting to an operating model that treats data as a product can make data easily findable and accessible for development teams and reduce the total time to execute use cases by three to six months. Organizations that establish clear ownership of data can vastly improve data quality and increase the efficiency of technical and business teams. This effort can also cut IT spending by 10 to 20 percent by eliminating duplicated work across data development teams and consolidating IT systems and licenses.

Significant benefits can be achieved with the right operating model and the right team structure, but these shouldn’t be an excuse for not investing in the underlying technology backbone. Moving to appropriate technology often also acts as a catalyst, improving access to talent and providing the automation needed to enable the target operating model. The move to cloud is a core part of this technology enablement.

Cloud as a data enabler and accelerator

Cloud is a significant accelerator of the data transformation as well as an essential enabler for disruptive business opportunities.

Cloud offers organizations across industries built-in, seamlessly integrated platform services, such as centralized monitoring and logging, scheduling, and orchestration. Furthermore, compute and storage capacity can be easily tailored to specific industries and enterprises, thanks to a wide range of offerings, from compute-optimized instances to dedicated storage nodes.

Public-cloud providers offer industry-specific services, such as built-in HIPAA compliance capabilities and interoperable data formats for healthcare companies, as well as cross-industry services such as natural-language-processing (NLP) capabilities for swift analysis of unstructured data. Providers can also help enterprises quickly spin up required infrastructure services using prebuilt “infrastructure as code” templates to accelerate the data journey.

Cloud enables disruptive business models and data and analytics use cases through federated data architectures that make data easily sharable across business units and within enterprise alliances. By extending the availability of data for analytics use cases, organizations can unlock novel business opportunities.

Ecosystems and corporate alliances can tap cloud to share selected data securely among consortium members. This increased data availability can be a catalyst for novel insights and use cases. In larger corporations, cloud also facilitates the consolidation of data architecture across distinct business units through standardization while optimally supporting a federated governance model.

A hybrid transformation model

In most cases, a full move to cloud isn’t necessary to begin to capture its benefits. In fact, the target hosting option will typically be hybrid and offer advantages from both worlds: cloud provides organizations with the opportunity to explore compute needs as their use cases evolve. This flexibility from cloud service providers (CSPs) reduces up-front capital expenditures.

A full move to cloud requires at least some optimization of an organization’s current applications and databases, so each one must be assessed individually to determine whether reengineering it for cloud will pay off. A hybrid target state therefore also includes on-premises offerings for legacy systems that can’t migrate to cloud because of regulatory guidelines (for example, anonymization) or technology requirements (such as protocol harmonization for IoT use cases).

The benefits of cloud for product development

Cloud offers several distinct advantages for data and product development:

  • Agility and flexibility. Development teams can increase productivity by approximately 30 percent through an agile operating model comprising DevOps, DataOps, and MLOps tooling. These offerings are readily available on cloud platforms and can help organizations generate value from their data. The flexibility of cloud service provisioning enables organizations to build out a modular data architecture to serve a variety of use cases.
  • Innovation. Accelerating the development and deployment of innovative analytics—for example, by using readily available automated pipelines and development models for AI solutions—can increase revenues by as much as 20 percent. In addition, the faster innovation cycles of integrated cloud services help organizations stay on top of technological trends and developments across infrastructure, platform, and AI.
  • Effectiveness and elasticity. Organizations can cut infrastructure costs by as much as 20 percent by automating infrastructure orchestration and monitoring and by dynamically adapting provisioned compute resources to the required workload.
  • Resilience, reliability, and compliance. Cloud providers offer business continuity and disaster recovery solutions through several availability zones¹ in a single cloud region, which can reduce downtime by up to 30 percent. Further, organizations can shift part of the responsibility for regulatory compliance to CSPs by harnessing serverless compute for applications.
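Dynamically adapting provisioned compute to the workload, as described under effectiveness and elasticity, can be illustrated with a minimal sketch of a target-tracking sizing rule. The function `desired_nodes`, the per-node throughput, and the node bounds are hypothetical assumptions for illustration only; CSPs expose this behavior through their own autoscaling services, whose APIs differ.

```python
import math


def desired_nodes(queued_jobs: int, jobs_per_node: int,
                  min_nodes: int = 1, max_nodes: int = 20) -> int:
    """Return how many compute nodes to provision for the current workload.

    Hypothetical target-tracking rule: provision just enough nodes to cover
    the queued jobs, clamped to a floor (for availability) and a ceiling
    (for cost control).
    """
    if jobs_per_node <= 0:
        raise ValueError("jobs_per_node must be positive")
    needed = math.ceil(queued_jobs / jobs_per_node)
    return max(min_nodes, min(max_nodes, needed))


# A quiet night, a normal day, and a month-end spike (assumed workloads).
for load in (0, 45, 500):
    print(load, desired_nodes(load, jobs_per_node=10))
```

With these assumed numbers, a pool sized statically for the peak would run 20 nodes around the clock, whereas the rule above drops to a single node when the queue is empty; that gap is where savings from elastic provisioning come from.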

Cloud-enabled architecture archetypes

As organizations consider moving to cloud, they should keep in mind that there is no standardized architecture or technology for data in cloud comparable to the way Kubernetes and containerization standardize application hosting on cloud. IT architecture archetypes are therefore needed to bring a high degree of standardization to a company’s individual data-management needs.

Organizations can choose among the following five archetypes, depending on the requirements of their use cases, the degree of required data centralization, the underlying infrastructure (such as multicloud²), the type of data, and internal users’ skills and capabilities:

  1. Data lakes offer central, cost-effective, scalable storage for large volumes of structured and unstructured data. As the platform evolves, organizations have the flexibility to add analysis capabilities (such as streaming and SQL analytics). Data lakes require users to have a high level of skill and experience to analyze unfamiliar and unprocessed data.
  2. Cloud-native data warehouses provide a high-performance, reliable, SQL-driven platform for business intelligence and reporting. This archetype accommodates only structured data and offers little room for innovation beyond business intelligence and reporting. However, it does allow end users with minimal skills to customize output based on business needs.
  3. Lakehouses combine the advantages of a data lake’s cost-effective, scalable storage with a data warehouse’s reliable, high-performance reporting. They provide central storage for business intelligence and SQL analytics as well as for data applications requiring unstructured data or near-real-time or real-time data.
  4. Data mesh uses a decentralized data architecture that allows for federated development and provisioning of data products. Companies must have strong capabilities in data observability and discoverability to make data accessible across the organization. Data product owners create, maintain, and offer scalable, business-oriented data products as a service to the entire organization, requiring enterprises to adopt an agile working model.
  5. Data fabric establishes a metadata layer across multiple data environments. For example, in a multicloud scenario, data fabric ensures unified data management, including governance, cataloging of assets, integration and pipelining, and security.
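To show how the selection criteria above might be turned into a first-pass screening, the following is a deliberately simplified, hypothetical decision sketch. The `Requirements` fields and the rule order in `suggest_archetype` are illustrative assumptions, not a method prescribed here; a real choice also weighs organizational factors, cost, and existing platforms.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    """Hypothetical, simplified inputs based on the selection criteria above."""
    centralized: bool         # is a high degree of data centralization required?
    multicloud: bool          # does the underlying infrastructure span several clouds?
    structured_only: bool     # is the data exclusively structured (tables)?
    needs_bi_reporting: bool  # is SQL-based BI and reporting a core use case?
    high_data_skills: bool    # can users analyze raw, unprocessed data?
    product_org_ready: bool   # federated data-product ownership, observability, discoverability


def suggest_archetype(r: Requirements) -> str:
    """Return a starting-point archetype for discussion (heuristic only)."""
    if not r.centralized:
        # Decentralized: federated data products need a mature product organization;
        # otherwise a metadata layer across environments unifies management.
        return "data mesh" if r.product_org_ready else "data fabric"
    if r.multicloud:
        # Central governance over several clouds: unify through a metadata layer.
        return "data fabric"
    if r.structured_only:
        # Structured data only: SQL-driven reporting usable with minimal skills.
        return "cloud-native data warehouse"
    if r.needs_bi_reporting or not r.high_data_skills:
        # Mixed data plus BI, or users who need curated access: lakehouse.
        return "lakehouse"
    # Mixed data, skilled users, no reporting focus: cheap, scalable central storage.
    return "data lake"


example = Requirements(
    centralized=True, multicloud=False, structured_only=False,
    needs_bi_reporting=True, high_data_skills=False, product_org_ready=False,
)
print(suggest_archetype(example))  # lakehouse
```

The rules are checked in order, so centralization and infrastructure questions dominate before data type and skills; reordering them would encode different priorities.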

Choosing the right architecture archetype is often essential to driving the right transformation, so the decision should consider both technological and organizational factors.

Accelerate value capture from data through cloud

A data transformation that successfully captures value through cloud begins with comprehensive preparations that include both the enablers of the transformation and the transformation itself. We see two clear prerequisites:

  • Articulate the data target vision and strategy. By outlining the data strategy and how it will help achieve business goals, companies can identify appropriate use cases. Organizations can also explore possible synergies with existing CSP offerings.
  • Define the required talent and target operating model. Companies must clarify the talent and roles needed to support the data strategy and target operating model in a cloud-based environment. A realistic hiring plan should specify which technological capabilities can be built within the organization and which need to be acquired.

Both of these steps are essential to making the right technology decisions. We have seen cloud transformations fail because companies defined them without a team that could support them. Once the vision and strategy, along with the talent and technology needed to achieve them, are clear, the next step is choosing the best tools and methods to achieve the goals:

  • Determine the target architecture and the split between on-premises and cloud capabilities. Companies should conduct a deep dive on regulatory, data-privacy, and data-residency requirements. Assessing the feasibility of cloud migration based on the existing landscape can help companies determine the right balance of on-premises systems and cloud services. This exercise can identify the cloud-based platforms to support the chosen archetype for the future data architecture and align the technology with talent availability.
  • Shift left. By defining which tools teams must adopt and further automating the underlying data provisioning, companies can enable DataOps and make processes less time-consuming.

Finally, the right technology and operating model will not materialize by themselves. They need to be enabled through a clearly defined plan and a well-developed business case:

  • Develop a road map and business case. Companies can build a business case for cloud by calculating the potential value creation by use case and cost variations due to architecture choice, cloud-migration decisions, and operating-model adaptation. IT leaders should then determine which processes can be automated and which solutions can be provided by the CSP—and define the appropriate model for partnership, including incentives for mutual success. A detailed road map lays out the steps needed to capture value from data, including development of the data architecture, cloud migration, establishment of use cases, talent needs, and target operating model.
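The business-case calculation described above can be sketched as simple arithmetic: sum the estimated value of each use case, then subtract the net change in running costs attributed to architecture choice, cloud migration, and operating-model adaptation. All class names, use cases, and figures below are hypothetical placeholders, not benchmarks; a real business case would also account for one-time migration costs and the timing of benefits.

```python
from dataclasses import dataclass


@dataclass
class UseCase:
    name: str
    annual_value: int  # estimated annual value creation, in thousands (hypothetical)


@dataclass
class CostDelta:
    driver: str         # architecture choice, cloud migration, or operating model
    annual_change: int  # positive = added cost, negative = saving, in thousands


def net_annual_value(use_cases: list[UseCase], cost_deltas: list[CostDelta]) -> int:
    """Total use-case value minus the net change in annual running costs."""
    value = sum(u.annual_value for u in use_cases)
    cost = sum(c.annual_change for c in cost_deltas)
    return value - cost


use_cases = [
    UseCase("predictive maintenance", 4000),
    UseCase("churn prediction", 2500),
]
cost_deltas = [
    CostDelta("architecture: lakehouse platform licenses", 1200),
    CostDelta("migration: retired on-premises servers", -800),
    CostDelta("operating model: new data product owners", 900),
]
print(net_annual_value(use_cases, cost_deltas))  # 5200
```

Keeping value per use case and cost per driver separate makes it easy to rerun the calculation for alternative archetypes or migration scopes, which is what the budget discussion usually needs.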

Companies often find it difficult to articulate the potential value of a data transformation. However, a thorough analysis to determine the value of the underlying use cases can facilitate the budget discussion.


Cloud acts as a significant enabler and accelerator of a data transformation. To capture these benefits, a full move to cloud is not required—and, in most cases, is not even desirable. Since data architectures typically lack standardization, an architectural archetype must be chosen carefully based on the enterprise’s needs.

Srikanth Dola is an associate partner in McKinsey’s Bay Area office, Aziz Shaikh is a partner in the New York office, Henning Soller is a partner in the Frankfurt office, and Lisa Weiß is a consultant in the Vienna office.

1 A cloud availability zone is one of several isolated locations within a cloud region, each made up of one or more data centers, from which a CSP operates and provides its services to end customers.

2 Multicloud is the use of multiple cloud computing and storage services from different vendors in a single architecture.