AI in semiconductor manufacturing: The next S curve?

As generative AI (gen AI) applications such as ChatGPT and Sora take the world by storm, demand for computational power is skyrocketing. The semiconductor industry finds itself approaching a new S-curve—and the pressing question for executives is whether the industry will be able to keep up.

Leaders are responding by committing substantial capital expenditures to expand data centers and semiconductor fabrication plants (fabs) while concurrently exploring advancements in chip design, materials, and architectures to meet the evolving needs of the gen AI–driven business landscape.

To guide semiconductor leaders through this transformative phase, we have developed several scenarios for gen AI’s effect in the B2B and B2C markets. Every scenario involves a massive increase in compute—and thus wafer—demand. These scenarios focus on the data centers while acknowledging that implications for edge devices such as smartphones exist but on a much smaller scale.

The demand scenarios, developed from McKinsey analysis, are based on the wafer output that the semiconductor industry could potentially deliver, given constraints such as capital and equipment. More ambitious scenarios are conceivable, but the number of fabs they would require and the energy supply needed for the data centers make them unlikely.

This article will discuss the estimated wafer demand for high-performance components, including logic, memory, and data storage chips, and the corresponding number of fabs needed to supply them. Equipped with this information, industry stakeholders can strategically plan and allocate resources to address the burgeoning demand for compute power, ensuring the scalability and sustainability of their operations in the years to come.

Components of gen AI compute demand

The surge in demand for AI and gen AI applications comes with a proportional increase in compute demand. However, it is essential for semiconductor leaders to understand the origins of this demand and how gen AI will be applied. We expect to see two different types of applications for gen AI: B2C and B2B use cases. Within both the B2C and B2B markets, the demand for gen AI can be categorized into two main phases: training and inference. Training runs usually require a substantial amount of data and are compute-intensive. Conversely, inference usually requires much lower compute for each run of a use case.

To empower semiconductor leaders to navigate the intricacies and demands of these markets, we outline six use case archetypes for B2B compute demand and their corresponding compute cost to serve and concurrent level of gen AI value creation.

Six B2B use case archetypes for gen AI application and workload

McKinsey analysis estimates that B2C applications will account for about 70 percent of gen AI compute demand because they include the workload from basic consumer interactions (for example, drafting emails) and advanced user interactions (for example, creating visuals from text). B2B use cases are expected to make up the other approximately 30 percent of the demand. These include use cases such as advanced content creation for businesses (for example, gen AI–assisted code creation), addressing customer inquiries, or generating standard financial reporting.

B2B applications across industry verticals and functions fall into one of six use case archetypes:

coding and software development apps that interpret and generate code
creative content–generation apps that write documents and communication (for example, to generate marketing material)
customer engagement apps that cover automated customer service for outreach, inquiry, and data collection (for example, addressing customer inquiries via a chatbot)
innovation apps that generate product and materials for R&D processes (for example, designing a candidate drug molecule)
simple concision apps that summarize and extract insights using structured data sets (for example, to generate standard financial reports)
complex concision apps that summarize and extract insights using an unstructured or large data set (for example, to synthesize findings in clinical images such as MRI or CT scans)

McKinsey has organized these six diverse and complex B2B use cases according to their compute cost to serve and concurrent gen AI value creation (Exhibit 1). By defining the cost to serve and value creation, decision makers can more adeptly navigate the specifics of B2B use cases and make well-informed choices when adopting them. At its core, the analysis of compute cost to serve comprises training, fine-tuning, and inferencing costs. The analysis also encompasses a hyperscaler’s infrastructure as a service (IaaS) margin, which includes compute hardware, server components, IT infrastructure, power consumption, and estimated talent costs. Gen AI value creation is gauged through metrics such as productivity improvement and labor cost savings.

B2B use cases are defined by their value creation and cost to serve.

Gen AI demand scenarios

As organizations navigate the complexities of adopting gen AI, strategic utilization of these archetypes becomes imperative. Factors such as the economics of gen AI adoption, algorithm efficiency, and continual hardware advancements at both component and system levels further influence adoption of gen AI and technological progress. Three demand scenarios (base, conservative, and accelerated) represent the possible outcomes of gen AI demand for B2B and B2C applications. The base scenario is informed by a set of required assumptions, such as consistent technological advancements and rapid adoption, supported by business models that cover the capital and operating costs of gen AI training and inference. The conservative and accelerated adoption scenarios represent adoption downside and upside, respectively.

McKinsey analysis estimates that by 2030 in the base scenario, the total gen AI compute demand could reach 25x10³⁰ FLOPs (floating point operations), with approximately 70 percent from B2C applications and 30 percent from B2B applications (Exhibit 2).
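The split implied by these figures can be worked out directly. The short sketch below uses only the base-scenario total and the approximate shares quoted above; the function and variable names are illustrative, not part of the original analysis.

```python
# Base-scenario total quoted in the article: 25 x 10^30 FLOPs by 2030.
TOTAL_FLOPS_2030 = 25e30
# Approximate shares quoted in the article.
B2C_SHARE = 0.70
B2B_SHARE = 0.30


def split_compute_demand(total_flops, b2c_share, b2b_share):
    """Return (B2C, B2B) compute demand in FLOPs for the given shares."""
    return total_flops * b2c_share, total_flops * b2b_share


b2c_flops, b2b_flops = split_compute_demand(TOTAL_FLOPS_2030, B2C_SHARE, B2B_SHARE)
print(f"B2C: {b2c_flops:.3e} FLOPs")
print(f"B2B: {b2b_flops:.3e} FLOPs")
```

Under these shares, B2C applications account for roughly 17.5 x 10^30 FLOPs and B2B applications for roughly 7.5 x 10^30 FLOPs.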

In our base scenario, realized demand of generative AI is about 70 percent for B2C and 30 percent for B2B.

B2C compute demand scenarios

B2C compute demand is driven by the number of consumers who engage with gen AI, their level of engagement, and its compute implication. Specifically, B2C inference workloads are determined by the number of gen AI interactions per user, the number of gen AI users, and FLOPs per basic and advanced user interaction. Training workloads are determined by the number of training runs per year, the number of gen AI model providers, and FLOPs per training run by different gen AI models (for example, a state-of-the-art model such as GPT-4 in 2023 and smaller or prior generations of models). For all scenarios, it is essential that companies can develop a sustainable business model.
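The drivers listed above multiply together, which a small sketch can make concrete. The structure follows the text (users, interactions per user, and FLOPs per interaction for inference; providers, training runs, and FLOPs per run for training). Apart from the ten daily basic interactions mentioned in the base scenario, every input value below is a placeholder chosen for illustration and does not come from the McKinsey analysis; the class and function names are likewise hypothetical.

```python
from dataclasses import dataclass

DAYS_PER_YEAR = 365


@dataclass
class InteractionType:
    """One class of consumer interaction (for example, basic or advanced)."""
    daily_interactions_per_user: float
    flops_per_interaction: float


@dataclass
class ModelClass:
    """One class of gen AI model (for example, state of the art or smaller)."""
    providers: int
    training_runs_per_year: float
    flops_per_run: float


def annual_inference_flops(users, interaction_types):
    """Users x interactions per user x FLOPs per interaction, summed over types."""
    return sum(
        users * t.daily_interactions_per_user * DAYS_PER_YEAR * t.flops_per_interaction
        for t in interaction_types
    )


def annual_training_flops(model_classes):
    """Providers x training runs per year x FLOPs per run, summed over model classes."""
    return sum(
        m.providers * m.training_runs_per_year * m.flops_per_run
        for m in model_classes
    )


# Placeholder inputs for illustration only (not figures from the article).
users = 1e9
interactions = [
    InteractionType(daily_interactions_per_user=10, flops_per_interaction=1e12),  # basic
    InteractionType(daily_interactions_per_user=1, flops_per_interaction=1e14),   # advanced
]
models = [
    ModelClass(providers=5, training_runs_per_year=2, flops_per_run=1e25),   # state of the art
    ModelClass(providers=50, training_runs_per_year=4, flops_per_run=1e23),  # smaller models
]

inference_flops = annual_inference_flops(users, interactions)
training_flops = annual_training_flops(models)
print(f"Inference: {inference_flops:.3e} FLOPs per year")
print(f"Training:  {training_flops:.3e} FLOPs per year")
```

Keeping inference and training as separate functions mirrors the two workload phases described in the text and makes it easy to vary one driver at a time.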

Base adoption. By 2030, the expected average number of daily interactions per smartphone user (with one interaction being a series of prompts) is ten for basic consumer applications, such as creating an email draft. A separate expected average applies to advanced consumer applications, such as creating longer texts or synthesizing complex input documents. By using current numbers from online and application-based search queries, McKinsey analysis estimates the number of interactions to be approximately twice the forecast daily number of online search queries (approximately 28 billion) in 2030. The underlying assumptions that will enable the base B2C scenario are steady technological advancements, favorable regulatory developments, and continuously growing user acceptance.

Conservative adoption. This scenario could involve cautious adoption from consumers due to ongoing concerns related to data privacy, regulatory developments, and only incremental improvements in the technology, which would lead to half the number of interactions of the base case.

Accelerated adoption. This scenario suggests a high degree of trust in the technology and widespread user acceptance. Drivers for this scenario could be attractive new business models, substantial technological advancements, and compelling user experiences. These drivers could raise the number of interactions for consumer applications to 150 percent of the base case.

B2B demand scenarios

The adoption of gen AI use cases in the B2B sector is significantly influenced by the sufficiency and cost of semiconductor chip supply. Enterprises must be capable of rationalizing their investment in compute infrastructure, ensuring that the cost of service is lower than the company’s willingness to pay. For these B2B demand scenarios, McKinsey analysis assumes that the willingness to pay corresponds to approximately 20 percent of the total value creation.
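The viability test described here reduces to a single comparison, sketched below. The 20 percent willingness-to-pay share is the assumption stated in the text; the two archetypes and their value and cost figures are hypothetical placeholders in arbitrary units, not results from the analysis.

```python
# Share of total value creation that enterprises are assumed willing to pay
# (approximately 20 percent, as stated in the article).
WILLINGNESS_TO_PAY_SHARE = 0.20


def is_economically_viable(value_creation, cost_to_serve,
                           wtp_share=WILLINGNESS_TO_PAY_SHARE):
    """A use case is viable when its cost to serve is below the willingness to pay."""
    willingness_to_pay = wtp_share * value_creation
    return cost_to_serve < willingness_to_pay


# Hypothetical archetypes in arbitrary units (placeholders, not article data).
archetypes = {
    "archetype_a": {"value_creation": 100.0, "cost_to_serve": 15.0},
    "archetype_b": {"value_creation": 100.0, "cost_to_serve": 30.0},
}

viability = {
    name: is_economically_viable(a["value_creation"], a["cost_to_serve"])
    for name, a in archetypes.items()
}
for name, viable in viability.items():
    print(f"{name}: {'viable' if viable else 'not viable'}")
```

With these placeholder inputs, the first archetype clears the threshold (15 is below 20) and the second does not (30 is above 20).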

In the context of B2B use cases, McKinsey analysis indicates that only five of the six use case archetypes are economically viable for broad adoption (Exhibit 3). The sixth archetype, complex concision, is not expected to be adopted broadly: its value creation, which comes through administrative labor cost savings, is limited relative to its cost, and analyzing complex and unstructured data inputs consumes significant compute power.

We estimate that only five of the six use case archetypes will be economically viable and widely adopted by 2030.

Base adoption. The base scenario assumes a midpoint adoption rate spanning eight to 28 years, indicating that B2B use cases achieve 90 percent adoption in 18 years.¹ Furthermore, McKinsey analysis assumes that businesses will realize value beginning in 2024. Securing investments for manufacturing capacity, manufacturing wafers, provisioning compute capacity, and training people to use new services all take time. As such, we assume a lead time of approximately two years in the manufacturing of wafers before value can be captured. This business realization is expected to produce approximately 25 percent of value captured by 2030 for the economically viable use cases. In this scenario, we assume the additional value from all small-scale improvements in labor productivity follow the same overall ratio as the calculated value potential from the six use case archetypes.

Conservative adoption. This scenario assumes an approximately 90 percent adoption rate over 28 years, yielding only approximately 15 percent in value capture by 2030. This deceleration could be attributed to a confluence of factors, including—but not limited to—regulatory constraints, data privacy concerns, and data processing challenges.

Accelerated adoption. This scenario assumes an approximately 90 percent adoption rate in about 13 years. This acceleration is contingent upon catalysts such as attractive business models, rapid technological advancement, or favorable regulations. For example, disruptive hardware architectures could substantially reduce the cost to serve. Additionally, enhancements to the process of software validation may significantly boost the efficiency of gen AI solutions. Factors such as these may expedite the adoption curve and cause a notable uptick in gen AI implementation in the semiconductor industry by 2030.
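To see how the three adoption horizons (about 13, 18, and 28 years to reach 90 percent) shape the curve, the sketch below uses a logistic S-curve. The article does not state the functional form or the starting adoption level, so the logistic shape and the 1 percent initial share are assumptions made purely for illustration; the output is not intended to reproduce the value-capture percentages quoted above.

```python
import math


def logit(p):
    """Log-odds of a share p, with 0 < p < 1."""
    return math.log(p / (1.0 - p))


def adoption_share(t_years, years_to_90, initial_share=0.01):
    """Logistic adoption curve (assumed form).

    Starts at `initial_share` at t = 0 and reaches 90 percent after
    `years_to_90` years. Both the shape and the starting share are
    illustrative assumptions, not taken from the article.
    """
    slope = (logit(0.9) - logit(initial_share)) / years_to_90
    return 1.0 / (1.0 + math.exp(-(logit(initial_share) + slope * t_years)))


# Years to 90 percent adoption per scenario, as given in the article.
YEARS_TO_90 = {"conservative": 28, "base": 18, "accelerated": 13}

for scenario, horizon in YEARS_TO_90.items():
    share = adoption_share(6, horizon)  # six years after the start of the curve
    print(f"{scenario}: {share:.1%} adoption after 6 years")
```

A shorter horizon steepens the curve, so at any given year the accelerated scenario sits above the base scenario, which in turn sits above the conservative one.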

Gen AI data center infrastructure and hardware trends

Along with considering scenarios for gen AI compute demand, semiconductor leaders will need to adapt to changes in underlying hardware and infrastructure, mainly to data center infrastructure, servers, and semiconductor chips.

Data center infrastructure

Gen AI applications typically run on dedicated servers and in data centers. At first glance, AI data centers might look similar to traditional data centers, but there are considerable differences (see sidebar “Components of an AI server”).

Components of an AI server

AI data centers and servers differ from traditional models. There are nine components of the AI server that are most relevant to semiconductor leaders (exhibit).

An AI server is made up of numerous integral components.

CPU (central processing unit). The CPU manages system-level functions, coordinates data flow, and executes tasks that require a more generalized computing approach. Collaboration between CPUs and specialized processors ensures a balanced and efficient operation, optimizing the utilization of each component’s strengths within the AI server.
GPU (graphics processing unit). The GPU is a specialized processor designed to handle complex mathematical computations in parallel, making it an essential component in AI data centers for accelerating training and inference compute.
AI accelerator. This is a specialized semiconductor component designed to accelerate AI workloads by performing high-speed computations and optimizing the cost and performance of AI algorithms in data centers.
DDR memory (double data rate memory). A variant of dynamic random-access memory (DRAM), DDR memory provides high-speed, volatile memory, facilitating rapid data access for enhanced overall system performance.
HBM (high-bandwidth memory). A variant of DRAM, HBM is specifically built for very high-bandwidth use cases, such as AI training and inference, achieving speeds of more than ten times the standard DRAM.
NAND (“not-and”) storage. This is used to store the operating system, model, user input, and other components.
Interconnects. Equipped with optical transceivers, interconnects enable seamless communication between compute components, ensuring efficient data exchange.
Mainboard. The mainboard serves as the central hub, coordinating the collaboration of various components, all powered by a reliable power supply unit and maintained at optimal conditions by cooling fans. Encased in a well-structured chassis, these components collectively form the sophisticated architecture essential for meeting the computational demands of generative AI within a dedicated data center environment.
Power supply unit. The AI server is equipped with several power supply units with redundancy to reduce risk of failure.

Rack densities—that is, the power consumed by a cabinet of servers—demonstrate the biggest difference between traditional and AI data centers. General-purpose data centers have rack power densities of five to 15 kilowatts (kW), whereas AI training workloads can consume 100 kW—or, in some cases today, up to 150 kW. This number is expected to increase, with some experts estimating power densities of up to 250 kW or even 300 kW in the next few years.²

Additionally, as rack power density rises, rack cooling will switch from air-based cooling to liquid cooling. Direct-to-chip liquid cooling and full-immersion cooling will also require new server and rack designs to accommodate the additional weight.

Servers

In response to the increasing demand for computational power, servers will employ high-performance graphics processing units (GPUs) or specialized AI chips, such as application-specific integrated circuits (ASICs), to efficiently handle gen AI workloads through parallel processing. Infrastructure for gen AI training and inference is expected to bifurcate as inference’s compute demand becomes more specific to the use case and requires much lower cost to be economical.

Training. Training server architecture is expected to be similar to today’s high-performance cluster architectures in which all servers in a data center are connected to high-bandwidth, low-latency connectivity. The prevailing high-performance gen AI server architecture uses two central processing units (CPUs) and eight GPUs for compute. In 2030, most training workloads are expected to be executed using this type of CPU+GPU combination. A transition to system-in-a-package design for GPUs and AI accelerators is also expected, with both architectures expected to coexist.

Inference. Current inference workloads run on infrastructure that is similar to the training workload. As gen AI consumer and business adoption increases, the workload is expected to shift to mostly inference, which favors specialized hardware due to lower cost, higher energy efficiency, and faster or better performance for highly specialized tasks.

In 2030, we expect more inference-specific AI servers using a combination of CPUs and several purpose-built AI accelerators that use ASICs.

Gen AI wafer demand on the semiconductor industry

Pursuing innovation in semiconductors to capture generative AI value

Even though the field of generative AI is emerging, we have seen an uptick in innovative technologies and solutions in the past two to three years. To spur innovation, large amounts of global investment are needed across the value chain in all three scenarios. If all players invest in innovation, their efforts could reduce costs, optimize compute efficiency, or increase capacities to meet demand. Examples of this could include the following:

new algorithm designs to reduce computational requirements, in terms of both number of operations and memory demand—for example, as seen in the invention of different transformer models, which represented a new approach to designing algorithms aimed at decreasing computational demands
new chip architectures that achieve higher performance using the same area of silicon (several start-ups have already developed such an architecture)
increased memory density of chips to increase their storage capacity (for example, by using data compression similar to Linux’s zram but implemented on the chip)
improved high-speed networks between servers to provide faster access to the memory of other servers, thereby reducing the need for storing local duplicates of data
optimized software or compilers to improve system-level infrastructure compute efficiency

McKinsey analysis estimates the wafer demand of high-performance components based on compute demand and its hardware requirement: logic chips (CPUs, GPUs, and AI accelerators), memory chips (high-bandwidth memory [HBM] and double data rate memory [DDR]), data storage chips (NAND [“not-and”] chips), power semiconductor chips, optical transceivers, and other components. In the following sections, we will look more closely at logic, HBM, DDR, and NAND chips. Beyond logic and memory, we anticipate that there will be an increase in demand for other device types. For instance, power semiconductors will be in higher demand because gen AI servers consume higher amounts of energy. Another consideration is communications components, which are expected to transition to optical technologies over time. We have already seen this transition in long-distance networking and backbones, where it reduces energy consumption while increasing data transmission rates. To spur innovation in almost all areas of the industry, it is necessary to combine these new requirements with the high level of investment anticipated (see sidebar “Pursuing innovation in semiconductors to capture generative AI value”).

Logic chips

Logic chip demand depends on the type of gen AI compute chip and type of server for training and inference workloads. As discussed earlier, by 2030, we anticipate the majority of gen AI compute demand in FLOPs to come from inference workloads. Currently, there are three types of AI servers that can manage inference and training workloads: CPU+GPU, CPU+AI accelerator, and fusion CPU+GPU. Today, CPU+GPU has the best availability and is used for inference and training workloads. In 2030, AI accelerators with ASIC chips are expected to serve the majority of workloads because they perform optimally in specific AI tasks. On the other hand, GPU and fusion servers are ideal for handling training workloads due to their versatility in accommodating various types of tasks (Exhibit 4).

Server architecture is estimated to shift toward CPUs with AI accelerators by 2030.

In 2030, McKinsey estimates that the logic wafer demand from non–gen AI applications will be approximately 15 million wafers. About seven million of these wafers will be produced using technology nodes of more than three nanometers, and approximately eight million wafers will be produced using nodes equal to or less than three nanometers. Gen AI demand would require an additional 1.2 million to 3.6 million wafers produced using technology nodes equal to or less than three nanometers. Based on current logic fab planning,³ it is anticipated that 15 million wafers using technology nodes equal to or less than seven nanometers can be produced in 2030. Thus, gen AI demand creates a potential supply gap of one million to about four million wafers using technology nodes equal to or less than three nanometers. To close the gap, three to nine new logic fabs will be needed by 2030 (Exhibit 5).
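The gap arithmetic in this paragraph can be traced in a few lines. The sketch below uses only the figures quoted above and makes one simplifying assumption that the article does not state explicitly: the planned 15 million wafers of supply are treated as fully absorbed by the 15 million wafers of non-gen AI demand, ignoring the split between technology nodes. Variable and function names are illustrative.

```python
# Figures quoted in the article, in millions of wafers for 2030.
NON_GEN_AI_DEMAND_M = 15.0
PLANNED_SUPPLY_M = 15.0
GEN_AI_DEMAND_M = {"low": 1.2, "high": 3.6}
# New logic fabs the article says are needed to close the gap.
NEW_FABS = {"low": 3, "high": 9}


def supply_gap_m(non_gen_ai_m, gen_ai_m, supply_m):
    """Unmet demand in millions of wafers (simplified: node mix is ignored)."""
    return max(0.0, non_gen_ai_m + gen_ai_m - supply_m)


gaps_m = {
    case: supply_gap_m(NON_GEN_AI_DEMAND_M, demand, PLANNED_SUPPLY_M)
    for case, demand in GEN_AI_DEMAND_M.items()
}
# Implied output per new fab, in millions of wafers.
wafers_per_fab_m = {case: gaps_m[case] / NEW_FABS[case] for case in gaps_m}

for case in gaps_m:
    print(f"{case}: gap {gaps_m[case]:.1f}M wafers, "
          f"{wafers_per_fab_m[case]:.2f}M wafers per new fab")
```

Under this reading, the gap equals the gen AI demand itself (1.2 million to 3.6 million wafers, consistent with the rounded range of one million to about four million), and both ends of the range imply about 0.4 million wafers per new logic fab.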

By 2030, generative AI will increase demand for wafers significantly.

DDR and HBM

Gen AI servers use two types of DRAM: HBM, attached to the GPU or AI accelerators, and DDR RAM, attached to the CPU. HBM has higher bandwidth but requires more silicon for the same amount of data.

As transformer models grow larger, gen AI servers have been expanding memory capacity. However, the growth in memory capacity is not straightforward, posing challenges to hardware and software design. First, the industry faces a memory wall problem, in which memory capacity and bandwidth are the bottleneck for system-level compute performance. How the industry will tackle the memory wall problem is an open question. Static random-access memory (SRAM) is being tested in various chips to increase near-compute memory, but its high cost limits wide adoption. Algorithmic progress could also ease the constraint: future algorithms may require less memory per inference run, slowing total memory demand growth. Second, AI accelerators are lighter in memory compared to CPU+GPU architecture and may become more popular by 2030 when inference workloads flourish. This could mean a potentially slower growth in memory demand.

Given these uncertainties, we consider two DRAM demand scenarios in addition to the base, conservative, and accelerated adoption scenarios: a “DRAM light” scenario, in which AI accelerators remain memory-light compared to GPU-based systems, and a “DRAM base” scenario, in which AI accelerator–based systems catch up to GPU-based systems in terms of DRAM demand.

By 2030, we expect DRAM demand from gen AI applications to be five to 13 million wafers in the DRAM light scenario, translating to four to 12 dedicated fabs. In the DRAM base scenario, DRAM demand would be seven to 21 million wafers, translating to six to 18 fabs. The wide range of values reflects the challenges associated with reducing the memory requirements per device.
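The wafer and fab ranges above imply a rough output per dedicated DRAM fab, which the sketch below derives. It uses only the ranges quoted in the text; because those ranges are rounded, the ratios are approximate, and the names are illustrative.

```python
# Ranges quoted in the article: wafer demand in millions, and dedicated fabs.
DRAM_SCENARIOS = {
    "DRAM light": {"wafers_m": (5, 13), "fabs": (4, 12)},
    "DRAM base": {"wafers_m": (7, 21), "fabs": (6, 18)},
}


def implied_wafers_per_fab_m(wafers_m, fabs):
    """Millions of wafers per fab implied by a demand and fab-count pair."""
    return wafers_m / fabs


implied = {
    name: tuple(
        implied_wafers_per_fab_m(w, f)
        for w, f in zip(s["wafers_m"], s["fabs"])
    )
    for name, s in DRAM_SCENARIOS.items()
}

for name, (low, high) in implied.items():
    print(f"{name}: {low:.2f}M (low end) and {high:.2f}M (high end) wafers per fab")
```

Both scenarios imply on the order of 1.1 million to 1.25 million wafers per dedicated fab, so the fab counts scale almost linearly with wafer demand.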

NAND memory

NAND memory is used for data storage, for instance, for the operating system, user data, and input and output. In 2030, NAND demand will likely be driven by dedicated data servers for video and multimodal data. This data will require substantial storage (for example, for training on high-resolution video sequences and retrieving data during inference). We expect the total NAND demand to be two to eight million wafers, corresponding to one to five fabs. Given that the performance requirement for NAND storage of gen AI will be the same as in current servers, fulfilling this demand will be less challenging compared to logic and DRAM.

Other components

The rising compute demand will create additional demand for many other chip types. Two types are particularly noteworthy:

High-speed network and interconnect. Gen AI requires high-bandwidth and low-latency connectivity between the servers and between the various components of the servers. A larger number of network interfaces and switches is required to create all the connections. Today, these interlinks are mostly copper-based, but optical connectivity is expected to gain share with rising bandwidth and latency requirements.

Power semiconductors. AI servers need a large amount of electricity and might consume more than 10 percent of global electricity in 2030. This requires many power semiconductors within the server and on the actual devices.

The surge in demand for gen AI applications is propelling a corresponding need for computational power, driving both software innovation and substantial investment in data center infrastructure and semiconductor fabs. However, the critical question for industry leaders is whether the semiconductor sector will be able to meet the demand. To meet this challenge, semiconductor leaders should consider which scenario they believe in. Investment in semiconductor manufacturing capacity and servers is costly and takes time, so careful evaluation of the landscape is essential to navigating the complexities of the gen AI revolution and developing a view of its impact on the semiconductor industry.
