Capacity Will Always Be Constrained Unless…

The Industry Moves from Reactive Responses to Predictive Planning

Our industry is plagued by continued capacity shortages and development delays, which cost AI companies in their valuations and even in lost revenue. In the past few years, the most discussed causes have been power availability, water availability, and community pushback. Clearly, if there is no power, there is no learning, inference, or cloud. Similarly, communities feel threatened that their resources will be strained and that utility costs (for example, water and electricity) will rise with astronomical demand.

But I would argue there is a bigger problem that actually encompasses these constraints, and even others we haven’t yet discussed. It requires us not just to react to the latest problem that pops up and fix it, but to think much more strategically, or even transformationally (see the PIMBY article in the last issue of Interglobix).

What is that problem? It’s the supply chain. We have a habit of solving supply chain problems only as they show up. Examples from my own past include substation lead times, generator lead times, and worker shortages, and the list goes on. We claim to have been surprised by the power ramp in 2023, by water in 2025, and now this year by communities pushing back. I would argue that these were not surprises; rather, there were people we were not listening to, or we were just not yet feeling the pain to come.

This article discusses the need to completely rethink the supply chain in order to keep up with accelerating scale. The question to ask is: Why can’t the supply chain keep up? There are many reasons, but here are, in my opinion, the top two:

1. Lack of Transparency
2. Lack of Integration and Standardization

So let’s explore these in more depth.

Christian Belady, Senior Advisor, Digital Bridge, Industry Advisor, and Board Member

LACK OF TRANSPARENCY

One of the biggest reasons for delays is the lack of transparency in the supply chain, a factor that has been somewhat ignored by the people it affects the most: the hyperscalers. At a time of unprecedented growth and delivery urgency, rethinking how and when they signal their demand should be the top priority. Yet demand plans are always highly proprietary, since companies (particularly public companies) don’t want to signal their volatile growth plans.

This issue is not new; it has been studied in academia for decades, and many of us learned about it in business school through the classic exercise called the “Beer Game,” as shown in Figure 1.

The game is usually played in a lab setting to demonstrate how quickly supply chain management can get out of hand despite students’ efforts, with production of goods oscillating between extreme surplus and scarcity. Why is this? It all has to do with demand latency in the supply chain.

To illustrate, imagine that a hot spell hits and community demand for beer spikes. A buying frenzy follows, and the retailer runs out of beer at noon, leaving customers dry. He quickly orders more beer from his wholesaler, but only enough to fill his shelves and replenish his depleted inventory. The wholesaler gets the order that afternoon, sees that it is higher than usual, and places his own increased order with the distributor that evening; the distributor sees it the next morning. Meanwhile, the retailer is still screaming for beer the next day because it’s still hot outside. The distributor sends his order to the factory by lunchtime, and the factory sees the increase in demand and turns up production. That may take a few days, since the beer has to ferment.

In the meantime, the hot spell has passed and demand on the retailer has gone down, but the factory is still producing more beer. The problem is exacerbated as each player in the chain adds a little buffer to make sure they don’t miss the opportunity. Now there is more beer on the truck and, unknown to the factory, the heat wave is gone. The factory is still acting on demand from the distributor that is a few days old, and the distributor still doesn’t know that the retailer no longer needs higher volumes, so the factory keeps producing and shipping. The net effect is that everyone now has a surplus. To counter this, the same process runs in reverse as each player tells their respective supplier, “No more beer!” This is the “bullwhip” effect, and now the whole thing repeats itself: feast, famine, feast, and so on.

So what is the result? Higher cost and unmet customer needs. In other words, customer demand is not met at the most crucial time (a heat wave), and excessive (and expensive) inventory ends up sitting in the supply chain. The next time the signal comes, the supply chain does not trust the demand from its customers and is much slower to respond.

This is exactly what the data center industry faces. First, based on my own experience, hyperscalers may actually have a new demand signal months in advance, but they need approvals and agreement on the new timelines. Months can pass before the approved demand even leaves the confines of the corporate walls. Then they issue an RFI/RFP to a general contractor (GC) or to lease providers to decide who will build the data center (DC), and more time elapses before a decision is made. Those lessors in turn go to their suppliers through RFPs to get quotes on all the equipment, power, subcontractors, etc., which can also take on the order of months. Along the way, a supplier may be answering multiple RFPs in parallel for the same demand from different lessors, with no way of knowing it.

So you can see how quickly the signal gets delayed the deeper it goes into the supply chain. Moreover, there is more and more double counting the deeper one goes, because RFPs always go to multiple providers. As a result, there is no trust in the signal, and no one expands factory capacity because the risk is too high. There is no incentive to build more capacity. If anything, the incentive is to keep the factory at current levels and just charge more for the product, since everyone is desperate.

Figure 1: Traditional Supply Chain
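
To make the mechanics concrete, here is a minimal Python sketch of the delayed, padded signal in the beer example. It is a toy model, not the actual Beer Game rules: the function name `propagate_orders`, the one-day delay per tier, the 20 percent buffer, and the demand numbers are all illustrative assumptions, and inventory is not modeled at all.

```python
from typing import Dict, List


def propagate_orders(
    demand: List[float],
    tiers: List[str],
    delay_days: int = 1,     # assumed latency before a tier sees the order
    buffer: float = 0.2,     # assumed padding each tier adds "to be safe"
) -> Dict[str, List[float]]:
    """Pass a demand series up the chain, delaying and padding it per tier."""
    baseline = demand[0]
    orders: Dict[str, List[float]] = {}
    incoming = list(demand)
    for tier in tiers:
        # The tier only sees what the tier below it ordered delay_days ago.
        seen = ([baseline] * delay_days + incoming)[: len(incoming)]
        # It pads any change relative to baseline before ordering upstream.
        placed = [baseline + (1.0 + buffer) * (q - baseline) for q in seen]
        orders[tier] = placed
        incoming = placed
    return orders


if __name__ == "__main__":
    # Hypothetical heat wave: demand jumps from 100 to 150 on days 3 to 5.
    demand = [100.0] * 3 + [150.0] * 3 + [100.0] * 8
    chain = ["retailer", "wholesaler", "distributor", "factory"]
    orders = propagate_orders(demand, chain)
    for tier in chain:
        series = orders[tier]
        peak = max(series)
        print(f"{tier:12s} peak={peak:6.1f} first seen on day {series.index(peak)}")
```

With these assumed numbers, the factory’s peak order is roughly twice the real 50-unit spike over baseline, and it first arrives on day 7, a day after consumer demand has already returned to normal.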

So how do you solve this problem? What is needed is complete transparency into the signal coming from the ultimate customer: the end user, or the hyperscaler. Everybody throughout the whole supply chain sees the demand signal at the same time. Figure 2 below shows this. Note that the signal is broadcast to everyone downstream in real time. There are no delays, so decisions can be made on real-time information. Additionally, this eliminates double counting, since suppliers deeper in the supply chain see the native signal.

There is so much opportunity in this area. Companies that have vertically integrated manufacturing capabilities can do a much better job with this. Alternatively, software companies such as XYZ Reality are developing ways to codify the supply chain, or even the construction process, to help drive a real-time view of the state of the supply chain. Until this is done, we will keep hitting one supply chain constraint after another. Again, this is a huge opportunity for a much more collaborative and integrated supply chain, which leads us to our next point.

Figure 2: Supply Chain with Transparency
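
The double-counting point can be illustrated with a small Python sketch. In this hypothetical example, a supplier receives RFPs from several lessors bidding on the same hyperscaler demand. The `Rfp` record, the `demand_id` tag that stands in for the shared native signal, and all megawatt figures are assumptions made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Rfp:
    lessor: str        # who sent the request to the supplier
    megawatts: float   # capacity the supplier is asked to quote
    demand_id: str     # hypothetical tag for the end-user demand behind it


def apparent_demand(rfps: List[Rfp]) -> float:
    """Demand as the supplier sees it when every RFP looks independent."""
    return sum((r.megawatts for r in rfps), 0.0)


def native_demand(rfps: List[Rfp]) -> float:
    """Demand when the end-user signal is visible: count each one once."""
    by_signal: Dict[str, float] = {}
    for r in rfps:
        by_signal[r.demand_id] = max(by_signal.get(r.demand_id, 0.0), r.megawatts)
    return sum(by_signal.values(), 0.0)


if __name__ == "__main__":
    # Two real demands, quoted through five competing lessors.
    inbox = [
        Rfp("lessor_a", 100.0, "demand_1"),
        Rfp("lessor_b", 100.0, "demand_1"),
        Rfp("lessor_c", 100.0, "demand_1"),
        Rfp("lessor_a", 50.0, "demand_2"),
        Rfp("lessor_d", 50.0, "demand_2"),
    ]
    print("apparent MW:", apparent_demand(inbox))
    print("native MW:  ", native_demand(inbox))
```

In this made-up inbox the RFPs add up to 400 MW while the underlying demand is only 150 MW, which is the kind of gap that makes a supplier distrust the signal.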

LACK OF INTEGRATION AND STANDARDIZATION

As the scale of our business increases, DCs have more and more components in the supply chain. Component and SKU counts can’t keep growing if we want quality, timely data center builds: the more parts there are, the more likely a catastrophic disruption becomes. Integrating components into a manufactured skid or container-like module is one way to solve this, by modularizing and moving more of the work into manufacturing. Microsoft did this in the early days of its cloud expansion. The “kit of parts” concept was the brainchild of Brian Mattson and Dan Costello in 2008. Their vision was that all of the various systems should be integrated and built in a factory, with SKUs that were “Lego building blocks” for rapid deployment and assembly on site. One could argue it was the classic innovator’s dilemma and well before its time, but the concept is re-emerging in current-day expansions and shows promise.

Alternatively, integration can also come from innovation. For example, innovative companies such as DG Matrix are developing programmable multiport solid-state transformers (SSTs) that can collapse many of the SKUs in the mechanical, electrical, and plumbing (MEP) scope and substantially simplify the electrical design. Simplification reduces the likelihood of supply chain shortages, lowers cost, and improves reliability. Figure 3 shows a typical DC electrical architecture on the left and compares it to the simplified SST architecture on the right, with more than a 50 percent reduction in components, substantially reducing supply chain complexity.
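
As a rough back-of-the-envelope illustration of why fewer components means less exposure, the Python sketch below assumes each component type independently has the same small chance of being in short supply when it is needed. The independence assumption, the 2 percent figure, and the component counts of 40 and 20 are illustrative only; they are not data from DG Matrix or from Figure 3.

```python
def p_any_shortage(n_components: int, p_short: float) -> float:
    """Chance that at least one of n independent components is short."""
    return 1.0 - (1.0 - p_short) ** n_components


if __name__ == "__main__":
    p = 0.02  # assumed per-component shortage probability
    for n in (40, 20):  # hypothetical design vs. one with half the components
        print(f"{n} components: {p_any_shortage(n, p):.1%} chance of a shortage")
```

Under these assumptions, halving the component count cuts the chance of hitting at least one shortage from a little over half to about a third. Real shortages are correlated, so this should be read as a directional argument, not a forecast.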

But of course, one of the most powerful scaling opportunities is standardization. The “Mattson/Costello” kit-of-parts concept relied heavily on the idea that the interfaces on the SKUs were all standardized. In that case, any manufacturer’s product that follows the standard interface spec becomes “plug and play.” Currently, most DC developers have their own specifications and requirements, resulting in little or no fungibility among SKUs. Yet they all host the same servers and GPUs. Standardizing would certainly reduce the embedded costs in the supply chain and open up alternative suppliers’ products in the event of manufacturing shortages. Again, another huge opportunity.

Figure 3: Traditional vs Solid State Transformer (SST) Solution

SUMMARY

Frankly, this discussion needs more depth, perhaps for another time. However, my purpose for this article (as with my other articles) is to start a debate on improving the supply chain for our industry and thinking ahead in a strategic way. Unless we have a systematic way to drive transparency, integration, and standardization, we will find ourselves mired in supply chain constraints at a time when we need to be nimble and have rapid-response capabilities to meet our customers’ hyperscaling needs. So, as I always end my articles, we have to THINK BIGGER!

ABOUT THE AUTHOR

Christian Belady is highly experienced in managing data center and infrastructure development on a global scale. Currently, he is an advisor and board member of several companies in the infrastructure space. Previously, he served as Vice President and Distinguished Engineer of Data Center R&D for Microsoft’s Cloud Infrastructure Organization, where he developed one of the largest data center footprints in the world. Before that, he was responsible for driving the strategy and delivery of server and facility development for Microsoft’s data center portfolio worldwide. With over 160 patents, Belady is a driving force behind innovative thinking and quantitative benchmarking in the field. He is an originator of the Power Usage Effectiveness (PUE) metric, was a key player in the development of the iMasons’ Climate Accord (ICA), and has worked closely with government agencies to define efficiency metrics for data centers and servers. Over the years, he has received many awards, most recently the NVTC Data Center Icon Award, and was elected to the National Academy of Engineering.