While AI has delivered marked improvements in our ability to solve the world’s biggest problems in healthcare, finance, manufacturing, retail, and more, these benefits come with real challenges. One of the most pressing challenges of delivering AI is the strain it places on network and power infrastructure. As a result, new power plants are being built specifically to fuel AI.
In this article, we explain why AI is so resource-intensive, and how data center design needs to be redefined to accommodate the needs of AI deployment.
Traditional data center design (a review)
Data centers are at the very heart of networking because they run the services that the network delivers to end users. A data center may be deployed within a building, on a campus, throughout a geographically dispersed enterprise, or in the cloud.
Data center size and connectivity requirements depend heavily on the services they support and deliver. To understand these requirements more thoroughly, it helps to break data centers down into their constituent components.
The physical components include the following:
- Rackspace and cabling: These are the physical structures of the data center, where active devices are mounted in standardized racks. Structured cabling interconnects these devices with each other and with the broader communications infrastructure.
- Electrical power management: This covers primary and backup power, including mains electricity, uninterruptible power supplies (UPS), and diesel- or gas-powered generators that ensure continuous operation during outages.
- Environmental controls: These systems manage temperature, humidity, and airflow within the data center to ensure optimal operating conditions and prevent hardware damage due to overheating or moisture.
- Active devices: These are the computing and networking hardware elements that do the actual work, including equipment such as servers, storage systems, switches, routers, firewalls, and load balancers.
Browse the TeleDynamics online catalog for networking equipment
Beyond the physical layer, data center resources can be categorized into three key areas:
- Compute refers to the processing power provided by servers. These servers may host virtual machines, containers, or bare-metal applications and are responsible for executing workloads, running applications, and handling data processing tasks.
- Storage includes the systems and technologies used to persistently store data. This can range from local disks on servers to centralized storage arrays and distributed storage systems.
- Network encompasses the networking infrastructure that connects all devices within the data center and links the data center to external networks. This includes Ethernet switches, routers, firewalls, load balancers, and sometimes software-defined networking (SDN) controllers that enable flexible traffic flow and segmentation.
For the most part, data center design is dictated by the compute, storage, and network resources required by the services the data center delivers. For most conventional services, such as web hosting, email, online applications, gaming, unified communications systems, database management, and a multitude of other applications, the requirements vary somewhat from application to application. But when it comes to AI, the requirements are a whole different ball game!
How AI works
To comprehend the high demands that AI applications place on data center resources, it’s necessary to briefly examine how AI works.
AI encompasses many different approaches. The best-known type of AI application is the large language model, or LLM. Examples include tools such as OpenAI's ChatGPT, Microsoft's Copilot, and Google's Gemini, to name just a few. LLMs are designed to understand and generate human language. Other AI approaches include machine learning, deep learning, and computer vision. What all of these models have in common is that they require massive compute, storage, and network resources to perform their functions ― significantly more than conventional network services.
Compute
AI relies on processing vast amounts of data with complex mathematical algorithms. At the heart of AI workloads are graphics processing units (GPUs). As their name suggests, GPUs were originally developed to render images and video and are found in video graphics cards. They are specifically designed for doing many simple mathematical operations in parallel, which are the types of computations graphics rendering requires. It turns out that AI models demand the same type of parallel operations, and this is why GPUs are ideal for AI.
Training an LLM, for example, involves adjusting billions or even trillions of parameters. The computational load is extremely high, especially once you factor in the low-latency expectations of real-time use cases, and GPUs are well suited to meeting it.
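To make the parallelism point concrete, here is a minimal sketch, assuming PyTorch is installed, that times the same large matrix multiplication on the CPU and (if one is present) on a GPU. The matrix size is an arbitrary choice for illustration; this kind of dense, highly parallel arithmetic is exactly what both graphics rendering and AI training lean on.

```python
import time
import torch

def time_matmul(device: str, size: int = 4096) -> float:
    """Time one large matrix multiplication on the given device."""
    a = torch.randn(size, size, device=device)
    b = torch.randn(size, size, device=device)
    if device == "cuda":
        torch.cuda.synchronize()   # make sure setup work has finished
    start = time.perf_counter()
    _ = a @ b                      # a huge number of simple multiply-adds, done in parallel
    if device == "cuda":
        torch.cuda.synchronize()   # wait for the GPU kernel to complete
    return time.perf_counter() - start

print(f"CPU: {time_matmul('cpu'):.3f} s")
if torch.cuda.is_available():
    print(f"GPU: {time_matmul('cuda'):.3f} s")
else:
    print("No GPU detected; only the CPU timing is shown.")
```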
Companies such as Nvidia that specialize in GPUs have seen unprecedented demand and success in recent years thanks to the AI boom. Compared with more traditional services delivered in data centers, AI requires vastly more compute resources, resulting in much higher processor density.
Storage
LLMs and other AI models also rely heavily on vast amounts of data. AI models go through a training process in which they consume content of all types (text, images, audio, video), referenced repeatedly throughout training. This content typically runs from terabytes to petabytes in volume and must be accessed quickly and repeatedly, which means that very large and very fast storage systems and data pipelines are essential for keeping GPUs fed with data.
Beyond training data, the resulting models themselves must also be stored. This gives you an idea of the enormous amounts of storage required.
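For a rough sense of scale, the back-of-the-envelope sketch below estimates how much space just the weights of a trained model occupy. The 70-billion-parameter count, the 16-bit precision, and the checkpoint multiplier are illustrative assumptions, not figures for any specific model.

```python
# Rough, illustrative estimate of the storage a trained model occupies.
# Parameter count, precision, and checkpoint multiplier are assumptions
# made for this example, not figures for any particular model.

def model_size_gb(parameters: float, bytes_per_parameter: float) -> float:
    """Approximate storage footprint in gigabytes (decimal GB)."""
    return parameters * bytes_per_parameter / 1e9

# A hypothetical 70-billion-parameter model stored at 16-bit precision
# (2 bytes per parameter):
weights_gb = model_size_gb(70e9, 2)
print(f"Model weights alone: ~{weights_gb:.0f} GB")

# Training checkpoints also hold optimizer state, which can multiply the
# footprint several times over (a factor of 3 is assumed here):
print(f"One training checkpoint: ~{weights_gb * 3:.0f} GB")
```

Multiply figures like these by frequent checkpointing, and then add the terabytes to petabytes of training data described above, and the scale of the storage requirement becomes clear.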
Network
Because AI is often deployed in clusters as part of distributed systems, ultra-low-latency, high-throughput networking is required for communications. Links of at least 100 Gbps must be deployed to avoid network bottlenecks, which can significantly slow training, increase cost, and leave storage and compute resources underutilized.
Especially when AI models are deployed at scale, serving thousands of users, strong east-west (server-to-server) and north-south (user-to-server) network infrastructure ensures responsive, scalable service.
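To illustrate why 100 Gbps or more is the baseline, the simple estimate below divides the size of one set of gradients by the link speed to approximate how long each synchronization step would spend on the wire. The payload size is an assumption carried over from the storage example, and real collective-communication patterns are considerably more nuanced.

```python
# Back-of-the-envelope estimate of gradient-synchronization time between
# training nodes. The payload size and link speeds are illustrative
# assumptions; real all-reduce traffic patterns are more complex.

def transfer_seconds(payload_gigabytes: float, link_gbps: float) -> float:
    """Time to move a payload over a link, ignoring protocol overhead."""
    return payload_gigabytes * 8 / link_gbps   # GB -> Gb, then divide by Gbps

gradients_gb = 140   # e.g., one full set of 16-bit gradients for a large model
for link_gbps in (10, 100, 400):
    seconds = transfer_seconds(gradients_gb, link_gbps)
    print(f"{link_gbps:>3} Gbps link: ~{seconds:.1f} s per synchronization step")
```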
Designing data centers for AI
Deploying AI in a data center significantly increases the demands on compute, storage, and network infrastructure. The table below outlines how these requirements scale compared with traditional data center environments, offering ballpark estimates of the order-of-magnitude increases typically involved.
| Resource | Description | Order of magnitude |
|---|---|---|
| Compute | Increased number of GPUs as well as an increase in GPU density. | 10–100× |
| Storage | Increase in storage density, faster storage types to maximize access speed, and greater interconnectivity between storage units. | 3–20× |
| Network | Network capacities between distributed nodes, as well as between AI clusters within the same data center, must be ultra-high throughput and ultra-low latency. | 10–50× |
The dramatic increase in the use of these resources, in turn, also affects the various physical components of data centers. The resulting requirements include:
- More power to run active devices delivering the resources
- More rack space to host the active devices and associated cabling
- Multiple redundant network paths, resulting in a higher density of network interfaces
- Higher-performance cooling systems due to an increase in active equipment (heat sources)
As an example of some of the required parameters, Nvidia recommends 400 Gbps network interconnects, as well as dozens or even hundreds of GPUs per AI cluster, for more efficient LLM training.
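To tie the power and cooling items above to rough numbers, the sketch below sizes the electrical load of a hypothetical 128-GPU cluster. The per-GPU wattage, per-server overhead, cluster size, and PUE value are all assumptions chosen for illustration, not vendor specifications.

```python
# Rough, illustrative estimate of the electrical load of a GPU cluster.
# Every figure below is an assumption chosen for the example, not a
# vendor specification.

GPU_WATTS = 700               # assumed draw of one high-end training GPU under load
GPUS_PER_SERVER = 8
SERVER_OVERHEAD_WATTS = 2000  # CPUs, memory, NICs, fans, etc. (assumed)
SERVERS = 16                  # a hypothetical 128-GPU cluster

server_kw = (GPUS_PER_SERVER * GPU_WATTS + SERVER_OVERHEAD_WATTS) / 1000
cluster_kw = SERVERS * server_kw
print(f"One GPU server:  ~{server_kw:.1f} kW")
print(f"128-GPU cluster: ~{cluster_kw:.0f} kW of IT load")

# Cooling and power-distribution losses add on top of the IT load.
# Assuming a power usage effectiveness (PUE) of 1.3:
print(f"Facility draw at PUE 1.3: ~{cluster_kw * 1.3:.0f} kW")
```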
The table below highlights the key differences between traditional and AI-oriented data center designs. It illustrates how AI workloads often demand significantly greater resources — sometimes by several orders of magnitude.
| Aspect | Conventional data center | AI-focused data center |
|---|---|---|
| Compute hardware | Primarily CPUs, with some GPU acceleration for specific workloads | High-density GPUs and other AI accelerators |
| Storage type | SAN/NAS systems, optimized for general-purpose workloads | High-speed NVMe/SSD storage with parallel access |
| Networking | 1/10/25 Gbps Ethernet, mostly north-south traffic | High-speed fabric (100+ Gbps), east-west optimized for model training |
| Cooling requirements | Moderate, designed for general server racks | High-performance cooling (liquid/immersion often required) |
| Power consumption | Balanced; lower than AI due to less intensive computation | Very high due to constant GPU utilization |
Real-world trends
This AI boom is leading to some interesting trends. One of these is the establishment of purpose-built large-scale data centers for AI. An excellent example of this is the behemoth data center being built in Abilene, Texas, about a 3.5-hour drive from TeleDynamics' Austin offices. Such facilities will provide multitenant AI infrastructure that can be leased by multiple companies and expanded as needed.
However, there is also an interesting phenomenon at the other end of the size spectrum. Many companies now deliver self-contained LLM systems that can be deployed on-site and trained on customized data sets for specific purposes. These systems can remain disconnected from the internet for security purposes and can be used internally by enterprises and organizations, or offered privately to a company’s clients.
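As one illustrative example of how such an on-site model might be consumed, the sketch below sends a prompt to a locally hosted model over HTTP. The endpoint URL, model name, and payload shape are hypothetical: many self-hosted LLM servers expose an OpenAI-compatible API, but the details depend on the product actually deployed.

```python
# Minimal sketch of querying a locally hosted LLM over HTTP.
# The endpoint URL, model name, and payload shape are hypothetical;
# the specifics depend on the self-hosted LLM product in use.
import requests

LOCAL_ENDPOINT = "http://localhost:8000/v1/chat/completions"  # hypothetical on-site server

payload = {
    "model": "internal-llm",   # hypothetical model name
    "messages": [
        {"role": "user", "content": "Summarize yesterday's support tickets."}
    ],
}

response = requests.post(LOCAL_ENDPOINT, json=payload, timeout=60)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```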
Conclusion
AI has introduced a fundamental paradigm shift in the design and deployment of data centers. With a phenomenal increase in resource demand, organizations must now rethink their infrastructure strategies — prioritizing high-performance compute, ultra-fast storage, and low-latency networking — to address the scale, speed, and complexity of modern AI workloads.