Artificial intelligence (AI) has become an integral part of our lives, from the way we communicate to the way we work. AI has revolutionized the way we process and analyze data, and has enabled us to make more informed decisions. One of the most important developments in AI is the advent of large language models (LLMs). LLMs are neural networks that have a massive number of parameters, allowing them to learn complex patterns in language. These models are pre-trained on large amounts of data and can be fine-tuned for specific tasks. They have been used for a variety of applications, including language translation, text generation, and question-answering.
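To give a concrete sense of what "pre-trained and fine-tuned for specific tasks" looks like in practice, the short sketch below uses the open source Hugging Face transformers library to load a small pre-trained language model and generate text. The model name and generation settings are illustrative assumptions, not a reference to any particular system discussed here.

```python
# Minimal sketch: generating text with a pre-trained language model.
# Assumes the Hugging Face "transformers" package is installed; the model
# name ("gpt2") and generation settings are illustrative choices only.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompt = "Large language models are"
outputs = generator(prompt, max_new_tokens=30, num_return_sequences=1)
print(outputs[0]["generated_text"])
```

The same pre-trained model could then be fine-tuned on task-specific data, such as translation pairs or question-answer examples, to specialize it for one of the applications mentioned above.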
Transformers are a key development in language modeling. They are an architecture designed around the idea of attention, which makes it possible to process longer sequences by focusing on the most relevant parts of the input. Transformers have had a profound impact on the field of natural language processing and machine learning. Their ability to handle long-range dependencies, facilitate transfer learning, generate language, and support multilingual tasks has propelled them to the forefront of cutting-edge research and applications.
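To make the idea of attention concrete, here is a minimal NumPy sketch of scaled dot-product attention, the core operation inside a transformer. The shapes and random inputs are illustrative assumptions; real models add multiple heads, masking, and learned projection weights.

```python
# Minimal sketch of scaled dot-product attention, the core transformer operation.
# Shapes and random inputs are illustrative only.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    # Similarity between each query and every key, scaled to stabilize the softmax.
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax over keys: each position distributes its "attention" across the sequence.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output is a weighted mix of the value vectors it attended to.
    return weights @ V

seq_len, d_model = 6, 8  # toy sequence length and embedding size
rng = np.random.default_rng(0)
Q = rng.normal(size=(seq_len, d_model))
K = rng.normal(size=(seq_len, d_model))
V = rng.normal(size=(seq_len, d_model))
print(scaled_dot_product_attention(Q, K, V).shape)  # (6, 8)
```

Because every position can attend to every other position in a single step, the model captures long-range dependencies directly, and the matrix operations involved map naturally onto parallel hardware.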

For transformers to be successful, powerful chips are essential. CPUs cannot efficiently run highly parallelized deep learning models, so AI chips built for parallel computing are increasingly in demand. AI accelerators and GPUs across the industry already run at a TDP of 700 W per chip, and that figure will easily climb to 1 kW per chip in the near future. AI workloads, including generative AI, require ever more computing power along with greater memory bandwidth and capacity to build better deep learning models and power generative AI applications. This has led to growing demand for AI-specific hardware and investment in data center infrastructure. An essential element of AI is hardware/software co-design. In the general-purpose processor or CPU world, developers could rely on a predictable hardware roadmap with generational clock-speed improvements and build applications around those constraints. In AI, that model is flipped: software and model requirements are weighed before the hardware is designed, and the efficiency of the software and models is tightly coupled to the hardware.
ABOUT THE AUTHOR
Zaid Kahn is Vice President of Microsoft’s Silicon, Cloud Hardware, and Infrastructure Engineering organization. He leads systems engineering and hardware development for Azure, including AI systems, compute, memory, and infrastructure. Kahn’s teams are responsible for software and hardware engineering efforts as well as specialized compute systems, FPGA network products, and ASIC hardware accelerators. Kahn is part of the technical leadership team across Microsoft that sets AI hardware strategy for training and inference. His team is also responsible for the development of Microsoft’s systems for MAIA and Cobalt custom silicon.
Prior to joining Microsoft, Kahn was the head of infrastructure at LinkedIn. He was responsible for all aspects of architecture and engineering for data centers, networking, compute, storage, and hardware. Kahn also led several software development teams focused on building and managing infrastructure as code. The network teams Kahn led built the global network for LinkedIn, including POPs, peering for edge services, IPv6 implementation, DWDM infrastructure, and the data center network fabric.
Kahn holds several patents in networking and is a sought-after keynote speaker at top-tier conferences and events. He also currently serves as Chair of the Open Compute Project (OCP) Foundation, on the EECS External Advisory Board (EAB) at UC Berkeley, and as a board member of the Internet Ecosystem Innovation Committee (IEIC), a global internet think tank promoting internet diversity. Kahn has a Bachelor of Science in Computer Science and Physics from the University of the South Pacific.