A new co-packaged optical innovation could replace electrical interconnects in data centers to deliver significant improvements in speed and power efficiency for AI and other computing applications.
December 9, 2024
YORKTOWN HEIGHTS, NY – December 9, 2024: IBM (NYSE: IBM) has unveiled groundbreaking research in optical technology that could significantly improve the way data centers train and run generative AI models. Researchers have developed a new co-packaged optics (CPO) process, the next generation of optical technology, to enable connectivity within data centers at the speed of light using optics to complement existing short-range electrical wires. By designing and assembling the first publicly announced polymer optical waveguide (PWG) to power this technology, IBM researchers have shown how CPO will redefine how the computing industry transmits high-bandwidth data between chips, circuit boards and servers.
Today, fiber optic technology carries data at high speed over long distances, handling almost all of the world’s commercial and communications traffic with light rather than electricity. Although data centers use fiber optics for their external communications networks, data center racks still primarily handle communications over copper electrical wires. These cables connect GPU accelerators that can spend more than half their time idle, waiting for data from other devices as part of a large distributed training process that can incur significant expense and energy.
IBM researchers have demonstrated a way to bring the speed and capacity of optics inside data centers. In a technical paper, IBM presents a new CPO prototype module capable of enabling high-speed optical connectivity. This technology could greatly increase data center communications bandwidth, minimizing GPU downtime while significantly accelerating AI processing. This research innovation, as described, would:
- Lower the cost of scaling generative AI by reducing power consumption more than fivefold compared to mid-range electrical interconnects (1), while extending the length of data center interconnect cables from one meter to several hundred meters.
- Speed up AI model training, enabling developers to train a Large Language Model (LLM) up to five times faster with CPO than with conventional electrical wiring. CPO could reduce the time needed to train a standard LLM from three months to three weeks, with performance gains increasing as larger models and more GPUs are used. (2)
- Significantly increase energy efficiency for data centers, saving the energy equivalent of the annual electricity consumption of 5,000 US homes per trained AI model. (3)
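As a rough sanity check on the training-time figures above, the short Python sketch below computes the speedup implied by going from three months to three weeks. The conversion of months to weeks (52 weeks per 12 months) and the function name `implied_speedup` are assumptions made for illustration; they do not come from the IBM paper.

```python
# Illustrative arithmetic only; the month-to-week conversion is an assumption.
WEEKS_PER_YEAR = 52
MONTHS_PER_YEAR = 12


def implied_speedup(baseline_months: float, new_weeks: float) -> float:
    """Return how many times shorter the new training time is than the baseline."""
    baseline_weeks = baseline_months * WEEKS_PER_YEAR / MONTHS_PER_YEAR
    return baseline_weeks / new_weeks


# Three months down to three weeks, as quoted in the release.
speedup = implied_speedup(baseline_months=3, new_weeks=3)
print(f"Implied speedup: {speedup:.2f}x")  # prints "Implied speedup: 4.33x"
```

Under this assumption the implied speedup is a little over four times, which is consistent with the "up to five times faster" claim.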
“As generative AI demands more energy and processing power, the data center must evolve – and co-packaged optics can future-proof these data centers,” said Dario Gil, senior vice president and director of research at IBM. “With this advancement, tomorrow’s chips will communicate in the same way that fiber optic cables carry data to and from data centers, ushering in a new era of faster, more durable communications capable of handling the AI workloads of the future.”
80x faster bandwidth than current chip-to-chip communication
In recent years, advances in chip technology have made it possible to densely pack transistors on a chip; IBM’s 2-nanometer node chip technology can contain more than 50 billion transistors. CPO technology aims to increase interconnection density between accelerators by allowing chipmakers to add optical paths connecting chips on an electronic module beyond the limits of current electrical paths. The IBM paper explains how these new high-bandwidth-density optical structures, combined with the transmission of multiple wavelengths per optical channel, have the potential to increase bandwidth between chips by up to 80 times compared with electrical connections.
IBM’s innovation, as described, would allow chipmakers to add six times as many optical fibers at the edge of a silicon photonic chip, a measure called “beachfront density,” compared to current state-of-the-art CPO technology. Each fiber, about three times the width of a human hair, could span centimeters to hundreds of meters in length and transmit terabits of data per second. The IBM team assembled a high-density PWG with optical channels at a 50-micrometer pitch, adiabatically coupled to silicon photonic waveguides, using standard assembly and packaging processes.
The paper further states that these CPO modules with PWG at a 50-micrometer pitch are the first to pass all the stress tests required for manufacturing. Components are subjected to high-humidity environments and temperatures ranging from -40°C to 125°C, as well as mechanical durability testing to confirm that optical interconnects can bend without breaking or losing data. Additionally, researchers demonstrated PWG technology with a pitch of 18 micrometers. Stacking four PWGs would allow up to 128 channels for connectivity at this pitch.
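The pitch and channel figures above lend themselves to a quick back-of-the-envelope calculation. The Python sketch below is illustrative only: it assumes the 128 channels are split evenly across the four stacked PWGs and that channels in a layer sit on a uniform pitch, neither of which is stated explicitly in the release. The function names are hypothetical.

```python
# Back-of-the-envelope sketch; even split across layers is an assumption.
def channels_per_layer(total_channels: int, layers: int) -> int:
    """Channels in each stacked PWG, assuming an even split across layers."""
    if total_channels % layers != 0:
        raise ValueError("channels do not divide evenly across layers")
    return total_channels // layers


def row_span_um(channels: int, pitch_um: float) -> float:
    """Center-to-center width of one row of channels on a uniform pitch."""
    return (channels - 1) * pitch_um


per_layer = channels_per_layer(total_channels=128, layers=4)  # 32 channels
span = row_span_um(per_layer, pitch_um=18)                    # 558 micrometers
density_gain = 50 / 18  # linear channel density at 18 um vs. 50 um pitch

print(f"{per_layer} channels per layer, spanning about {span:.0f} um")
print(f"Linear density gain from 50 um to 18 um pitch: {density_gain:.2f}x")
```

Under these assumptions, each layer would carry 32 channels across roughly half a millimeter of chip edge, and moving from a 50-micrometer to an 18-micrometer pitch fits close to three times as many channels into the same width.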
IBM’s continued leadership in semiconductor R&D
CPO technology opens a new path to meet the growing performance demands of AI, with the potential to shift off-module communications from electrical to optical. It continues IBM’s history of leadership in semiconductor innovation, which also includes the first 2 nm node chip technology, the first implementations of 7 nm and 5 nm process technologies, Nanosheet transistors, vertical transistors (VTFET), single-cell DRAM and chemically amplified photoresists.
Researchers performed design, modeling, and simulation work for CPO in Albany, New York, which the U.S. Department of Commerce recently selected to host America’s first National Semiconductor Technology Center (NSTC) facility, the NSTC EUV Accelerator. Researchers assembled prototypes and tested modules at IBM’s Bromont, Quebec, facility, one of the largest chip assembly and testing sites in North America. Part of the northeast semiconductor corridor between the United States and Canada, the IBM Bromont plant has been a world leader in chip packaging for decades.
About IBM
IBM is a leading global provider of hybrid cloud and AI, and consulting expertise. We help our clients in more than 175 countries leverage insights from their data, streamline their business processes, reduce costs, and gain a competitive advantage in their industry. More than 4,000 government entities and enterprises in critical infrastructure areas such as financial services, telecommunications and healthcare rely on IBM and Red Hat OpenShift’s hybrid cloud platform to deliver their digital transformations quickly, efficiently and securely. IBM’s groundbreaking innovations in AI, quantum computing, industry-specific cloud solutions and consulting provide open and flexible options for our clients. All of this builds on IBM’s long-standing commitment to trust, transparency, accountability, inclusiveness and service. Visit www.ibm.com for more information.