The challenge 800G switches pose for PCB interconnection

2023.12.19

The bandwidth challenge brought by AI


According to public data, the parameter count of GPT models grew from about 117 million in GPT-1 to 175 billion in GPT-3, and GPT-4 is reported to have reached 1.8 trillion parameters. Market research firm TrendForce estimates that, based on the processing power of the NVIDIA A100 graphics card, the GPT-3.5 large model requires 20,000 GPUs to process its training data.


ChatGPT and the wave of large AI models that followed have improved retrieval and work efficiency across the AIGC (AI-generated content) industry. They have also raised demand for high-speed storage and data exchange beyond GPU computing, because large models need large amounts of data and computing power for training and operation, and they generate large amounts of new data in use.


The "Ethernet Switch Five-Year Forecast" report recently released by Dell'Oro Group projects that the data center Ethernet switch market will grow at a compound annual growth rate close to double digits between 2021 and 2026, with cumulative spending of nearly $100 billion over those five years. Speeds of 400Gbps and higher are expected to account for half of that spending, and by 2025, 800Gbps is expected to exceed 400Gbps.


The capacity of switching chips doubles every two years


Broadcom's switching chips have long represented the leading edge of the market. Tomahawk is its family of high-bandwidth switch platform chips, and the Tomahawk 5 is a 51.2T switching chip built on a 5-nanometer process, aimed mainly at the commercial switch and router chip market for hyperscale enterprises and cloud service providers.


The Tomahawk 5 can drive a total bandwidth of 51.2Tb/s, configured as 64 ports at 800Gb/s, 128 ports at 400Gb/s, or 256 ports at 200Gb/s. Hyperscale enterprises and cloud service providers prefer 128-port switches running at 400Gb/s per port, with 64 ports connected down to the servers in the rack and 64 ports connected up to the spine layer of the network fabric. AI server clusters can use 256 ports at 200Gb/s, which goes a long way toward meeting the data exchange requirements of AI computing.
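The port configurations above all follow from dividing the same switching capacity by the per-port speed. The short Python sketch below reproduces that arithmetic; the function name `port_count` and the constant name are illustrative only and do not come from any Broadcom tool or datasheet.

```python
# Total switching capacity of the chip in Gb/s (51.2 Tb/s, from the text).
SWITCH_CAPACITY_GBPS = 51_200


def port_count(port_speed_gbps: int, capacity_gbps: int = SWITCH_CAPACITY_GBPS) -> int:
    """Return how many ports of the given speed the capacity can feed.

    Hypothetical helper for illustration: it only divides capacity by
    port speed and ignores real-world constraints such as SerDes grouping.
    """
    if port_speed_gbps <= 0:
        raise ValueError("port speed must be positive")
    if capacity_gbps % port_speed_gbps != 0:
        raise ValueError("capacity is not a whole multiple of the port speed")
    return capacity_gbps // port_speed_gbps


if __name__ == "__main__":
    # The three configurations mentioned in the article.
    for speed in (800, 400, 200):
        print(f"{speed} Gb/s per port -> {port_count(speed)} ports")
```

Halving the port speed doubles the port count, which is why the same chip can serve either as a 64-port 800G switch or as a 256-port 200G switch.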




These high-capacity switching chips generally come in large FCBGA packages, with a package size greater than 60cm² and more than 8,000 pins.


Interconnection schemes for 800G switches as optics advance and copper retreats


As the per-lane rate of high-speed signals rises from 50G to 100G to 200G, the loss of the transmission system grows from less than 10dB to more than 20dB. Once the signal rate exceeds 200G, PCB interconnection becomes more complex, and almost all signal traces will be long enough to exceed the loss budget for transmission over a 1m DAC cable.
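The loss-budget reasoning above can be sketched as a simple subtraction: whatever remains of the end-to-end channel budget after the cable, connectors, and packages take their share is all the PCB trace may consume. The Python sketch below illustrates this under stated assumptions. The function name and every number in the usage example (28 dB budget, 20 dB for everything except the trace, 1 dB per inch of trace) are hypothetical placeholders, not values from the article or from any standard.

```python
def max_trace_length_in(
    channel_budget_db: float,
    other_losses_db: float,
    trace_loss_db_per_in: float,
) -> float:
    """Return the longest PCB trace (in inches) that fits the loss budget.

    channel_budget_db:    total end-to-end insertion loss allowed (assumed).
    other_losses_db:      loss taken by the cable, connectors, and packages (assumed).
    trace_loss_db_per_in: PCB trace loss per inch at the signal's frequency (assumed).
    """
    if trace_loss_db_per_in <= 0:
        raise ValueError("trace loss per inch must be positive")
    remaining_db = channel_budget_db - other_losses_db
    # A negative remainder means the budget is already spent: no trace fits.
    return max(remaining_db, 0.0) / trace_loss_db_per_in


if __name__ == "__main__":
    # Hypothetical numbers chosen only to show the calculation.
    print(max_trace_length_in(28.0, 20.0, 1.0))  # inches of trace allowed
    print(max_trace_length_in(28.0, 20.0, 2.0))  # lossier trace, shorter reach
```

Because trace loss per inch rises with signal rate while the budget stays fixed, the allowed trace length shrinks quickly, which is the motivation for moving the signal off the PCB.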




Celestica demonstrated an 800G switch built on the Broadcom Tomahawk 5 switching chip at the OCP Summit 2023. It uses flyover copper cables to ease routing and heat dissipation problems.




The flyover copper cable approach addresses the main design and technical challenges of 800G switches:


Easier routing to pluggable optical modules (shorter cable lengths and better signal quality): because 800G switches are based on 112Gbps lanes and densely routed switching traces, it is difficult to keep signal noise low when routing on the PCB. The main options are a co-packaged optics (CPO) solution or a flyover copper cable solution that is easy to work with and low in cost;


Power consumption and heat dissipation: the 800G switching chip and the optical modules need larger heat sinks for cooling. The flyover copper cable solution leaves a large amount of free space between them, which makes it easier to install large heat sinks.


Co-packaged optics (CPO) is the other alternative. As 800G switches enter production and lane speeds reach 224Gbps, CPO is expected to become increasingly popular.
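The link between lane speed and routing density can be made concrete with a little arithmetic. The sketch below assumes that a 112Gbps lane nominally carries 100Gb/s of payload and a 224Gbps lane nominally carries 200Gb/s, with the remainder being coding overhead. That mapping is an assumption stated here for illustration rather than something the article spells out, and the helper names are hypothetical.

```python
def lanes_per_port(port_gbps: int, lane_payload_gbps: int) -> int:
    """Return how many electrical lanes one port needs.

    lane_payload_gbps is the nominal payload per lane (assumed: 100 for a
    112Gbps lane, 200 for a 224Gbps lane).
    """
    if lane_payload_gbps <= 0:
        raise ValueError("lane payload must be positive")
    if port_gbps % lane_payload_gbps != 0:
        raise ValueError("port speed is not a whole multiple of the lane payload")
    return port_gbps // lane_payload_gbps


def total_lanes(port_count: int, port_gbps: int, lane_payload_gbps: int) -> int:
    """Return the number of lanes the whole switch must route."""
    return port_count * lanes_per_port(port_gbps, lane_payload_gbps)


if __name__ == "__main__":
    # A 64-port 800G switch at the two lane generations.
    for lane in (100, 200):
        print(f"{lane}G payload per lane: "
              f"{lanes_per_port(800, lane)} lanes per port, "
              f"{total_lanes(64, 800, lane)} lanes in total")
```

Under these assumptions a 64-port 800G switch has to route hundreds of high-speed lanes, each as a differential pair in both directions. Faster lanes cut the lane count but raise the loss per unit length, so the routing problem changes shape rather than disappearing.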



Evolution of optical module integration


On current servers, optical modules convert between optical and electrical signals so that data can be transmitted to switches or other servers.


In co-packaged optics (CPO), the functional devices of an optical module, such as optical receivers, amplifiers, and DSPs, are integrated onto a PCB carrier board. A CPO solution can reduce power consumption by 30% and cost by 40%, and it increases data exchange density, resulting in a smaller volume.



Ruijie Networks 25.6T silicon optical CPO switch



Ruijie Networks 51.2T silicon optical NPO switch


Note: The above content is collated from the Internet, and the copyright belongs to the original author. If there is any infringement, please contact us for deletion.

