(Author: Wu Jianghua of Innosilicon Technology)
Foreword
Not long ago, the “Fenghua No. 1” high-performance GPU launched by Innosilicon drew wide attention from the market, delivering 160-320 GPixel/s of rendering throughput, 5-10 TFLOPS of floating-point performance, and up to 50 TOPS of AI compute.
For a high-performance GPU, computing power is the core metric, and the most fundamental technology behind that computing power is high-bandwidth data exchange. If the GPU is the fighter jet among smart chips, then GDDR memory access technology is the high-speed runway that supports it. The GDDR6/6X interface has become standard equipment on flagship products from every major vendor, and it is one of the key reasons the “Fenghua No. 1” GPU can compete at the front of the market!
▲ GDDR6/6X Combo IP is the core technology behind the Fenghua GPU’s high bandwidth
Today, let’s discuss in depth the importance and evolution of GDDR memory technology in SoCs, and how the GDDR6/6X interface meets a high-performance GPU’s need for high-bandwidth data exchange.
The relationship between GDDR and GPU
To talk about GDDR, we must first review the development of the GPU. The graphics processing unit (GPU) began as a co-processor accelerator card for the CPU, mainly accelerating applications such as games, video, and imaging.
Graphics computation involves a large number of mathematical operations such as vertex shading, screen mapping, fragment shading, clipping, and triangle traversal. The GPU has an inherent advantage over the CPU in large-scale, concurrent computation.
▲ The algorithm structure mainly implemented by GPU
With hundreds of computing cores built on a graphics-processing architecture, the GPU has enormous advantages in high-performance computing, parallel computing, and matrix operations. Driven by demand, the GPU has therefore naturally become the core hardware platform for today’s artificial intelligence and high-performance computing.
▲ Compared with CPU, the multi-core computing architecture of GPU is conducive to parallel matrix operations
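Why matrix operations parallelize so well can be shown with a toy sketch (plain Python, purely illustrative, not GPU code): each element of a matrix product depends only on one row and one column, so every element can be computed independently, which is exactly what a GPU’s many cores exploit.

```python
# Toy illustration: every element of C = A x B depends only on one row of A
# and one column of B, so all the dot products are independent of each other.
# On a GPU, each (i, j) task below would map to its own thread.

def matmul_elementwise(A, B):
    m, k, n = len(A), len(B), len(B[0])
    return [[sum(A[i][p] * B[p][j] for p in range(k)) for j in range(n)]
            for i in range(m)]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(matmul_elementwise(A, B))  # [[19, 22], [43, 50]]
```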
Since a GPU architecture contains hundreds or thousands of computing cores, the pipelined data flow of parallel computing does not fit the traditional CPU+DDR access model, and the corresponding GDDR technology emerged to fill the gap.
GDDR memory technology is standard for mainstream advanced GPUs
The rapid development of advanced-process semiconductors has lit up a vast range of new applications such as artificial intelligence, autonomous driving, neural networks, and high-performance gaming. And the GPU, as a high-performance, highly concurrent computing platform, has made NVIDIA’s Jensen Huang and AMD’s Lisa Su the two most dazzling stars on this stage!
▲ NVIDIA’s RTX and Titan series and the “AMD YES!” lineup keep smashing through the ceiling of everyone’s imagination!
As all high-performance GPUs pursue the ultimate in computing power, memory data exchange has gradually become the bottleneck of the entire SoC, and high-bandwidth, high-speed memory exchange technology has become the key to improving GPU computing efficiency.
▲ GDDR is the core data exchange technology of the GPU system
Stimulated and driven by this demand, GDDR technology has developed rapidly: DDR5 at 6.4 Gbps/pin has yet to roll out at scale, while GDDR has already iterated to GDDR6X at 21 Gbps/pin. A DDR5 device is typically 32-64 bits wide with a per-chip bandwidth of about 72 Gbps, while a GDDR6 device reaches an astonishing 512 Gbps.
▲ The development of GDDR has rapidly surpassed that of DDR
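The per-device figure above follows directly from pin rate times bus width; a minimal sanity check (the 32-bit GDDR6 interface width is the typical per-device configuration assumed here):

```python
def device_bandwidth_gbps(pin_rate_gbps, bus_width_bits):
    """Peak per-device bandwidth = per-pin data rate x interface width."""
    return pin_rate_gbps * bus_width_bits

# GDDR6 at 16 Gbps/pin on a 32-bit device interface:
print(device_bandwidth_gbps(16, 32))  # 512 Gbps, matching the figure above
# GDDR6X at 21 Gbps/pin:
print(device_bandwidth_gbps(21, 32))  # 672 Gbps
```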
A good horse deserves a good saddle. GDDR is one of the most important technical links in raising GPU computing power, paving a fast lane for the GPU’s high-performance engine.
▲ Ferrari can’t run in the mud!
Main advantages of GDDR
1. Comparison of GDDR and traditional DDR
§ Bandwidth advantage
The conventional DDR series uses 8n or 16n prefetch with array widths of 32-128 bits, while GDDR5X/GDDR6 uses 16n prefetch, enabling large block access of 256-512 bits from a single array with an access granularity of 32-64 bytes. The system data width can reach 384 bits, meeting the GPU’s high-bandwidth requirements.
▲ GDDR5X/GDDR6 prefetch is larger
▲ The structure and rate of GDDR is conducive to a larger bus width
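The access granularity quoted above is simply prefetch depth times channel width; a short sketch, assuming the typical 16-bit and 32-bit channel cases:

```python
def access_granularity_bytes(prefetch_n, channel_width_bits):
    """One burst transfers prefetch_n beats of channel_width_bits each."""
    return prefetch_n * channel_width_bits // 8

# 16n prefetch on a 16-bit channel -> 32-byte access granularity
print(access_granularity_bytes(16, 16))  # 32
# 16n prefetch on a 32-bit channel -> 64-byte access granularity
print(access_granularity_bytes(16, 32))  # 64
```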
Since GDDR’s memory array is large, the column address (CA) width is smaller at the same density, as shown in the following figure:
▲ The column address of LPDDR4 is 10 bits, while that of GDDR6X is 6-7 bits
These characteristics show that GDDR’s memory array is larger, its read length longer, and its data bus wider, traits distinctly different from traditional DDR.
§ Pin comparison
GDDR5 through GDDR6X use 170-180 pins, while traditional LPDDR4 requires about 200. Compared with DDR3’s 80-90 pins there is still a significant increase, but the bandwidth gain is far greater.
GDDR and DDR have their own strengths and weaknesses.
With its bandwidth, core speed, and smaller pin count, GDDR has great advantages in GPU, NPU, AI, and other high-concurrency computing applications. DDR, paired with a CPU, still holds the advantage in random access, small-burst read/write latency, and high-density memory applications.
2. Performance of the latest memory devices
The GPU is developing rapidly, with flagship products emerging one after another. Meanwhile, GDDR’s progress is no less impressive, and it even threatens to pull ahead.
▲ Comparison of Micron’s GDDR memory chips across flagship GPUs
Micron’s mainstream memory chips are paired with each flagship GPU. For ultra-high-bandwidth applications, Micron compares them along three dimensions.
▲ Mainstream GDDR performance comparison
GDDR6X has reached 21 Gbps/pin and a system bandwidth of 1 TB/s. One can almost hear the GPU vendors asking: how much computing power would I need to make full use of memory chips with bandwidth this large? The bolder GDDR gets, the more the GPU can produce!
In September 2020, Micron announced ultra-bandwidth solutions based on GDDR6X memory chips, which NVIDIA first adopted in its high-performance flagship GeForce RTX 3090 and GeForce RTX 3080 GPUs.
The combination of GDDR6X and NVIDIA’s GeForce RTX series leads the way in state-of-the-art graphics hardware, draining both our imaginations and gamers’ wallets!
▲ GeForce RTX 3080 Ti with 12 GB of GDDR6X memory
▲ The eye-catching 32 GB of GDDR6X is calling on players to hurry up and save their money!
Innosilicon takes the lead in launching commercial GDDR6/6X combo IP
Provide acceleration services for global smart chips
As the data-exchange foundation of smart chips, the importance of GDDR technology is self-evident. Demand from products such as autonomous driving, artificial intelligence, and game engines has exploded, yet the GDDR6/6X high-bandwidth interface technology that supports them is so complex and advanced that the commercial IP market offers few choices.
Therefore, developing GDDR6/6X memory technology requires the joint effort of memory-chip manufacturers, IP companies, and smart-chip companies.
In 2021, Micron and Innosilicon jointly developed and launched the first silicon-proven GDDR6/6X Combo IP, giving more chip companies access to the high-bandwidth core technology of GDDR6/6X!
Micron even said: This IP has changed the landscape of artificial intelligence!
Innosilicon’s GDDR6/6X PHY and controller IP are built on a 14nm process and use PAM4 signaling, with a per-pin rate of up to 21 Gbps and a 256-bit width for a system bandwidth of over 5 Tb/s, serving many bandwidth-hungry applications such as image processing, game engines, signal analysis, and artificial intelligence.
▲ Mass production of the world’s first commercial GDDR6/6X Combo IP
▲ 21 Gbps GDDR6X PAM4 DQ eye diagram
▲ GDDR6 WCK eye diagram at 15 GHz
▲ GDDR6 DQ eye diagram at 5 Gbps
Innosilicon has thus become an IP vendor with full coverage from GDDR5 to GDDR6X, and its GDDR6X is the first commercially mass-produced IP at that node, providing this key interface technology to high-performance chip companies around the world!
Interpretation of GDDR6/6X Combo IP Technology
・PAM4 signal technology
▲ PAM4 signaling framework: 4 amplitude levels, so a single symbol carries 2 bits of information
▲ QDR technology captures 4 samples per clock, meeting PAM4’s signal-rate requirements
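The encoding idea can be sketched in a few lines: PAM4 maps each 2-bit pair to one of 4 amplitude levels, so the symbol rate is half the bit rate. The Gray-coded mapping below is purely illustrative; the actual GDDR6X line coding is more involved.

```python
# Illustrative PAM4 mapping: 2 bits -> 1 of 4 levels. Gray coding is used so
# adjacent levels differ by only one bit, limiting the damage from noise at
# a level boundary.
PAM4_LEVELS = {'00': 0, '01': 1, '11': 2, '10': 3}

def pam4_encode(bits):
    assert len(bits) % 2 == 0, "PAM4 consumes bits two at a time"
    return [PAM4_LEVELS[bits[i:i + 2]] for i in range(0, len(bits), 2)]

symbols = pam4_encode('00101101')
print(symbols)           # [0, 3, 2, 1] -- 4 symbols carry 8 bits
print(len(symbols) * 2)  # 8: the symbol rate is half the bit rate
```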
・GDDR6 and GDDR6X architecture diagram comparison
▲ Architecture comparison of GDDR6 and GDDR6X (note the frequency-multiplication relationship between the clock and data sampling)
The biggest difference between GDDR6X and GDDR6 is that the data channel uses PAM4 technology to achieve 4 times the sampling rate, reaching a single-ended speed of 21 Gbps.
▲ The frequency-multiplication relationship between the GDDR6X clock and PAM4
Main technical difficulties
・PAM4 signaling demands a high sampling rate
▲ Low voltage brings power consumption advantages, but imposes strict requirements on signals
To meet its high-bandwidth target, GDDR6X sets the core frequency at 2.5 GHz, versus the 400-800 MHz core frequency of traditional DDR5. To satisfy the prefetch sampling requirement: 2.5 GHz × 16 (prefetch) ÷ 2 (PAM4) = 20 Gbps, so the I/O rate must exceed 20 Gbps to complete the sampling.
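Following the article’s own arithmetic, the required I/O rate falls straight out of the core frequency, the prefetch depth, and PAM4’s 2 bits per symbol:

```python
core_freq_ghz = 2.5   # GDDR6X core frequency (from the text)
prefetch = 16         # 16n prefetch
bits_per_symbol = 2   # PAM4 carries 2 bits per symbol

# Per the text: 2.5 GHz x 16 prefetch / 2 (PAM4) = 20, so the I/O must
# run faster than 20 Gbps to keep up with the prefetched data.
required_io_rate = core_freq_ghz * prefetch / bits_per_symbol
print(required_io_rate)  # 20.0
```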
GDDR6/6X runs VDDQ at 1.25/1.35 V while carrying 16-21 Gbps high-speed signals, which places extremely critical requirements on the design, routing, and packaging of the internal cache and I/O (125-135 pins): any tiny noise, once through the attenuation path, can close the signal eye entirely.
▲ DQ eye diagram comparison of GDDR6 (8Gbps) and GDDR6x PAM4 (16Gbps)
・Ultra-low voltage requires an advanced wafer process
▲ The FinFET process places extremely high demands on IP design
GDDR6/6X IP combines high speed with a low voltage swing, so an advanced FinFET process is mandatory. Verification on an advanced node is expensive: a single tape-out costs 2-3 million US dollars, the design-convergence rules are complex, test equipment and testing are costly, and the demands on engineering experience are extreme.
Innosilicon provides a complete set of technical packaging solutions
Beyond the GDDR6/6X Combo PHY + controller itself, design companies still face complex routing, packaging, and other issues, and every technical point carries risk before mass production. For this, Innosilicon provides a one-stop packaged solution.
Innosilicon supplies IP-matched I/O routing, package design, PCB board-level references, signal-integrity analysis, and more, greatly reducing the user’s risk and integration time and truly deploying the world-leading GDDR6/6X technology into an SoC in one stop, achieving ultra-high-bandwidth memory access.
▲ PCB trace reference scheme
▲ Signal integrity analysis – return loss and insertion loss
Epilogue
Innosilicon has deep experience in mass production and verification of advanced-process IP, from DDR5/4/3/2 and LPDDR5/4/3/2 to leading GDDR5/5X, GDDR6/6X, HBM3, Innolink Chiplet, and 32/56G SerDes. Innosilicon has taken the lead in investing heavy R&D into mass-production verification, providing high-speed interface solutions for a wide range of high-performance SoCs and acceleration services for high-performance chips around the world!
▲ HBM3 6.4Gbps high-speed eye diagram
▲ Mass production of the world’s first GDDR6/6X Combo IP
▲ 32/56G SerDes eye diagram (supports high-speed protocols such as PCIe 5, SATA, USB 3.0, SGMII, and MIPI)
▲ Fenghua No. 1 applies advanced interface IP such as Innolink Chiplet, GDDR6/6X
These advanced IPs are interdependent and interrelated at the technical level, and each is a uniquely leading technology in the market. What is more valuable is that the photos above are not “PowerPoint products”: they are the harvest of sixteen years of sustained investment, focused R&D, and long-term cultivation by the Innosilicon team under the leadership of CEO Mr. Ao Hai, which is especially precious in today’s environment of impetuous capital speculation.
▲ Mr. Ao Hai, CEO of Innosilicon
Innosilicon’s advanced IP technology leads the industry’s technological innovation and shapes a long-term, global development vision for semiconductor companies.
▲ Rich application scenarios
For 16 years Innosilicon has invested heavily in advanced technology worldwide, focusing on high-end IP R&D and building core advantages in high-performance computing, multimedia terminal & automotive electronics, IoT, and other application fields, with more than 200 tape-outs, more than 6 billion licensed chips in mass production, and more than 1 billion high-end custom SoCs shipped, working quietly and down-to-earth to make important contributions to empowering high-end chips!