ZhiCun Lecture & Peking University | AI Large Models Bring Changes to Computing Infrastructure

Beyond Moreer
On the evening of April 28, 2023, the ninth session of the "ZhiCun Lecture" series of the School of Information Science and Technology, which also serves as part of the course Frontiers of Information Science and Technology and Industrial Innovation, was successfully held in Room 106 of the Science Teaching Building. Mr. Wang Shaodi, founder and CEO of ZhiCun Technology and a Peking University alumnus, was invited to speak on "The Impact of AI Large Models on Computing Infrastructure". More than 30 teachers and students attended. The event was hosted by Professor Wang Runsheng, vice dean of the School of Information Science and Technology at Peking University.

At the beginning of the lecture, Mr. Wang Shaodi briefly introduced the current state of the field and of ZhiCun Technology. He stated that AI large models have reached a singularity: they will not only generate huge economic benefits and bring significant changes to people's lives, but will also affect the entire industry, from application scenarios down to the underlying computing power, and create new strategic pivots for international competition. As founder and CEO of ZhiCun Technology, Mr. Wang Shaodi is particularly concerned with the impact of AI large models on the underlying computing power and its infrastructure. Since its establishment, ZhiCun Technology has worked deeply in the field of AI computing power and holds a leading position in in-memory computing chips, achieving many breakthroughs from scratch. The company has a professional team of more than 180 people and has completed nearly 800 million RMB in financing. As AI large models continue to develop, the company will keep focusing on in-memory computing technology and strive to improve the underlying computing power available to AI.

Next, Mr. Wang Shaodi discussed the impact of AI large models on computing infrastructure from four aspects: the application scenarios of AI large models, the computing power demand of AI large models, near-memory computing technology and its applications, and in-memory computing and its applications in large AI models.

01 Application Scenarios of AI Large Models

AI large models have evolved from a mere conceptual innovation into an advanced technology that delivers huge commercial value and a leap in productivity. They have a wide range of application scenarios and show strong capabilities in image and video analysis, code generation, data analysis, video generation, and more. GPT-like algorithms are expected to undergo even greater iterations within the next two to three years, further reducing production costs and improving production efficiency. Of course, the application of large models will also bring problems such as data security threats, and the corresponding regulatory and governance measures still need further study and exploration.

02 AI Computing Power Demand

The development of computing power is currently subject to several constraints. On the one hand, Moore's Law is approaching its limit, and the growth of silicon-based computing power will gradually level off. On the other hand, memory performance and storage bandwidth are improving slowly and cannot keep pace with the growth in clock speed, core count, and storage capacity of computing chips, which restricts gains in computing efficiency and increases the power consumed by reading and writing data. Power consumption is the bottleneck that limits computing power: under a given heat dissipation technology there is an upper limit to a chip's power consumption, and therefore an upper limit to its computing power. To improve computing power, it is necessary to improve energy efficiency, widen the "gate" through which data passes, and make data flow more easily between the two sides of that gate.
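To make the power-ceiling point concrete, here is a minimal back-of-the-envelope sketch (not from the talk; the power budget and energy-per-operation figures are illustrative assumptions): under a fixed power limit, peak throughput is simply power divided by energy per operation, so the only way to raise the ceiling is to lower the energy each operation costs.

```python
# Back-of-the-envelope: throughput ceiling under a fixed power budget.
# All numbers are illustrative assumptions, not figures from the lecture.

def peak_tops(power_w: float, pj_per_op: float) -> float:
    """Peak throughput in TOPS given a power budget and energy per operation."""
    ops_per_second = power_w / (pj_per_op * 1e-12)  # W divided by J/op -> op/s
    return ops_per_second / 1e12                     # tera-operations per second

power_budget_w = 300.0           # assumed package power limit set by cooling
for pj in (10.0, 1.0, 0.1):      # assumed energy per operation, in picojoules
    print(f"{pj:5.1f} pJ/op -> {peak_tops(power_budget_w, pj):8.0f} TOPS ceiling")
```

With the assumed 300 W budget, every 10x improvement in energy per operation raises the achievable throughput ceiling by the same 10x, which is why the talk frames energy efficiency, rather than raw clock speed, as the lever that matters.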
From intelligent speech and visual processing to autonomous driving and AIGC, large models demand ever larger parameter counts and ever more computing power, and the higher the computing power, the more data has to be moved. In fact, in what we currently call computing, about 90% of the time is spent transporting data. In the traditional Von Neumann architecture, data must flow between different storage units, forming a huge data stream. Within this system, the hard disk has large capacity but slow reads, while the cache reads quickly but has small capacity; neither can truly support effective AI computation. Existing AI computing therefore relies most heavily on memory, which has enough capacity to hold all or part of the model and enough bandwidth to support high-speed data reads and writes. At present, the industry's optimization efforts focus mainly on the bandwidth between memory and the computing chip. In addition, since the memory capacity attached to a single chip is limited, some large models require multiple chips to be interconnected for training or inference. In this process, the real bottleneck that urgently needs improvement is not the computing speed of a single core on the computing chip, but the speed of data exchange between memory and the chip within a single card and between cards.

Improvements at the architectural level can also raise computing efficiency. Compared with CPUs, GPUs are already roughly 100 times more efficient at computing AI models. For large models with heavy parameter computation, general-purpose computing is very inefficient; a computing solution designed specifically for the model brings much greater benefit. By developing specialized computing chips with dedicated architectures through customized design, the efficiency of large-model computing is expected to improve by about another 10 times.

The AI computing power market should not be underestimated. Based on current graphics card and cloud service prices, the short-term market size is expected to reach 1.5 billion US dollars, while the medium-term and long-term market sizes are expected to reach 120 billion and 2 trillion US dollars, respectively. Taking into account cost optimization of future solutions and falling computing costs, a more reasonable prediction is around 30 billion US dollars in the medium term and 100 billion US dollars in the long term. The underlying AI computing power market will clearly be a very large and specialized market.
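Returning to the data-movement point above, a rough estimate helps show why arithmetic is rarely the bottleneck. This is a sketch with assumed hardware numbers (peak arithmetic rate, memory bandwidth, fp16 weights), not figures from the lecture: for a single large matrix-vector product of the kind that dominates large-model inference at batch size 1, streaming the weights out of memory takes far longer than the math itself.

```python
# Rough estimate: time spent moving weights vs. time spent on arithmetic for a
# single large matrix-vector product (batch size 1, as in large-model decoding).
# The hardware numbers below are illustrative assumptions, not measured values.

PEAK_FLOPS = 100e12         # assumed peak arithmetic rate: 100 TFLOP/s
MEM_BANDWIDTH = 1.0e12      # assumed memory bandwidth: 1 TB/s
BYTES_PER_WEIGHT = 2        # fp16 weights

def step_times(rows: int, cols: int):
    flops = 2 * rows * cols                        # one multiply + one add per weight
    bytes_moved = rows * cols * BYTES_PER_WEIGHT   # every weight read once from memory
    return flops / PEAK_FLOPS, bytes_moved / MEM_BANDWIDTH

t_compute, t_memory = step_times(12288, 12288)     # one GPT-3-scale weight matrix
print(f"arithmetic: {t_compute * 1e6:.0f} us,  data movement: {t_memory * 1e6:.0f} us")
print(f"share of time spent moving data: {t_memory / (t_compute + t_memory):.0%}")
```

Under these assumptions the arithmetic finishes in a few microseconds while moving the weights takes hundreds, which is consistent with the claim that the large majority of "computing" time is really data transport.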
03 Near-Memory Computing Technology and Applications

How can computing power be continuously improved, computing costs reduced, and computing efficiency raised? Compute-in-memory technology is considered one of the most effective approaches. In the Von Neumann architecture, memory and computing are separated, but mainstream high-performance chips have now begun to adopt near-memory computing architectures that integrate the memory and the computing chip. The more efficient near-memory architecture, and the combination of near-memory and in-memory computing, have great potential for development.

Near-memory computing is currently the most commercialized form of integrated storage-computing technology. It originated in 2013 with the work of AMD and Hynix to solve the data bandwidth problem between memory and graphics cards. The basic idea is to integrate the memory and the computing chip within one package, shortening the distance between them and making the wires shorter, denser, and more numerous. Near-memory computing uses two integration technologies: 2.5D and 3D. 2.5D integration places the computing die and the memory die in one package, so that the wiring between them is done with chip-level processes instead of the original PCB-level wiring; it is currently the most practical solution because it solves the problem with existing, mature technologies. 3D integration "glues" the two chips directly together, raising the wire density between them by another 10 to 100 times, with higher integration density and faster computation.

Major manufacturers are actively adopting near-memory computing and launching products with advanced performance. High-bandwidth memory (HBM), used in Nvidia's products and built with a combination of 2.5D and 3D packaging, has been iterated several times to reach high bandwidth, and memory is likewise integrated on-package in Apple's M1 and M2 chips. Intel's Xeon processors integrate HBM2E memory through 2.5D packaging together with AI training and inference accelerators, which further speeds up the relevant computations. AMD has significantly reduced the transmission energy per bit of data by stacking memory directly on the computing chip in 3D. Samsung integrates computing logic chips and memory with 3D packaging technology.

However, as Moore's Law gradually loses effect, chip costs keep rising. At present, improving speed by 10% to 20% requires roughly a 50% increase in cost; compounded, doubling the speed raises the cost nearly fivefold. In the future, the cost of obtaining higher computing power will keep climbing and product prices will rise accordingly. Moreover, 2.5D and 3D integration are expected to reach their bandwidth limits within the next two to three years, so the challenges ahead will only grow.
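The cost claim above can be checked with simple compounding, using the figures quoted in the talk (10–20% more speed per step at roughly 50% extra cost); the short sketch below assumes the 20% end of the range.

```python
import math

# Compounding the figures quoted above: each step buys ~10-20% more speed for
# ~50% more cost.  Assuming the 20% end of the range, what does doubling cost?
per_step_speedup = 1.20   # assumed 20% speed gain per step
per_step_cost = 1.50      # ~50% cost increase per step, as quoted

steps_to_double = math.log(2) / math.log(per_step_speedup)   # about 3.8 steps
total_cost_factor = per_step_cost ** steps_to_double          # about 4.7x

print(f"steps needed to double speed: {steps_to_double:.1f}")
print(f"overall cost multiplier:      {total_cost_factor:.1f}x")
```

Taking the 10% end of the range instead pushes the multiplier well past 15x, so the "nearly fivefold" figure corresponds to the optimistic end of the quoted numbers.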
04 In-Memory Computing and Its Applications in Large AI Models

As described above, near-memory computing (NMC) is currently the mainstream way to shorten the distance between memory and the computing chip through packaging and integration, but its development still faces limits. In-memory computing (IMC), a more efficient computing technology, is therefore attracting more and more attention. The reason memory rather than hard drives is used for AI computation is that hard drives offer large capacity but unsatisfactory read/write speed. If the storage cells themselves could perform computation directly, based on their physical characteristics, data movement would be reduced and the dependence on high bandwidth would fall. IMC technology can be divided into three generations: IMC SoC, 3D IMC, and 2.5D+3D IMC, each with a rich design space.

Currently, IMC technology allows storage cells to perform multiply-accumulate operations, which account for about 90% of AI computation, so it can significantly improve overall computing efficiency. In addition, large AI models have comparatively relaxed requirements on numerical precision, and the precision achievable with IMC is sufficient for these tasks. Furthermore, the distributed storage-and-compute character of IMC makes it well suited to mixture-of-experts models. Cost-effectiveness is the key factor driving the development of underlying computing power: because IMC depends less on advanced process nodes and requires little external memory bandwidth, it is expected to significantly reduce the cost of AI computing and shows good prospects for development.
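To illustrate the idea numerically, here is a simplified sketch (the array size and bit widths are assumptions for illustration; this is not a model of ZhiCun's actual chips): in an IMC array the weights stay in the storage cells, each column accumulates its products in place, and, as noted above, low-precision weights and activations are usually precise enough.

```python
import numpy as np

# Simplified sketch of an in-memory-computing (IMC) matrix-vector multiply:
# weights live in the array as low-precision values and never move; the input
# vector is applied to the rows and each column sums its products in place
# (the in-array multiply-accumulate).  Illustrative only.

rng = np.random.default_rng(0)

def quantize(x, bits):
    """Quantize to a signed integer grid, mimicking limited IMC precision."""
    scale = (2 ** (bits - 1) - 1) / np.max(np.abs(x))
    return np.round(x * scale), scale

weights = rng.standard_normal((256, 64))       # stored once in the array
activations = rng.standard_normal(256)         # streamed in on the rows

w_q, w_scale = quantize(weights, bits=4)       # 4-bit weights in the cells
a_q, a_scale = quantize(activations, bits=8)   # 8-bit inputs on the rows

column_sums = a_q @ w_q                        # the in-array MAC, one pass
result = column_sums / (w_scale * a_scale)     # rescale back to real values

exact = activations @ weights
rel_err = np.linalg.norm(result - exact) / np.linalg.norm(exact)
print(f"relative error from the low-precision in-array MAC: {rel_err:.2%}")
```

The point of the sketch is the dataflow: the weight matrix is written into the array once and never streamed again, so the bandwidth cost of each matrix-vector product collapses to delivering the input vector and reading out the column sums.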