Deep Learning Training and Advances in Pipeline Model Parallelism: A Short Review

Abstract

This paper provides a short review of pipeline model parallelism for deep learning training. Pipeline model parallelism partitions a deep learning model into consecutive stages, assigns each stage to a different device, and streams micro-batches through the stages so that the devices compute concurrently. This approach can substantially improve training efficiency for large models on distributed systems, although it introduces challenges such as pipeline bubbles (device idle time), stage load balancing, and the memory cost of stashed activations. The paper discusses these trade-offs and surveys recent advances in the area.
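To make the idea concrete, the following is a minimal sketch of pipeline model parallelism, assuming PyTorch as the framework; the layer sizes, device selection, and micro-batch count are illustrative, not a definitive implementation. It partitions a toy model into two stages on separate devices and streams micro-batches through them; production systems such as GPipe additionally overlap stage execution and schedule the backward pass, which this sketch omits.

    import torch
    import torch.nn as nn

    # Assume two GPUs if available; otherwise fall back to CPU so the sketch runs.
    dev0 = torch.device("cuda:0" if torch.cuda.device_count() >= 2 else "cpu")
    dev1 = torch.device("cuda:1" if torch.cuda.device_count() >= 2 else "cpu")

    # Partition a toy model into two consecutive stages, one per device.
    stage0 = nn.Sequential(nn.Linear(512, 512), nn.ReLU()).to(dev0)
    stage1 = nn.Sequential(nn.Linear(512, 10)).to(dev1)

    def pipelined_forward(batch, num_microbatches=4):
        # Split the batch into micro-batches and stream them through the stages.
        # The loop is written sequentially for clarity; real pipeline schedules
        # overlap stage execution across micro-batches to keep devices busy.
        outputs = []
        for mb in batch.chunk(num_microbatches):
            h = stage0(mb.to(dev0))              # stage 0 runs on the first device
            outputs.append(stage1(h.to(dev1)))   # stage 1 runs on the second device
        return torch.cat(outputs)

    x = torch.randn(32, 512)
    print(pipelined_forward(x).shape)  # torch.Size([32, 10])

Because each device holds only its own stage's parameters and activations, this layout also reduces per-device memory pressure, which is the main reason pipeline parallelism is used for models too large to fit on a single accelerator.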

Keywords

Pipeline model parallelism; deep learning training; distributed systems; model partitioning
