[1] ZHOU Jia, TU Jun, REN Donglin. Asynchronous Distributed Training Algorithm Based on Gossip [J]. Journal of Hubei University of Technology, 2023, (1): 43-47+58.

 Asynchronous Distributed Training Algorithm Based on Gossip

Journal of Hubei University of Technology [ISSN: 1003-4684 / CN: 42-1752/Z]

Volume:
Issue:
No. 1, 2023
Pages:
43-47+58
Section:
Journal of Hubei University of Technology
Publication date:
2023-03-13

Article Info

Title:
 Asynchronous Distributed Training Algorithm based on Gossip
Article ID:
1003-4684(2023)01-0043-04
Author(s):
 ZHOU Jia, TU Jun, REN Donglin
 (School of Computer Science, Hubei University of Technology, Wuhan, Hubei 430068, China)
Keywords:
 decentralized; distributed; Gossip; asynchronous
CLC number:
TP399
Document code:
A
Abstract:
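Building on the Gossip protocol combined with SGD (Stochastic Gradient Descent), this paper proposes GRSGD (Gossip Ring SGD), a communication framework for deep learning. The framework is decentralized and asynchronous, which addresses the problem of long communication waiting times. Experiments with a ResNet model on the ImageNet dataset verify the feasibility of the algorithm. Compared with Ring AllReduce and DPSGD (Decentralized Parallel SGD), GRSGD completed training in a shorter time.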
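
The abstract does not spell out the GRSGD update rule, so the following is only a toy sketch of the general idea it names: workers on a ring take local SGD steps and average parameters with a neighbour, with no global barrier. Everything here is an assumption made for illustration, not the authors' algorithm: the function name `gossip_ring_sgd`, the scalar quadratic objectives, the decaying step size, and the single-process simulation of asynchrony by waking one random worker per step.

```python
import random


def gossip_ring_sgd(targets, steps=20000, lr0=0.2, decay=500.0, seed=0):
    """Toy simulation of asynchronous pairwise gossip on a ring.

    Worker i minimises 0.5 * (x - targets[i])**2, so the minimiser of the
    average objective is the mean of the targets. Hypothetical illustration,
    not the paper's GRSGD.
    """
    rng = random.Random(seed)
    n = len(targets)
    x = [0.0] * n  # one scalar "model" per worker
    for k in range(steps):
        lr = lr0 / (1.0 + k / decay)  # decaying step size (assumed schedule)
        i = rng.randrange(n)  # one worker wakes up; no one waits at a barrier
        # Noisy gradient, standing in for a mini-batch gradient.
        grad = (x[i] - targets[i]) + rng.gauss(0.0, 0.5)
        x[i] -= lr * grad  # local SGD step
        j = (i + 1) % n  # next worker on the ring
        x[i] = x[j] = 0.5 * (x[i] + x[j])  # pairwise gossip average
    return x


targets = [float(i) for i in range(8)]
models = gossip_ring_sgd(targets)
optimum = sum(targets) / len(targets)
print([round(m, 2) for m in models], "optimum:", optimum)
```

Each step touches only two workers, which is the property that removes the waiting time of a synchronous all-reduce. The trade-off is that the workers' models agree only approximately, and the remaining disagreement shrinks as the step size decays.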

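Memo:
[Received] 2022-01-04
[First author] ZHOU Jia (b. 1993), male, from Jingzhou, Hubei; master's student at Hubei University of Technology; research interest: computer networks.
[Corresponding author] TU Jun (b. 1970), male, from Wuhan, Hubei; Ph.D. in engineering; professor at Hubei University of Technology; research interests: computer networks and the Internet of Things.
Last Update: 2023-03-14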