Ajou University repository

MiCRO: Near-Zero Cost Gradient Sparsification for Scaling and Accelerating Distributed DNN Training
Citations (SCOPUS)
0

Publication Year
2023-01-01
Journal
Proceedings - 2023 IEEE 30th International Conference on High Performance Computing, Data, and Analytics, HiPC 2023
Publisher
Institute of Electrical and Electronics Engineers Inc.
Citation
Proceedings - 2023 IEEE 30th International Conference on High Performance Computing, Data, and Analytics, HiPC 2023, pp.87-96
Keyword
distributed deep learning; gradient sparsification; scalability
Mesh Keyword
Communication optimization; Computational costs; Distributed deep learning; Gradient sparsification; Gradient vectors; Neural networks trainings; Optimization techniques; Scalings; Sparsification; Workers'
All Science Classification Codes (ASJC)
Artificial Intelligence; Computer Networks and Communications; Computer Science Applications; Hardware and Architecture; Information Systems; Information Systems and Management
Abstract
Gradient sparsification is a communication optimisation technique for scaling and accelerating distributed deep neural network (DNN) training. It reduces the growing communication traffic required for gradient aggregation. However, existing sparsifiers scale poorly because of the high computational cost of gradient selection and/or an increase in communication traffic. In particular, the increase in communication traffic is caused by gradient build-up and by an inappropriate threshold for gradient selection. To address these challenges, we propose a novel gradient sparsification method called MiCRO. In MiCRO, the gradient vector is partitioned, and each partition is assigned to the corresponding worker. Each worker then selects gradients only from its own partition, so the aggregated gradients are free from gradient build-up. Moreover, MiCRO estimates an accurate threshold to keep the communication traffic at the user-specified level by minimising the compression ratio error. MiCRO enables near-zero cost gradient sparsification by solving the existing problems that hinder the scalability and acceleration of distributed DNN training. In our extensive experiments, MiCRO outperformed state-of-the-art sparsifiers with an outstanding convergence rate.
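The sketch below is a minimal illustration (not the authors' implementation) of the two ideas named in the abstract: each worker selects gradients only from its own partition of the gradient vector, and the selection threshold is adjusted to track a user-specified compression ratio. It assumes PyTorch; the function names and the simple proportional threshold update are illustrative assumptions.

import torch

def select_from_partition(grad, rank, world_size, threshold):
    """Keep only large-magnitude entries of this worker's partition, so the
    selections aggregated from all workers never overlap (no gradient build-up)."""
    flat = grad.flatten()
    part_size = (flat.numel() + world_size - 1) // world_size
    start = rank * part_size
    end = min(start + part_size, flat.numel())
    part = flat[start:end]
    mask = part.abs() >= threshold                      # threshold-based selection
    indices = mask.nonzero(as_tuple=True)[0] + start    # indices in the full gradient
    values = part[mask]
    return indices, values

def update_threshold(threshold, num_selected, part_numel, target_ratio, gain=0.5):
    """Nudge the threshold so the achieved compression ratio approaches the
    user-specified target, i.e. shrink the compression-ratio error."""
    achieved = num_selected / max(part_numel, 1)
    # Selected too many entries -> raise the threshold; too few -> lower it.
    return threshold * (1.0 + gain * (achieved - target_ratio) / target_ratio)

In a full training loop the per-worker (index, value) pairs would be exchanged with a collective such as allgather; because the partitions are disjoint, each gradient coordinate appears at most once in the aggregate, which is the build-up-free property the abstract describes.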
Language
eng
URI
https://aurora.ajou.ac.kr/handle/2018.oak/36945
https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85190595332&origin=inward
DOI
https://doi.org/10.1109/hipc58850.2023.00024
Journal URL
http://ieeexplore.ieee.org/xpl/mostRecentIssue.jsp?punumber=10487008
Type
Conference
Funding
This work was jointly supported by the ITRC program (IITP-2023-2018-0-01431) of IITP, the BK21 FOUR program (NRF5199991014091), and the Basic Science Research Program (2021R1F1A1062779) of the National Research Foundation of Korea.

Related Researcher

Oh, Sangyoon (오상윤)
Department of Software and Computer Engineering

File Download

  • There are no files associated with this item.