Ajou University repository

Preserving Near-Optimal Gradient Sparsification Cost for Scalable Distributed Deep Learning

DC Field: Value
dc.contributor.author: Yoon, Daegun
dc.contributor.author: Oh, Sangyoon
dc.date.issued: 2024-01-01
dc.identifier.uri: https://aurora.ajou.ac.kr/handle/2018.oak/37120
dc.identifier.uri: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85207966978&origin=inward
dc.description.abstract: Communication overhead is a major obstacle to scaling distributed training systems. Gradient sparsification is a potential optimization approach to reduce the communication volume without significant loss of model fidelity. However, existing gradient sparsification methods have low scalability owing to the inefficient design of their algorithms, which raises the communication overhead significantly. In particular, gradient build-up and inadequate sparsity control methods degrade sparsification performance considerably. Moreover, communication traffic increases drastically owing to the workload imbalance of gradient selection between workers. To address these challenges, we propose a novel gradient sparsification scheme called ExDyna. In ExDyna, the gradient tensor of the model comprises fine-grained blocks, and contiguous blocks are grouped into non-overlapping partitions. Each worker selects gradients in its exclusively allocated partition so that gradient build-up never occurs. To balance the workload of gradient selection between workers, ExDyna adjusts the topology of partitions by comparing the workloads of adjacent partitions. In addition, ExDyna supports online threshold scaling, which estimates the accurate threshold of gradient selection on the fly. Accordingly, ExDyna can satisfy the user-required sparsity level during a training period regardless of models and datasets. Therefore, ExDyna can enhance the scalability of distributed training systems by preserving near-optimal gradient sparsification cost. In experiments, ExDyna outperformed state-of-the-art sparsifiers in terms of training speed and sparsification performance while achieving high accuracy.
dc.language.iso: eng
dc.publisher: Institute of Electrical and Electronics Engineers Inc.
dc.subject.mesh: Communication overheads
dc.subject.mesh: Distributed deep learning
dc.subject.mesh: Distributed training systems
dc.subject.mesh: Gradient sparsification
dc.subject.mesh: Near-optimal
dc.subject.mesh: Optimization approach
dc.subject.mesh: Performance
dc.subject.mesh: Scalings
dc.subject.mesh: Sparsification
dc.subject.mesh: Workers'
dc.title: Preserving Near-Optimal Gradient Sparsification Cost for Scalable Distributed Deep Learning
dc.type: Conference
dc.citation.conferenceDate: 2024.5.6. ~ 2024.5.9.
dc.citation.conferenceName: 24th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing, CCGrid 2024
dc.citation.edition: Proceedings - 2024 IEEE 24th International Symposium on Cluster, Cloud and Internet Computing, CCGrid 2024
dc.citation.startPage: 307
dc.citation.endPage: 316
dc.citation.title: Proceedings - 2024 IEEE 24th International Symposium on Cluster, Cloud and Internet Computing, CCGrid 2024
dc.identifier.bibliographicCitation: Proceedings - 2024 IEEE 24th International Symposium on Cluster, Cloud and Internet Computing, CCGrid 2024, pp.307-316
dc.identifier.doi: 10.1109/ccgrid59990.2024.00043
dc.identifier.scopusid: 2-s2.0-85207966978
dc.identifier.url: http://ieeexplore.ieee.org/xpl/mostRecentIssue.jsp?punumber=10701311
dc.subject.keyword: distributed deep learning
dc.subject.keyword: gradient sparsification
dc.subject.keyword: scalability
dc.type.other: Conference Paper
dc.description.isoa: true
dc.subject.subarea: Computer Networks and Communications
dc.subject.subarea: Hardware and Architecture
dc.subject.subarea: Information Systems and Management
dc.subject.subarea: Safety, Risk, Reliability and Quality
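
The abstract above describes two mechanisms: exclusive, non-overlapping per-worker partitions over the gradient tensor (so gradient build-up never occurs) and online threshold scaling that tracks a user-required sparsity level. The following is a minimal illustrative sketch of those two ideas only, not the authors' implementation; all names (partition_gradient, OnlineThreshold, target_density, gain) are hypothetical, and the multiplicative threshold update is a simple stand-in for the paper's estimator.

# Illustrative sketch (not ExDyna's code) of exclusive partitions and
# online threshold scaling, assuming a flattened gradient and NumPy only.
import numpy as np


def partition_gradient(num_elements, num_workers):
    """Split the flattened gradient into contiguous, non-overlapping slices,
    one per worker, so no element can be selected by two workers."""
    bounds = np.linspace(0, num_elements, num_workers + 1, dtype=int)
    return [slice(int(bounds[i]), int(bounds[i + 1])) for i in range(num_workers)]


class OnlineThreshold:
    """Toy stand-in for online threshold scaling: nudge the selection threshold
    after each step so the achieved density tracks the target density."""

    def __init__(self, target_density, init_threshold=1e-3, gain=0.5):
        self.target = target_density
        self.threshold = init_threshold
        self.gain = gain

    def select(self, local_grad):
        # Select gradients in this worker's own partition only.
        mask = np.abs(local_grad) >= self.threshold
        achieved = float(mask.mean()) if local_grad.size else 0.0
        # Raise the threshold if too many elements passed, lower it if too few.
        self.threshold *= 1.0 + self.gain * (achieved - self.target) / max(self.target, 1e-12)
        return mask


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    parts = partition_gradient(1_000_000, num_workers=4)
    scaler = OnlineThreshold(target_density=0.01)
    for step in range(5):
        grad = rng.standard_normal(1_000_000).astype(np.float32)
        local = grad[parts[0]]  # worker 0 looks only at its own partition
        mask = scaler.select(local)
        print(f"step {step}: density={mask.mean():.4f}, threshold={scaler.threshold:.4f}")

Note that this sketch omits everything the paper actually evaluates (block-level grouping, partition-topology adjustment for load balancing, and communication); it only shows why exclusive partitions avoid duplicate selection and how a threshold can be adjusted online toward a target density.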

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher

Oh, Sangyoon (오상윤)
Department of Software and Computer Engineering

File Download

  • There are no files associated with this item.