Citation Export
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Yoon, Daegun | - |
dc.contributor.author | Jeong, Minjoong | - |
dc.contributor.author | Oh, Sangyoon | - |
dc.date.issued | 2023-07-01 | - |
dc.identifier.uri | https://dspace.ajou.ac.kr/dev/handle/2018.oak/33259 | - |
dc.description.abstract | Gradient sparsification is widely adopted in distributed training; however, it suffers from a trade-off between computation and communication. The prevalent Top-k sparsifier achieves the desired gradient compression ratio but imposes substantial computational overhead. Conversely, the hard-threshold sparsifier eliminates this computational constraint but fails to achieve the targeted compression ratio. Motivated by this trade-off, we designed a novel threshold-based sparsifier called SAGE, which achieves a compression ratio close to that of the Top-k sparsifier with negligible computational overhead. SAGE scales the compression ratio by deriving an adjustable threshold from each iteration's heuristics. Experimental results show that SAGE achieves a compression ratio closer to the desired ratio than the hard-threshold sparsifier without degrading model training accuracy. In terms of computation time for gradient selection, SAGE achieves a speedup of up to 23.62× over the Top-k sparsifier. (An illustrative sketch of these sparsifiers follows the metadata table below.) | - |
dc.description.sponsorship | This work was jointly supported by the BK21 FOUR program (NRF5199991014091) and the Basic Science Research Program (2022R1F1A1062779) of the National Research Foundation (NRF) of Korea, and by the Korea Institute of Science and Technology Information (KISTI) under grants TS-2022-RE-0019 and KSC-2022-CRE-0406. | - |
dc.language.iso | eng | - |
dc.publisher | Springer | - |
dc.subject.mesh | Communication optimization | - |
dc.subject.mesh | Compression ratio scaling | - |
dc.subject.mesh | Computational constraints | - |
dc.subject.mesh | Computational overheads | - |
dc.subject.mesh | Distributed deep learning | - |
dc.subject.mesh | Gradient sparsification | - |
dc.subject.mesh | Hard constraints | - |
dc.subject.mesh | Scalings | - |
dc.subject.mesh | Sparsification | - |
dc.subject.mesh | Trade off | - |
dc.title | SAGE: toward on-the-fly gradient compression ratio scaling | - |
dc.type | Article | - |
dc.citation.endPage | 11409 | - |
dc.citation.startPage | 11387 | - |
dc.citation.title | Journal of Supercomputing | - |
dc.citation.volume | 79 | - |
dc.identifier.bibliographicCitation | Journal of Supercomputing, Vol.79, pp.11387-11409 | - |
dc.identifier.doi | 10.1007/s11227-023-05120-7 | - |
dc.identifier.scopusid | 2-s2.0-85148905546 | - |
dc.identifier.url | https://www.springer.com/journal/11227 | - |
dc.subject.keyword | Communication optimization | - |
dc.subject.keyword | Compression ratio scaling | - |
dc.subject.keyword | Distributed deep learning | - |
dc.subject.keyword | Gradient sparsification | - |
dc.description.isoa | false | - |
dc.subject.subarea | Theoretical Computer Science | - |
dc.subject.subarea | Software | - |
dc.subject.subarea | Information Systems | - |
dc.subject.subarea | Hardware and Architecture | - |
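
The abstract above hinges on the trade-off between the Top-k sparsifier (exact compression ratio, costly selection) and the hard-threshold sparsifier (cheap selection, drifting ratio), with SAGE scaling its threshold each iteration to track a target ratio. The sketch below is a rough illustration only, assuming PyTorch tensors for the local gradient: it implements both baseline sparsifiers and a naive multiplicative threshold adjustment. The update rule, step size, and starting threshold are assumptions for exposition, not the paper's algorithm.

```python
# Minimal sketch, assuming PyTorch tensors for the local gradient. It contrasts
# the two sparsifiers named in the abstract (Top-k: exact ratio, costly
# selection; hard threshold: cheap selection, drifting ratio) and adds a simple
# per-iteration threshold adjustment toward a target density. The update rule,
# step size, and starting threshold are hypothetical -- the record above does
# not specify SAGE's actual heuristic.
import torch


def topk_sparsify(grad: torch.Tensor, density: float) -> torch.Tensor:
    """Keep the k largest-magnitude entries: hits the target ratio exactly,
    but the selection itself is computationally expensive."""
    k = max(1, int(grad.numel() * density))
    _, indices = torch.topk(grad.abs().flatten(), k)
    mask = torch.zeros(grad.numel(), device=grad.device)
    mask[indices] = 1.0
    return grad * mask.view_as(grad)


def threshold_sparsify(grad: torch.Tensor, threshold: float) -> torch.Tensor:
    """Keep entries whose magnitude exceeds a fixed threshold: nearly free to
    compute, but the achieved ratio drifts with the gradient distribution."""
    return grad * (grad.abs() > threshold)


def adjust_threshold(threshold: float, achieved: float, target: float,
                     step: float = 0.1) -> float:
    """Hypothetical scaling step: raise the threshold if too many entries were
    kept last iteration, lower it if too few."""
    return threshold * (1.0 + step) if achieved > target else threshold * (1.0 - step)


if __name__ == "__main__":
    torch.manual_seed(0)
    target_density = 0.01      # keep ~1% of gradient entries
    threshold = 1e-3           # hypothetical starting threshold
    for it in range(5):
        grad = torch.randn(1_000_000)          # stand-in for a local gradient
        sparse = threshold_sparsify(grad, threshold)
        achieved = (sparse != 0).float().mean().item()
        threshold = adjust_threshold(threshold, achieved, target_density)
        exact = topk_sparsify(grad, target_density)   # reference: exact ratio
        print(f"iter {it}: threshold density {achieved:.4f}, "
              f"top-k density {(exact != 0).float().mean().item():.4f}, "
              f"next threshold {threshold:.4e}")
```

In this toy setting, the threshold-based path avoids the full selection cost of Top-k while the per-iteration adjustment pulls its achieved density toward the 1% target, mirroring the trade-off the abstract describes.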