Citation Export
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Yoon, Daegun | - |
dc.contributor.author | Jeong, Minjoong | - |
dc.contributor.author | Oh, Sangyoon | - |
dc.date.issued | 2023-07-01 | - |
dc.identifier.uri | https://dspace.ajou.ac.kr/dev/handle/2018.oak/33259 | - |
dc.description.abstract | Gradient sparsification is widely adopted in distributed training; however, it suffers from a trade-off between computation and communication. The prevalent Top-k sparsifier achieves the desired gradient compression ratio but imposes substantial computational overhead. Conversely, the hard-threshold sparsifier eliminates this computational constraint but fails to achieve the targeted compression ratio. Motivated by this trade-off, we designed a novel threshold-based sparsifier called SAGE, which achieves a compression ratio close to that of the Top-k sparsifier with negligible computational overhead. SAGE scales the compression ratio by deriving an adjustable threshold from each iteration's heuristics. Experimental results show that SAGE achieves a compression ratio closer to the desired ratio than the hard-threshold sparsifier without degrading model training accuracy. In terms of computation time for gradient selection, SAGE achieves a speedup of up to 23.62× over the Top-k sparsifier. (An illustrative sketch of these sparsifiers follows the metadata table below.) | - |
dc.description.sponsorship | This work was jointly supported by the BK21 FOUR program (NRF5199991014091) and the Basic Science Research Program (2022R1F1A1062779) of the National Research Foundation (NRF) of Korea, and by the Korea Institute of Science and Technology Information (KISTI) under grants TS-2022-RE-0019 and KSC-2022-CRE-0406. | - |
dc.language.iso | eng | - |
dc.publisher | Springer | - |
dc.subject.mesh | Communication optimization | - |
dc.subject.mesh | Compression ratio scaling | - |
dc.subject.mesh | Computational constraints | - |
dc.subject.mesh | Computational overheads | - |
dc.subject.mesh | Distributed deep learning | - |
dc.subject.mesh | Gradient sparsification | - |
dc.subject.mesh | Hard constraints | - |
dc.subject.mesh | Scalings | - |
dc.subject.mesh | Sparsification | - |
dc.subject.mesh | Trade off | - |
dc.title | SAGE: toward on-the-fly gradient compression ratio scaling | - |
dc.type | Article | - |
dc.citation.endPage | 11409 | - |
dc.citation.startPage | 11387 | - |
dc.citation.title | Journal of Supercomputing | - |
dc.citation.volume | 79 | - |
dc.identifier.bibliographicCitation | Journal of Supercomputing, Vol.79, pp.11387-11409 | - |
dc.identifier.doi | 10.1007/s11227-023-05120-7 | - |
dc.identifier.scopusid | 2-s2.0-85148905546 | - |
dc.identifier.url | https://www.springer.com/journal/11227 | - |
dc.subject.keyword | Communication optimization | - |
dc.subject.keyword | Compression ratio scaling | - |
dc.subject.keyword | Distributed deep learning | - |
dc.subject.keyword | Gradient sparsification | - |
dc.description.isoa | false | - |
dc.subject.subarea | Theoretical Computer Science | - |
dc.subject.subarea | Software | - |
dc.subject.subarea | Information Systems | - |
dc.subject.subarea | Hardware and Architecture | - |
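
The abstract above hinges on the trade-off between the Top-k sparsifier (exact compression ratio, costly selection) and the hard-threshold sparsifier (cheap selection, drifting ratio), with SAGE scaling its threshold each iteration to track a target ratio. The sketch below is a rough illustration only, assuming PyTorch tensors for the local gradient: it implements both baseline sparsifiers and a naive multiplicative threshold adjustment. The update rule, step size, and starting threshold are assumptions for exposition, not the paper's algorithm.

```python
# Minimal sketch, assuming PyTorch tensors for the local gradient. It contrasts
# the two sparsifiers named in the abstract (Top-k: exact ratio, costly
# selection; hard threshold: cheap selection, drifting ratio) and adds a simple
# per-iteration threshold adjustment toward a target density. The update rule,
# step size, and starting threshold are hypothetical -- the record above does
# not specify SAGE's actual heuristic.
import torch


def topk_sparsify(grad: torch.Tensor, density: float) -> torch.Tensor:
    """Keep the k largest-magnitude entries: hits the target ratio exactly,
    but the selection itself is computationally expensive."""
    k = max(1, int(grad.numel() * density))
    _, indices = torch.topk(grad.abs().flatten(), k)
    mask = torch.zeros(grad.numel(), device=grad.device)
    mask[indices] = 1.0
    return grad * mask.view_as(grad)


def threshold_sparsify(grad: torch.Tensor, threshold: float) -> torch.Tensor:
    """Keep entries whose magnitude exceeds a fixed threshold: nearly free to
    compute, but the achieved ratio drifts with the gradient distribution."""
    return grad * (grad.abs() > threshold)


def adjust_threshold(threshold: float, achieved: float, target: float,
                     step: float = 0.1) -> float:
    """Hypothetical scaling step: raise the threshold if too many entries were
    kept last iteration, lower it if too few."""
    return threshold * (1.0 + step) if achieved > target else threshold * (1.0 - step)


if __name__ == "__main__":
    torch.manual_seed(0)
    target_density = 0.01      # keep ~1% of gradient entries
    threshold = 1e-3           # hypothetical starting threshold
    for it in range(5):
        grad = torch.randn(1_000_000)          # stand-in for a local gradient
        sparse = threshold_sparsify(grad, threshold)
        achieved = (sparse != 0).float().mean().item()
        threshold = adjust_threshold(threshold, achieved, target_density)
        exact = topk_sparsify(grad, target_density)   # reference: exact ratio
        print(f"iter {it}: threshold density {achieved:.4f}, "
              f"top-k density {(exact != 0).float().mean().item():.4f}, "
              f"next threshold {threshold:.4e}")
```

In this toy setting, the threshold-based path avoids the full selection cost of Top-k while the per-iteration adjustment pulls its achieved density toward the 1% target, mirroring the trade-off the abstract describes.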