| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | Sangyoon Oh | - |
| dc.contributor.author | 윤대건 | - |
| dc.date.issued | 2024-02 | - |
| dc.identifier.other | 33298 | - |
| dc.identifier.uri | https://aurora.ajou.ac.kr/handle/2018.oak/39216 | - |
| dc.description | Doctoral thesis -- Department of Artificial Intelligence, February 2024 | - |
| dc.description.abstract | Communication overhead is a major obstacle to scaling distributed training systems. Gradient sparsification is a promising optimization approach that reduces the communication volume without significant loss of model fidelity. However, existing gradient sparsification methods scale poorly owing to the inefficient design of their algorithms, which raises the communication overhead significantly. In particular, gradient build-up and inadequate sparsity control degrade sparsification performance considerably. Moreover, communication traffic increases drastically owing to the workload imbalance of gradient selection between workers.<br><br>In this paper, we propose ExDyna to address the above challenges. In ExDyna, the gradient tensor of the model is divided into fine-grained blocks, and contiguous blocks are grouped into non-overlapping partitions. Each worker selects gradients only within its exclusively allocated partition, so gradient build-up never occurs. To balance the workload of gradient selection between workers, ExDyna adjusts the topology of the partitions by comparing the workloads of adjacent partitions. In addition, ExDyna supports online threshold scaling, which estimates an accurate gradient selection threshold on the fly. Accordingly, ExDyna satisfies the user-required sparsity level throughout training, regardless of model and dataset. ExDyna thus enhances the scalability of distributed training systems by keeping the gradient sparsification cost near optimal. In experiments, ExDyna outperformed state-of-the-art sparsifiers in terms of training speed and sparsification performance while achieving high accuracy. | - |
| dc.description.tableofcontents | 1 Introduction 1<br>2 Preliminaries 9<br>3 Limitations of State-of-the-Art Methods 11<br>4 ExDyna Design 14<br> 4.1 Block-based gradient vector partitioning 14<br> 4.2 Dynamic partition allocation 16<br> 4.3 Partition-wise exclusive gradient selection 19<br> 4.4 Online threshold scaling 20<br>5 Evaluation 23<br> 5.1 Methodology 23<br> 5.2 Performance evaluation 24<br> 5.3 Efficiency evaluation 35<br>6 Conclusion 44<br>Bibliography 45 | - |
| dc.language.iso | eng | - |
| dc.publisher | The Graduate School, Ajou University | - |
| dc.rights | Ajou University theses are protected by copyright. | - |
| dc.title | Dynamic Gradient Sparsification Exploiting Aggregated Gradients for Scalable Distributed Deep Learning | - |
| dc.title.alternative | 고확장성 분산 딥 러닝을 위한 동적 기울기 희소화 기법 | - |
| dc.type | Thesis | - |
| dc.contributor.affiliation | Graduate School, Ajou University | - |
| dc.contributor.alternativeName | Daegun Yoon | - |
| dc.contributor.department | Department of Artificial Intelligence, Graduate School | - |
| dc.date.awarded | 2024-02 | - |
| dc.description.degree | Doctor | - |
| dc.identifier.url | https://dcoll.ajou.ac.kr/dcollection/common/orgView/000000033298 | - |
| dc.subject.keyword | distributed deep learning | - |
| dc.subject.keyword | gradient sparsification | - |
| dc.subject.keyword | scalability | - |
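
The abstract above describes ExDyna's two core mechanisms: partition-wise exclusive gradient selection over fine-grained blocks, and online threshold scaling toward a user-required sparsity level. The sketch below is a minimal, illustrative reading of that description, not the thesis implementation; every name (`partition_blocks`, `select_local_gradients`, `scale_threshold`, `block_size`, `gain`) and the proportional threshold update are assumptions, and the dynamic rebalancing of partition boundaries is omitted.

```python
import numpy as np

def partition_blocks(num_blocks: int, num_workers: int) -> list[range]:
    """Group contiguous blocks into non-overlapping partitions, one per worker
    (a static, evenly sized allocation; ExDyna adjusts these boundaries dynamically)."""
    bounds = np.linspace(0, num_blocks, num_workers + 1, dtype=int)
    return [range(bounds[i], bounds[i + 1]) for i in range(num_workers)]

def select_local_gradients(grad: np.ndarray, block_size: int,
                           partition: range, threshold: float):
    """Scan only this worker's partition of blocks and keep gradients whose
    magnitude exceeds the threshold. Disjoint partitions give disjoint index
    sets across workers, so no gradient is selected twice (no build-up)."""
    idx_chunks, val_chunks = [], []
    for b in partition:
        start = b * block_size
        stop = min(start + block_size, grad.size)
        block = grad[start:stop]
        mask = np.abs(block) >= threshold
        idx_chunks.append(np.nonzero(mask)[0] + start)
        val_chunks.append(block[mask])
    return np.concatenate(idx_chunks), np.concatenate(val_chunks)

def scale_threshold(threshold: float, achieved_density: float,
                    target_density: float, gain: float = 0.5) -> float:
    """Naive stand-in for online threshold scaling: raise the threshold when too
    many gradients were selected, lower it when too few, so that the achieved
    density tracks the user-required sparsity level."""
    return threshold * (1.0 + gain * (achieved_density - target_density) / target_density)

# Toy usage: 4 workers, a flattened gradient of 1,000 elements, block size 10.
rng = np.random.default_rng(0)
grad = rng.standard_normal(1000)
partitions = partition_blocks(num_blocks=100, num_workers=4)
threshold = 1.5
idx, val = select_local_gradients(grad, block_size=10,
                                  partition=partitions[0], threshold=threshold)
density = idx.size / (len(partitions[0]) * 10)
threshold = scale_threshold(threshold, achieved_density=density, target_density=0.01)
print(f"worker 0 kept {idx.size} gradients, next threshold {threshold:.3f}")
```

Because the partitions are non-overlapping, the index sets returned by different workers are disjoint, which is one way to read the abstract's claim that "gradient build-up never occurs"; the thesis additionally rebalances partition boundaries by comparing the workloads of adjacent partitions, which this sketch does not model.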