As attempts to distribute deep learning over personal data have increased, so has the importance of federated learning (FL). Synchronous and asynchronous protocols have been used to address the core challenges of FL, namely statistical and system heterogeneity. However, stragglers reduce training efficiency under both: they increase latency in synchronous protocols and degrade accuracy in asynchronous ones. To address stragglers, a semi-asynchronous protocol that combines the two can be applied to FL; however, effectively handling the staleness of local models remains difficult. We propose SASAFL to address the training inefficiency caused by staleness in semi-asynchronous FL. SASAFL enables stable training by considering the quality of the global model when synchronising the servers and clients. In addition, it achieves high accuracy and low latency by adjusting the number of participating clients in response to changes in the global loss and by immediately processing clients that did not participate in the previous round. We evaluated SASAFL under various conditions to verify its effectiveness. SASAFL achieved 19.69%p higher accuracy than the baseline, 2.32 times higher round-to-accuracy, and 2.24 times higher latency-to-accuracy. Additionally, SASAFL always reached target accuracies that the baseline could not reach.
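The idea of adjusting the number of participating clients in response to changes in the global loss can be illustrated with a small sketch. This is not the SASAFL algorithm, which the abstract does not specify. The function name `adjust_participants`, its parameters (`k_min`, `k_max`, `step`, `tol`), and the direction of the adjustment (fewer clients while the loss improves, more when it worsens) are all assumptions made for illustration.

```python
def adjust_participants(current_k, prev_loss, curr_loss,
                        k_min=2, k_max=20, step=1, tol=1e-3):
    """Hypothetical rule for the number of client updates awaited per round.

    Assumption (not taken from the paper): when the global loss is
    improving, the server can wait for fewer clients to reduce latency;
    when the loss worsens, it waits for more clients to stabilise training.
    """
    improvement = prev_loss - curr_loss
    if improvement > tol:
        new_k = current_k - step   # loss fell: favour lower latency
    elif improvement < -tol:
        new_k = current_k + step   # loss rose: favour more updates
    else:
        new_k = current_k          # change within tolerance: keep as is
    # Clamp to the allowed range of participants.
    return max(k_min, min(k_max, new_k))


if __name__ == "__main__":
    k = 10
    # Illustrative global-loss values only, not results from the paper.
    losses = [1.00, 0.80, 0.85, 0.85]
    for prev, curr in zip(losses, losses[1:]):
        k = adjust_participants(k, prev, curr)
        print(f"loss {prev:.2f} -> {curr:.2f}: wait for {k} clients")
```

A real system would likely smooth the loss signal over several rounds before reacting, since a single-round comparison is sensitive to noise.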
The authors declare the following financial interests/personal relationships which may be considered as potential competing interests: Sangyoon Oh reports financial support was provided by Institute of Information and Communications Technology Planning and Evaluation (IITP). Miri Yu reports equipment, drugs, or supplies were provided by Korea Institute of Science and Technology Information (KISTI). Sangyoon Oh reports a relationship with National Science Foundation of Korea (NRF-Korea) that includes: consulting or advisory and funding grants. If there are other authors, they declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

This work was jointly supported by the Korea Institute of Science and Technology Information (KISTI) (KSC2022-CRE-0406), NRF-Korea grant funded by the Korea government (MSIT) (RS-2023-00283799), and Artificial Intelligence Convergence Innovation Human Resources Development by the Institute of Information and Communications Technology Planning and Evaluation (IITP-2023-No.RS-2023-00255968). The authors would like to thank Editage (www.editage.co.kr) for English language editing.