Federated learning (FL) was proposed for training a deep neural network model on the data of millions of users. The technique has attracted considerable attention owing to its privacy-preserving characteristic. However, two major challenges exist. The first is the limit on the number of simultaneously participating clients: as the number of clients increases, the single parameter server easily becomes a bottleneck and is prone to stragglers. The second is data heterogeneity, which adversely affects the accuracy of the global model. Because data must remain on user devices to preserve privacy, we cannot use data shuffling, which homogenizes training data in traditional distributed deep learning. We propose a client clustering and model aggregation method, CCFed, to increase the number of simultaneously participating clients and mitigate the data heterogeneity problem. CCFed improves learning performance by modeling client clustering as a set partition problem, so that data are evenly distributed between clusters and the effect of a non-IID environment is mitigated. Experiments on benchmark datasets show that CCFed achieves 2.7-14% higher accuracy than FedAvg while requiring approximately 50% fewer training rounds.
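The abstract describes clustering clients via set partitioning so that each cluster's aggregate data distribution stays close to the global one. The paper's actual formulation is not given here; the following is a minimal greedy sketch of that idea, assuming per-client label counts are observable. The function name `cluster_clients` and the size-imbalance penalty are illustrative choices of ours, not CCFed's method.

```python
import numpy as np

def cluster_clients(label_counts: np.ndarray, num_clusters: int) -> list[list[int]]:
    """Greedy balanced set partitioning (illustrative sketch, not CCFed itself).

    label_counts: (num_clients, num_labels) array of per-client label counts.
    Returns one list of client indices per cluster, chosen so each cluster's
    aggregate label distribution stays close to the global distribution.
    """
    num_clients, num_labels = label_counts.shape
    global_dist = label_counts.sum(axis=0) / label_counts.sum()
    clusters = [[] for _ in range(num_clusters)]
    cluster_counts = np.zeros((num_clusters, num_labels))

    # Assign the largest clients first; each goes to the cluster whose
    # resulting label distribution is nearest the global distribution.
    for c in np.argsort(-label_counts.sum(axis=1)):
        best_k, best_err = 0, float("inf")
        for k in range(num_clusters):
            trial = cluster_counts[k] + label_counts[c]
            err = np.abs(trial / trial.sum() - global_dist).sum()
            # Hypothetical penalty keeping cluster sizes roughly equal.
            err += 0.1 * len(clusters[k]) / num_clients
            if err < best_err:
                best_k, best_err = k, err
        clusters[best_k].append(int(c))
        cluster_counts[best_k] += label_counts[c]
    return clusters
```

Under this kind of partitioning, each cluster sees an approximately IID mixture even when individual clients are highly skewed, which is the property the abstract credits for CCFed's accuracy gains over FedAvg.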
This research was supported by the Korea Institute of Science and Technology Information (KISTI) (P22010) and by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (2022R1F1A1062779).