Ajou University repository

Knowledge distillation meets recommendation: collaborative distillation for top-N recommendation
  • Lee, Jae-woong
  • Choi, Minjin
  • Sael, Lee
  • Shim, Hyunjung
  • Lee, Jongwuk
Citations (SCOPUS)
3

Publication Year
2022-05-01
Publisher
Springer Science and Business Media Deutschland GmbH
Citation
Knowledge and Information Systems, Vol.64, pp.1323-1348
Keyword
Collaborative filtering; Data ambiguity; Data sparsity; Knowledge distillation; Top-N recommendation
Mesh Keyword
Classification tasks; Data ambiguities; Data sparsity; Feedback problems; Hit rate; Knowledge distillation; Ranking problems; Student Modeling; Teacher models; Top-N recommendation
All Science Classification Codes (ASJC)
Software; Information Systems; Human-Computer Interaction; Hardware and Architecture; Artificial Intelligence
Abstract
Knowledge distillation (KD) is a successful method for transferring knowledge from one model (the teacher) to another (the student). Despite its success in classification tasks, applying KD to recommender models is challenging because of the sparsity of positive feedback, the ambiguity of missing feedback, and the ranking-oriented nature of top-N recommendation. In this paper, we propose a new KD model for collaborative filtering, named collaborative distillation (CD). Specifically, (1) we reformulate the loss function to handle the ambiguity of missing feedback, (2) we exploit probabilistic rank-aware sampling for top-N recommendation, and (3) to train the student model effectively, we develop two training strategies, called teacher-guided and student-guided training, which adaptively select the most beneficial feedback from the teacher model. Furthermore, we extend our model with self-distillation, called born-again CD (BACD): a teacher and a student with the same model capacity are trained using the proposed distillation method. The experimental results demonstrate that CD outperforms the state-of-the-art method by 2.7–33.2% in hit rate (HR) and 2.7–29.9% in normalized discounted cumulative gain (NDCG). Moreover, BACD improves over the teacher model by 3.5–12.0% in HR and 4.9–13.3% in NDCG.
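The paper contains the full formulation, but the following minimal Python/NumPy sketch illustrates two of the ideas the abstract names: probabilistic rank-aware sampling (unobserved items are drawn with probability that decays with their rank under the teacher) and a soft-target loss that treats missing feedback as ambiguous rather than as a hard negative. Everything here is an assumption for illustration: the function names, the exponential decay over ranks, and the sigmoid/binary cross-entropy loss are not the paper's exact definitions.

```python
import numpy as np

def rank_aware_sample(teacher_scores, k, temperature=10.0, rng=None):
    """Draw k candidate items, favoring those the teacher ranks highly.

    The sampling probability decays exponentially with an item's rank in
    the teacher's scores, so the student is mostly trained on items the
    teacher considers relevant (illustrative distribution, not the paper's).
    """
    rng = rng if rng is not None else np.random.default_rng()
    ranks = np.argsort(np.argsort(-teacher_scores))  # rank 0 = top item
    probs = np.exp(-ranks / temperature)
    probs /= probs.sum()
    return rng.choice(len(teacher_scores), size=k, replace=False, p=probs)

def distillation_loss(student_logits, teacher_probs):
    """Binary cross-entropy of student predictions against teacher soft targets.

    Using the teacher's predicted relevance as a soft label is one way to
    handle ambiguous missing feedback: an unobserved item is pushed toward
    the teacher's estimate instead of being treated as "irrelevant".
    """
    s = 1.0 / (1.0 + np.exp(-student_logits))  # student probabilities
    eps = 1e-8
    return -np.mean(teacher_probs * np.log(s + eps)
                    + (1.0 - teacher_probs) * np.log(1.0 - s + eps))

# Toy usage: 1,000 candidate items for one user, distill on 50 samples.
rng = np.random.default_rng(0)
teacher_scores = rng.normal(size=1000)            # stand-in teacher scores
sampled = rank_aware_sample(teacher_scores, k=50, rng=rng)
teacher_probs = 1.0 / (1.0 + np.exp(-teacher_scores[sampled]))
student_logits = np.zeros(50)                     # stand-in student outputs
print(distillation_loss(student_logits, teacher_probs))
```

The abstract's teacher- and student-guided training strategies would sit on top of this sampling step, adaptively deciding which of the sampled feedback the student learns from; that scheduling is omitted from the sketch.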
Language
eng
URI
https://dspace.ajou.ac.kr/dev/handle/2018.oak/32653
DOI
https://doi.org/10.1007/s10115-022-01667-8
Type
Article
Funding
This work was supported by the National Research Foundation of Korea (NRF) (NRF-2018R1A5A1060031 and NRF-2021R1F1A1063843). It was also supported by the Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2020-0-01821, ICT Creative Consilience Program).

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher

Lee, Sael (이슬)
Department of Software and Computer Engineering

File Download

  • There are no files associated with this item.