Why Is It Hate Speech? Masked Rationale Prediction for Explainable Hate Speech Detection

Journal: Proceedings - International Conference on Computational Linguistics, COLING

Citation: Proceedings - International Conference on Computational Linguistics, COLING, Vol.29 No.1, pp.6644-6655

Mesh Keyword: Detection models; Detection performance; Human judgments; Learn+; Reasoning ability; Speech detection; State-of-the-art performance

All Science Classification Codes (ASJC): Computational Theory and Mathematics; Computer Science Applications; Theoretical Computer Science

Abstract: A hate speech detection model should address two critical aspects in addition to detection performance: bias and explainability. Hate speech cannot be identified solely by the presence of specific words; the model should be able to reason as humans do and be explainable. To improve performance on these two aspects, we propose Masked Rationale Prediction (MRP) as an intermediate task. MRP is the task of predicting masked human rationales (snippets of a sentence that are the grounds for human judgment) by referring to the surrounding tokens combined with their unmasked rationales. Because the model learns its reasoning ability from rationales through MRP, it performs hate speech detection robustly in terms of bias and explainability. The proposed method generally achieves state-of-the-art performance across various metrics, demonstrating its effectiveness for hate speech detection. Warning: This paper contains samples that may be upsetting.
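
The masking step that the abstract describes for MRP can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not give the masking scheme, so the `MASK` sentinel, the `mask_ratio` parameter, the function name `mask_rationales`, and the placeholder tokens below are all assumptions made for illustration. The sketch only shows how per-token rationale labels might be partly hidden so that a model sees each token together with the rationale labels that remain visible and is asked to recover the hidden ones.

```python
import random

MASK = -1  # hypothetical sentinel marking a hidden rationale label


def mask_rationales(labels, mask_ratio=0.5, seed=0):
    """Hide a random subset of per-token rationale labels.

    Returns (visible, targets):
      visible - copy of `labels` with MASK at the hidden positions
      targets - dict mapping each hidden position to its true label
    The ratio and the uniform sampling are illustrative assumptions.
    """
    if not labels:
        return [], {}
    rng = random.Random(seed)
    n_hidden = min(len(labels), max(1, round(len(labels) * mask_ratio)))
    hidden = set(rng.sample(range(len(labels)), n_hidden))
    visible = [MASK if i in hidden else lab for i, lab in enumerate(labels)]
    targets = {i: labels[i] for i in sorted(hidden)}
    return visible, targets


# Placeholder tokens; 1 marks a token annotated as part of the human rationale.
tokens = ["w1", "w2", "w3", "w4"]
labels = [0, 1, 0, 1]

visible, targets = mask_rationales(labels, mask_ratio=0.5, seed=0)

# Each token is paired with its rationale label, or MASK where it is hidden.
# A model would be trained to predict `targets` from `model_input`.
model_input = list(zip(tokens, visible))
```

In this sketch the prediction targets exist only at the hidden positions, which mirrors the idea of predicting masked rationales from the surrounding tokens and their unmasked rationales; how the labels are embedded and combined with token representations is left open here.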

URI: https://aurora.ajou.ac.kr/handle/2018.oak/36861
https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85150635665&origin=inward

Funding: This work was supported by the Ministry of Education of the Republic of Korea and the National Research Foundation of Korea (NRF) (NRF-2021S1A5A2A03065899), and also by the NRF of Korea grant funded by the Korean government (MSIT) (NRF-2022R1A2C1007434).
