Enhancing Voice Phishing Detection Using Multilingual Back-Translation and SMOTE: An Empirical Study

Boussougou, Milandu Keith Moussavou; Hamandawana, Prince; Park, Dong Joo

DC Field	Value	Language
dc.contributor.author	Boussougou, Milandu Keith Moussavou	-
dc.contributor.author	Hamandawana, Prince	-
dc.contributor.author	Park, Dong Joo	-
dc.date.issued	2025-01-01	-
dc.identifier.issn	2169-3536	-
dc.identifier.uri	https://aurora.ajou.ac.kr/handle/2018.oak/38549	-
dc.identifier.uri	https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=86000720385&origin=inward	-
dc.description.abstract	With the widespread global trend of voice phishing or vishing attacks, the development of effective detection models using artificial intelligence (AI) has been hindered by the lack of high-quality and large volumes of data. This lack of data reflecting a real vishing scenario often leads to imbalanced datasets and biased detection models. Therefore, we present in this paper a data augmentation (DA) method for expanding the imbalanced Korean call content vishing (KorCCVi) dataset to address the existing data asymmetry problem and enhance the performance of Korean vishing detection. The proposed approach for DA involves using the back-translation (BT) method with three different intermediate languages: English, Chinese, and Japanese. The proposed method offers several advantages over the traditional synthetic minority oversampling technique (SMOTE), which is the main technique used to compare with our multilingual BT approach. Using these two DA techniques, several machine learning (ML) and deep learning (DL) models were trained on the original imbalanced dataset, the dataset balanced with SMOTE and its variants, and the dataset augmented with our method. We analyzed the impact of these DA methods on the performance of the models, demonstrated the benefits of each approach, and suggested the most suitable approach. The performance of the trained models was evaluated using the accuracy, precision, recall, and F1-score metrics. The experimental results demonstrated that the proposed multilingual BT method effectively expands the dataset while preserving its contextual and linguistic characteristics. The average performance of the models revealed that those trained on the augmented dataset outperformed the other models. They achieved F1-scores of 98.91% for the back-translated data, 98.14% for the original data, and 97.23% for SMOTE.	-
dc.language.iso	eng	-
dc.publisher	Institute of Electrical and Electronics Engineers Inc.	-
dc.subject.mesh	Back translations	-
dc.subject.mesh	Data augmentation	-
dc.subject.mesh	Language processing	-
dc.subject.mesh	Machine-learning	-
dc.subject.mesh	Natural language processing	-
dc.subject.mesh	Natural languages	-
dc.subject.mesh	Performance	-
dc.subject.mesh	Phishing	-
dc.subject.mesh	Synthetic minority over-sampling techniques	-
dc.subject.mesh	Voice phishing	-
dc.title	Enhancing Voice Phishing Detection Using Multilingual Back-Translation and SMOTE: An Empirical Study	-
dc.type	Article	-
dc.citation.endPage	37965	-
dc.citation.startPage	37946	-
dc.citation.title	IEEE Access	-
dc.citation.volume	13	-
dc.identifier.bibliographicCitation	IEEE Access, Vol.13, pp.37946-37965	-
dc.identifier.doi	10.1109/access.2025.3545250	-
dc.identifier.scopusid	2-s2.0-86000720385	-
dc.identifier.url	http://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639	-
dc.subject.keyword	Back-translation	-
dc.subject.keyword	data augmentation	-
dc.subject.keyword	machine learning	-
dc.subject.keyword	natural language processing	-
dc.subject.keyword	SMOTE	-
dc.subject.keyword	voice phishing	-
dc.type.other	Article	-
dc.identifier.pissn	21693536	-
dc.description.isoa	true	-
dc.subject.subarea	Computer Science (all)	-
dc.subject.subarea	Materials Science (all)	-
dc.subject.subarea	Engineering (all)	-
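
Illustrative note on the abstract: SMOTE, the baseline the abstract compares against, creates synthetic minority-class samples by interpolating between a minority point and one of its nearest minority neighbours. The sketch below shows only that core interpolation step on toy 2-D vectors. It is not the authors' code; the function name `smote_sample`, the parameters `k`, `n_new`, and `seed`, and the toy data are all assumptions made for illustration. A real pipeline would first vectorize the call transcripts and would more likely rely on an established library implementation.

```python
import random


def smote_sample(minority, k=2, n_new=4, seed=0):
    """Return n_new synthetic points, each interpolated between a random
    minority point and one of its k nearest minority neighbours.

    Hypothetical helper for illustration only, not the paper's implementation.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation weight in [0, 1)
        # The new point lies on the segment from base towards neighbour.
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Toy minority class: the four corners of the unit square (assumed data).
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_sample(minority)
```

Because every synthetic point is a blend of two existing feature vectors, it does not correspond to a real sentence. Back-translation, by contrast, yields new natural-language text, which matches the abstract's claim that the back-translated data preserves contextual and linguistic characteristics.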

Related Researcher

HAMANDAWANA, PRINCE: Department of Software and Computer Engineering

File Download

There are no files associated with this item.
