Ajou University repository

Cross-Modal Dynamic Transfer Learning for Multimodal Emotion Recognition
Citations (SCOPUS): 5


Publication Year
2024-01-01
Publisher
Institute of Electrical and Electronics Engineers Inc.
Citation
IEEE Access, Vol.12, pp.14324-14333
Keyword
Affective computing; cross-modal knowledge transfer; model confidence; multimodal emotion recognition
Mesh Keyword
Affective Computing; Computational modelling; Cross-modal; Cross-modal knowledge transfer; Emotion recognition; Features extraction; Knowledge transfer; Model confidence; Multimodal emotion recognition; Transfer learning
All Science Classification Codes (ASJC)
Computer Science (all); Materials Science (all); Engineering (all)
Abstract
Multimodal Emotion Recognition is an important research area for developing human-centric applications, especially in the context of video platforms. Most existing models have attempted to develop sophisticated fusion techniques to integrate heterogeneous features from different modalities. However, these fusion methods can degrade performance, since not all modalities help resolve the semantic alignment needed for emotion prediction. We observed that, for an existing fusion model, performance improves on 8.0% of misclassified instances when one of the input modalities is masked. Based on this observation, we propose a representation learning method called Cross-modal DynAmic Transfer learning (CDaT), which dynamically filters out the low-confidence modality and complements it with the high-confidence modality using uni-modal masking and cross-modal representation transfer learning. We train an auxiliary network that learns model confidence scores to determine which modality is low-confidence and how much transfer should occur from the other modalities. Furthermore, CDaT can be used with any fusion model in a model-agnostic way because it transfers low-level uni-modal information via a probabilistic knowledge transfer loss. Experiments with four different state-of-the-art fusion models on the CMU-MOSEI and IEMOCAP emotion recognition datasets demonstrate the effect of CDaT.
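
The abstract outlines the core mechanism of CDaT: score each uni-modal representation with an auxiliary confidence network, pick the most confident modality, and transfer its representation structure to the less confident ones through a probabilistic knowledge transfer loss. The PyTorch sketch below is only a rough illustration of that idea; the module names, the confidence-gap weighting, and the similarity-based transfer loss are assumptions for illustration, not the authors' released implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ConfidenceHead(nn.Module):
    # Auxiliary network predicting a confidence score for one modality's features
    # (assumed architecture; the paper only states that such a network is trained).
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim // 2), nn.ReLU(), nn.Linear(dim // 2, 1))

    def forward(self, h):                                  # h: (batch, dim)
        return torch.sigmoid(self.net(h)).squeeze(-1)      # (batch,) confidence in [0, 1]

def probabilistic_transfer_loss(h_low, h_high, temperature=1.0):
    # Match the softened pairwise-similarity distribution of the low-confidence
    # features to that of the high-confidence features (a KL-based stand-in for
    # the probabilistic knowledge transfer loss mentioned in the abstract).
    p_high = F.softmax(h_high @ h_high.t() / temperature, dim=-1).detach()
    log_p_low = F.log_softmax(h_low @ h_low.t() / temperature, dim=-1)
    return F.kl_div(log_p_low, p_high, reduction="batchmean")

def cdat_auxiliary_loss(features, conf_heads):
    # features: dict modality name -> (batch, dim) uni-modal representations.
    # Pick the most confident modality and transfer its representation structure
    # to the less confident ones, weighted by the confidence gap (assumed heuristic).
    conf = {m: conf_heads[m](h).mean() for m, h in features.items()}
    strongest = max(conf, key=conf.get)
    loss = torch.zeros((), device=next(iter(features.values())).device)
    for m, h in features.items():
        if m == strongest:
            continue
        gap = (conf[strongest] - conf[m]).clamp(min=0.0)
        loss = loss + gap * probabilistic_transfer_loss(h, features[strongest])
    return loss

In use, such an auxiliary loss would simply be added to the fusion model's task loss, which is consistent with the model-agnostic claim in the abstract.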
ISSN
2169-3536
Language
eng
URI
https://dspace.ajou.ac.kr/dev/handle/2018.oak/33911
DOI
https://doi.org/10.1109/access.2024.3356185
Fulltext

Type
Article
Funding
This work was supported in part by the Institute of Information & Communications Technology Planning & Evaluation (IITP) under the Artificial Intelligence Convergence Innovation Human Resources Development Grant funded by the Korea Government (MSIT) under Grant IITP-2023-No.RS-2023-00255968, and in part by the BK21 FOUR Program of the National Research Foundation of Korea funded by the Ministry of Education under Grant NRF5199991014091.


Related Researcher

Cho, Hyunsouk (조현석)
Department of Software and Computer Engineering

File Download

  • There are no files associated with this item.