Citation Export
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Uddin, Md Azher | - |
dc.contributor.author | Joolee, Joolekha Bibi | - |
dc.contributor.author | Sohn, Kyung Ah | - |
dc.date.issued | 2023-07-01 | - |
dc.identifier.issn | 1949-3045 | - |
dc.identifier.uri | https://dspace.ajou.ac.kr/dev/handle/2018.oak/32748 | - |
dc.description.abstract | Depression is a severe mental illness that impairs a person's capacity to function normally in personal and professional life. Assessing depression usually requires a comprehensive examination by an expert professional. Recently, machine learning-based automatic depression assessment has received considerable attention as a route to reliable and efficient diagnosis. Various techniques for automated depression detection have been developed; however, several concerns remain to be investigated. In this work, we propose a novel deep multi-modal framework that effectively exploits facial and verbal cues for automated depression assessment. Specifically, we first partition the audio and video data into fixed-length segments. These segments are then fed into Spatio-Temporal Networks, which capture both spatial and temporal features and assign higher weights to the most informative ones. In addition, a Volume Local Directional Structural Pattern (VLDSP)-based dynamic feature descriptor is introduced to extract facial dynamics by encoding their structural aspects. Afterwards, we employ Temporal Attentive Pooling (TAP) to summarize the segment-level audio and video features. Finally, the multi-modal factorized bilinear pooling (MFB) strategy is applied to fuse the multi-modal features effectively. An extensive experimental study shows that the proposed method outperforms state-of-the-art approaches. | - |
dc.language.iso | eng | - |
dc.publisher | Institute of Electrical and Electronics Engineers Inc. | - |
dc.subject.mesh | Convolutional neural network | - |
dc.subject.mesh | Depression | - |
dc.subject.mesh | Encodings | - |
dc.subject.mesh | Feature extraction | - |
dc.subject.mesh | Multi-modal | - |
dc.subject.mesh | Multi-modal factorized bilinear pooling | - |
dc.subject.mesh | Structural pattern | - |
dc.subject.mesh | Temporal attentive pooling | - |
dc.subject.mesh | Three-dimensional display | - |
dc.subject.mesh | Volume local directional structural pattern | - |
dc.title | Deep Multi-Modal Network Based Automated Depression Severity Estimation | - |
dc.type | Article | - |
dc.citation.endPage | 2167 | - |
dc.citation.startPage | 2153 | - |
dc.citation.title | IEEE Transactions on Affective Computing | - |
dc.citation.volume | 14 | - |
dc.identifier.bibliographicCitation | IEEE Transactions on Affective Computing, Vol.14, pp.2153-2167 | - |
dc.identifier.doi | 10.1109/taffc.2022.3179478 | - |
dc.identifier.scopusid | 2-s2.0-85131766440 | - |
dc.identifier.url | http://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=5165369 | - |
dc.subject.keyword | Depression | - |
dc.subject.keyword | multi-modal factorized bilinear pooling | - |
dc.subject.keyword | spatio-temporal networks | - |
dc.subject.keyword | temporal attentive pooling | - |
dc.subject.keyword | volume local directional structural pattern | - |
dc.description.isoa | false | - |
dc.subject.subarea | Software | - |
dc.subject.subarea | Human-Computer Interaction | - |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
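The fusion step named in the abstract, multi-modal factorized bilinear pooling (MFB), can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the projection matrices `U` and `V` are random here purely for illustration (in the actual model they are learned parameters), and the dimensions `d` and `k` are assumed values.

```python
import numpy as np

def mfb_fuse(x, y, d=8, k=4, rng=None):
    """Sketch of multi-modal factorized bilinear pooling (MFB).

    x, y : 1-D feature vectors from the two modalities (e.g. audio, video).
    d    : fused output dimensionality (assumed value).
    k    : number of factors in the low-rank expansion (assumed value).
    U, V : random projections here for illustration; learned in practice.
    """
    rng = rng or np.random.default_rng(0)
    U = rng.standard_normal((x.size, d * k))
    V = rng.standard_normal((y.size, d * k))
    # Low-rank bilinear interaction: element-wise product in the expanded space.
    joint = (x @ U) * (y @ V)
    # Sum-pool every k consecutive units down to d fused outputs.
    z = joint.reshape(d, k).sum(axis=1)
    # Signed square-root ("power") normalization, then L2 normalization.
    z = np.sign(z) * np.sqrt(np.abs(z))
    return z / (np.linalg.norm(z) + 1e-12)

# Toy modality features of different sizes fuse into one d-dimensional vector.
fused = mfb_fuse(np.ones(16), np.ones(12))
```

The low-rank factorization keeps the bilinear interaction expressive while avoiding the quadratic parameter cost of a full outer-product fusion, which is the usual motivation for MFB over plain bilinear pooling.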