Citation Export
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Uddin, Md Azher | - |
dc.contributor.author | Joolee, Joolekha Bibi | - |
dc.contributor.author | Sohn, Kyung Ah | - |
dc.date.issued | 2023-07-01 | - |
dc.identifier.issn | 1949-3045 | - |
dc.identifier.uri | https://dspace.ajou.ac.kr/dev/handle/2018.oak/32748 | - |
dc.description.abstract | Depression is a severe mental illness that impairs a person's capacity to function normally in personal and professional life. Assessing depression usually requires a comprehensive examination by an expert professional. Recently, machine learning-based automatic depression assessment has received considerable attention as a route to reliable and efficient diagnosis. Various techniques for automated depression detection have been developed; however, several concerns remain to be investigated. In this work, we propose a novel deep multi-modal framework that effectively exploits facial and verbal cues for automated depression assessment. Specifically, we first partition the audio and video data into fixed-length segments. These segments are then fed into Spatio-Temporal Networks, which capture both spatial and temporal features and assign higher weights to the most informative ones. In addition, a Volume Local Directional Structural Pattern (VLDSP)-based dynamic feature descriptor is introduced to extract facial dynamics by encoding their structural aspects. Afterwards, we employ Temporal Attentive Pooling (TAP) to summarize the segment-level audio and video features. Finally, the multi-modal factorized bilinear pooling (MFB) strategy is applied to fuse the multi-modal features effectively. An extensive experimental study shows that the proposed method outperforms state-of-the-art approaches. | - |
dc.language.iso | eng | - |
dc.publisher | Institute of Electrical and Electronics Engineers Inc. | - |
dc.subject.mesh | Convolutional neural network | - |
dc.subject.mesh | Depression | - |
dc.subject.mesh | Encodings | - |
dc.subject.mesh | Feature extraction | - |
dc.subject.mesh | Multi-modal | - |
dc.subject.mesh | Multi-modal factorized bilinear pooling | - |
dc.subject.mesh | Structural pattern | - |
dc.subject.mesh | Temporal attentive pooling | - |
dc.subject.mesh | Three-dimensional display | - |
dc.subject.mesh | Volume local directional structural pattern | - |
dc.title | Deep Multi-Modal Network Based Automated Depression Severity Estimation | - |
dc.type | Article | - |
dc.citation.endPage | 2167 | - |
dc.citation.startPage | 2153 | - |
dc.citation.title | IEEE Transactions on Affective Computing | - |
dc.citation.volume | 14 | - |
dc.identifier.bibliographicCitation | IEEE Transactions on Affective Computing, Vol.14, pp.2153-2167 | - |
dc.identifier.doi | 10.1109/taffc.2022.3179478 | - |
dc.identifier.scopusid | 2-s2.0-85131766440 | - |
dc.identifier.url | http://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=5165369 | - |
dc.subject.keyword | Depression | - |
dc.subject.keyword | multi-modal factorized bilinear pooling | - |
dc.subject.keyword | spatio-temporal networks | - |
dc.subject.keyword | temporal attentive pooling | - |
dc.subject.keyword | volume local directional structural pattern | - |
dc.description.isoa | false | - |
dc.subject.subarea | Software | - |
dc.subject.subarea | Human-Computer Interaction | - |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
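The fusion step named in the abstract, multi-modal factorized bilinear pooling (MFB), can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the projection matrices `U` and `V` are random here purely for illustration (in the actual model they are learned parameters), and the dimensions `d` and `k` are assumed values.

```python
import numpy as np

def mfb_fuse(x, y, d=8, k=4, rng=None):
    """Sketch of multi-modal factorized bilinear pooling (MFB).

    x, y : 1-D feature vectors from the two modalities (e.g. audio, video).
    d    : fused output dimensionality (assumed value).
    k    : number of factors in the low-rank expansion (assumed value).
    U, V : random projections here for illustration; learned in practice.
    """
    rng = rng or np.random.default_rng(0)
    U = rng.standard_normal((x.size, d * k))
    V = rng.standard_normal((y.size, d * k))
    # Low-rank bilinear interaction: element-wise product in the expanded space.
    joint = (x @ U) * (y @ V)
    # Sum-pool every k consecutive units down to d fused outputs.
    z = joint.reshape(d, k).sum(axis=1)
    # Signed square-root ("power") normalization, then L2 normalization.
    z = np.sign(z) * np.sqrt(np.abs(z))
    return z / (np.linalg.norm(z) + 1e-12)

# Toy modality features of different sizes fuse into one d-dimensional vector.
fused = mfb_fuse(np.ones(16), np.ones(12))
```

The low-rank factorization keeps the bilinear interaction expressive while avoiding the quadratic parameter cost of a full outer-product fusion, which is the usual motivation for MFB over plain bilinear pooling.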