Facial expressions are the most common medium for expressing human emotions. Because of its wide range of real-world applications, facial expression understanding has received extensive attention from researchers. A central challenge in facial expression recognition is extracting and modeling the temporal dynamics of facial emotions from videos. In addition, the rapid growth of video data from various multimedia sources raises a serious scalability concern. To address both issues, this paper introduces a novel approach built on top of Spark for facial expression understanding from videos. First, we propose a new dynamic feature descriptor, the local directional structural pattern from three orthogonal planes (LDSP-TOP), which analyzes the structural aspects of the local dynamic texture. Second, we design a 1-D convolutional neural network (CNN) to capture additional discriminative features. Third, we employ a long short-term memory (LSTM) autoencoder to learn the spatiotemporal features. Finally, we carry out an extensive experimental investigation to demonstrate the performance and scalability of the proposed framework.
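To make the "three orthogonal planes" idea concrete, the following minimal sketch shows how a video volume can be sliced into its XY (appearance), XT (horizontal motion), and YT (vertical motion) planes through a given voxel. It illustrates only the plane extraction that TOP-style descriptors build on; it does not implement the LDSP coding itself, which is not specified here. The function name `orthogonal_planes` and the grayscale `(T, H, W)` array layout are assumptions made for illustration.

```python
import numpy as np

def orthogonal_planes(volume, t, y, x):
    """Return the XY, XT and YT planes of a (T, H, W) volume through voxel (t, y, x).

    Hypothetical helper: the (T, H, W) grayscale layout is an assumption,
    not something fixed by the paper.
    """
    volume = np.asarray(volume)
    if volume.ndim != 3:
        raise ValueError("expected a grayscale volume of shape (T, H, W)")
    xy = volume[t, :, :]  # spatial appearance at frame t, shape (H, W)
    xt = volume[:, y, :]  # row y traced over time, shape (T, W)
    yt = volume[:, :, x]  # column x traced over time, shape (T, H)
    return xy, xt, yt

# Toy volume: 4 frames of 3x5 pixels, each voxel holding a unique value.
video = np.arange(4 * 3 * 5).reshape(4, 3, 5)
xy, xt, yt = orthogonal_planes(video, t=1, y=2, x=3)
print(xy.shape, xt.shape, yt.shape)  # (3, 5) (4, 5) (4, 3)
```

A TOP-style descriptor would typically compute a local pattern histogram on each of the three planes and concatenate them, so that both appearance and motion contribute to the final feature vector.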
This research was supported by the National Research Foundation of Korea grant funded by the Korea government (MSIT) (NRF-2019R1A2C1006608), and by the ITRC (Information Technology Research Center) support program (IITP-2020-2018-0-01431) supervised by the IITP (Institute for Information and Communications Technology Planning and Evaluation). The publication cost was supported by the BK21 Plus program through the NRF, funded by the Ministry of Education of Korea.