Continuous Sign Language Recognition (CSLR) is a typical weakly supervised task that aims to convert a sign video into a gloss sequence. However, sign videos lack clear segmentation points between gestures, so the temporal boundaries of each gloss are difficult to obtain. Existing CSLR models usually extract gesture-wise features with a receptive field of a single temporal granularity, which causes inconsistent segmentation and local ambiguity and becomes a bottleneck for the entire model. This paper proposes a Purification and Multi Temporal Semantic Network (PMTSNet) to handle the local consistency and context dependency problems. Specifically, the proposed model first extracts frame-wise features of sign language videos using 2D convolutions and then captures gesture-wise features from video segments of different temporal granularities. The gesture-wise features are fed into a BiLSTM, which models context dependencies to produce gloss-wise features. An attention-based purification module then selectively combines fine-grained gesture information and coarse-grained gloss information to obtain features with richer semantics. Finally, the model is trained with a multi-knowledge-distillation connectionist temporal classification (CTC) loss, which further improves performance. Experimental results on the RWTH-PHOENIX-Weather-2014 dataset show that the proposed model outperforms state-of-the-art methods.
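
To make the described pipeline concrete, the following is a minimal PyTorch sketch of the data flow (frame-wise 2D features, multi-granularity temporal convolutions, BiLSTM context modeling, attention-based fusion, and a plain CTC loss). All module names, kernel sizes, and dimensions are illustrative assumptions, not the authors' released implementation, and the multi-knowledge-distillation terms of the actual training loss are omitted.

```python
import torch
import torch.nn as nn


class MultiTemporalPurificationSketch(nn.Module):
    """Frame features -> multi-granularity gesture features -> BiLSTM gloss
    features -> attention-based fusion ("purification") -> gloss logits."""

    def __init__(self, feat_dim=512, hidden_dim=512, num_glosses=1296):
        super().__init__()
        # Frame-wise spatial features; a real model would use a 2D CNN backbone
        # (e.g., a ResNet applied per frame). A single conv stands in here.
        self.frame_net = nn.Sequential(
            nn.Conv2d(3, feat_dim, kernel_size=7, stride=4, padding=3),
            nn.AdaptiveAvgPool2d(1),
        )
        # Gesture-wise features from several temporal granularities:
        # parallel 1D convolutions with different kernel sizes.
        self.temporal_branches = nn.ModuleList([
            nn.Conv1d(feat_dim, feat_dim, kernel_size=k, padding=k // 2)
            for k in (3, 5, 7)
        ])
        self.merge = nn.Conv1d(feat_dim * 3, feat_dim, kernel_size=1)
        # Gloss-wise features from context modeling.
        self.bilstm = nn.LSTM(feat_dim, hidden_dim // 2, bidirectional=True,
                              batch_first=True)
        # Attention-based purification: a gate deciding how much fine-grained
        # gesture vs. coarse-grained gloss information to keep per time step.
        self.gate = nn.Sequential(nn.Linear(feat_dim + hidden_dim, 1), nn.Sigmoid())
        self.classifier = nn.Linear(hidden_dim, num_glosses)

    def forward(self, video):                                  # (B, T, 3, H, W)
        b, t = video.shape[:2]
        frames = self.frame_net(video.flatten(0, 1))           # (B*T, C, 1, 1)
        frames = frames.flatten(1).view(b, t, -1)              # (B, T, C)
        x = frames.transpose(1, 2)                             # (B, C, T)
        gesture = torch.cat([branch(x) for branch in self.temporal_branches], dim=1)
        gesture = self.merge(gesture).transpose(1, 2)          # (B, T, C)
        gloss, _ = self.bilstm(gesture)                        # (B, T, H)
        alpha = self.gate(torch.cat([gesture, gloss], dim=-1)) # (B, T, 1)
        fused = alpha * gesture + (1 - alpha) * gloss          # purified features
        return self.classifier(fused)                          # (B, T, num_glosses)


# Usage with a plain CTC loss; the paper's multi-knowledge-distillation CTC
# loss would add auxiliary distillation terms on top of this objective.
model = MultiTemporalPurificationSketch()
logits = model(torch.randn(2, 16, 3, 112, 112))                # (B, T, num_glosses)
log_probs = logits.log_softmax(-1).transpose(0, 1)             # (T, B, num_glosses)
targets = torch.randint(1, 1296, (2, 5))                       # dummy gloss labels
loss = nn.CTCLoss(blank=0)(log_probs, targets,
                           input_lengths=torch.full((2,), 16),
                           target_lengths=torch.full((2,), 5))
```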