Ajou University repository

Purification and Multi Temporal Semantic Network for Continuous Sign Language Recognition
Citations (SCOPUS)
0


Publication Year
2022-01-01
Journal
2022 5th International Conference on Pattern Recognition and Artificial Intelligence, PRAI 2022
Publisher
Institute of Electrical and Electronics Engineers Inc.
Citation
2022 5th International Conference on Pattern Recognition and Artificial Intelligence, PRAI 2022, pp.437-442
Keyword
Connectionist temporal classification; Continuous sign language recognition; Knowledge distillation; Multi temporal granularity; Purification mechanism
Mesh Keyword
Connectionist temporal classification; Continuous sign language recognition; Knowledge distillation; Multi temporal granularity; Multi-temporal; Purification mechanisms; Sign Language recognition; Temporal classification; Temporal granularity; Temporal semantics
All Science Classification Codes (ASJC)
Artificial Intelligence; Computer Science Applications; Computer Vision and Pattern Recognition
Abstract
Continuous Sign Language Recognition (CSLR) is a typical weakly supervised task, which aims to convert a sign video into a gloss sequence. However, sign videos lack clear segmentation points between gestures, so it is not easy to obtain temporal information for each gloss. Existing CSLR models usually extract gesture-wise features with a receptive field of a single temporal granularity, which causes inconsistent segmentation and local ambiguity issues and becomes a bottleneck for the entire model. This paper proposes a Purification and Multi Temporal Semantic Network (PMTSNet) to handle the local consistency and context dependency problems. Specifically, the proposed model first extracts frame-wise features of sign language videos using 2D convolutions and then captures gesture-wise features from sign video segments of different temporal granularities. The obtained gesture-wise features are then fed into a BiLSTM to get gloss-wise features by modeling the context dependencies. Then, an attention-based purification module selectively combines fine-grained gesture and coarse-grained gloss information to obtain features with richer semantics. Finally, the model is trained using a multi knowledge distillation connectionist temporal classification loss, which further enhances the performance. Experimental results on the RWTH-PHOENIX-Weather-2014 dataset show that the proposed model outperforms state-of-the-art methods.
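The multi-temporal step described in the abstract, where gesture-wise features are captured from video segments of different temporal granularities, can be loosely sketched as pooling frame-wise features over several window sizes and concatenating the results. This is a minimal numpy illustration under our own assumptions (the function name, window sizes, and average-pooling choice are not from the paper, which uses learned convolutions):

```python
import numpy as np

def multi_temporal_features(frames: np.ndarray, windows=(4, 8, 16)) -> np.ndarray:
    """Illustrative multi-granularity temporal pooling.

    frames : (T, C) array of frame-wise features.
    Returns a (T, C * len(windows)) array: for each window size, a
    sliding average over that many frames, concatenated channel-wise.
    """
    T, C = frames.shape
    pooled = []
    for w in windows:
        # Pad so each granularity keeps the original temporal length T.
        pad = w // 2
        padded = np.pad(frames, ((pad, w - 1 - pad), (0, 0)), mode="edge")
        kernel = np.ones(w) / w
        # Average-pool each channel with a sliding window of size w.
        out = np.stack(
            [np.convolve(padded[:, c], kernel, mode="valid") for c in range(C)],
            axis=1,
        )
        pooled.append(out)
    return np.concatenate(pooled, axis=1)
```

In the paper's actual pipeline these multi-granularity features would feed a BiLSTM and the attention-based purification module; the sketch only shows why mixing receptive fields can ease the single-granularity segmentation bottleneck the abstract describes.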
Language
eng
URI
https://aurora.ajou.ac.kr/handle/2018.oak/36832
https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85141222471&origin=inward
DOI
https://doi.org/10.1109/prai55851.2022.9904048
Journal URL
http://ieeexplore.ieee.org/xpl/mostRecentIssue.jsp?punumber=9904023
Type
Conference
Funding
This work was supported by the Tianjin Science and Technology Program under Grants 18JCYBJC44000 and 19PTZWHZ00020.


Related Researcher

Chung, Tae-Sun (정태선)
Department of Software and Computer Engineering

File Download

  • There are no files associated with this item.