Ajou University repository

Attentional bias for hands: Cascade dual-decoder transformer for sign language production
Citations (SCOPUS)
2

Publication Year
2024-08-01
Publisher
John Wiley and Sons Inc
Citation
IET Computer Vision, Vol.18, pp.696-708
Keyword
computer vision; natural language processing; pose estimation; sign language production
Mesh Keyword
Information channels; Language processing; Language production; Natural language processing; Natural languages; Pose estimation; Sign language; Sign language production; Spoken languages; Time step
All Science Classification Codes (ASJC)
Software; Computer Vision and Pattern Recognition
Abstract
Sign Language Production (SLP) refers to the task of translating textual forms of spoken language into corresponding sign language expressions. Sign languages convey meaning by means of multiple asynchronous articulators, including manual and non-manual information channels. Recent deep learning-based SLP models directly generate the full-articulatory sign sequence from the text input in an end-to-end manner. However, these models largely downweight the importance of subtle differences in manual articulation owing to the effect of regression to the mean. To address these neglected aspects, an efficient cascade dual-decoder Transformer (CasDual-Transformer) for SLP is proposed that successively learns two mappings, SLP_hand: Text → Hand pose and SLP_sign: Text → Sign pose, utilising an attention-based alignment module that fuses the hand and sign features from previous time steps to predict a more expressive sign pose at the current time step. In addition, to provide more efficacious guidance, a novel spatio-temporal loss that penalises shape dissimilarity and temporal distortions of the produced sequences is introduced. Experimental studies are performed on two benchmark sign language datasets from distinct cultures to verify the performance of the proposed model. Both quantitative and qualitative results show that the authors' model demonstrates competitive performance compared to state-of-the-art models and, in some cases, achieves considerable improvements over them.
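The cascade described in the abstract can be sketched in PyTorch. This is an illustrative sketch only, not the authors' implementation: the class name `CasDualSketch`, all layer sizes, and the feature dimensions (42 values for hand keypoints, 150 for the full sign pose) are assumptions, and details such as causal masking and teacher forcing are omitted for brevity. It shows the shape of the idea: a text encoder feeds a hand-pose decoder (SLP_hand), whose features are fused via attention into a second decoder that produces the full sign pose (SLP_sign).

```python
import torch
import torch.nn as nn


class CasDualSketch(nn.Module):
    """Hypothetical sketch of a cascade dual-decoder Transformer for SLP.

    Stage 1 (SLP_hand): decode hand-pose features from the encoded text.
    Stage 2 (SLP_sign): decode the full sign pose, after an attention-based
    alignment step lets sign queries attend to the hand features.
    """

    def __init__(self, vocab=100, d=64, hand_dim=42, pose_dim=150, heads=4):
        super().__init__()
        self.embed = nn.Embedding(vocab, d)
        enc_layer = nn.TransformerEncoderLayer(d, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
        dec_layer = nn.TransformerDecoderLayer(d, heads, batch_first=True)
        # nn.TransformerDecoder deep-copies the layer, so the two decoders
        # below do not share parameters.
        self.hand_decoder = nn.TransformerDecoder(dec_layer, num_layers=2)
        self.sign_decoder = nn.TransformerDecoder(dec_layer, num_layers=2)
        # Alignment module: sign features (queries) attend to hand features.
        self.fuse = nn.MultiheadAttention(d, heads, batch_first=True)
        self.in_hand = nn.Linear(hand_dim, d)
        self.in_pose = nn.Linear(pose_dim, d)
        self.out_hand = nn.Linear(d, hand_dim)
        self.out_pose = nn.Linear(d, pose_dim)

    def forward(self, text_ids, prev_hand, prev_pose):
        memory = self.encoder(self.embed(text_ids))            # text features
        # Stage 1: SLP_hand -- hand pose conditioned on the text.
        h = self.hand_decoder(self.in_hand(prev_hand), memory)
        # Stage 2: fuse hand features into the sign stream, then decode.
        s = self.in_pose(prev_pose)
        fused, _ = self.fuse(s, h, h)                          # alignment
        s = self.sign_decoder(s + fused, memory)               # SLP_sign
        return self.out_hand(h), self.out_pose(s)
```

In training one would feed ground-truth previous frames (teacher forcing) with causal masks, and supervise both outputs; the abstract's spatio-temporal loss would then replace a plain regression loss to penalise shape dissimilarity and temporal distortion.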
Language
eng
URI
https://dspace.ajou.ac.kr/dev/handle/2018.oak/34018
DOI
https://doi.org/10.1049/cvi2.12273
Type
Article
Funding
This work was supported by the Institute of Information & Communications Technology Planning & Evaluation (IITP) under the Artificial Intelligence Convergence Innovation Human Resources Development (IITP-2023-RS-2023-00255968) grant and the ITRC (Information Technology Research Center) support program (IITP-2021-0-02051) funded by the Korea government (MSIT).

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher

Chung, Tae-Sun (정태선)
Department of Software and Computer Engineering

File Download

  • There are no files associated with this item.