Ajou University repository

Multi-Channel Spatio-Temporal Transformer for Sign Language Production

Publication Year
2024
Journal
2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings
Publisher
European Language Resources Association (ELRA)
Citation
2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings, pp. 11699-11712
Keyword
Sign Language Production; Spatio-Temporal Fusion; Transformer
Mesh Keyword
Language production; Machine-learning; Multi channel; Production models; Sign language; Sign language production; Spatio-temporal; Spatio-temporal fusions; Spoken languages; Transformer
All Science Classification Codes (ASJC)
Theoretical Computer Science; Computational Theory and Mathematics; Computer Science Applications
Abstract
The task of Sign Language Production (SLP) in machine learning involves converting text-based spoken language into corresponding sign language expressions. Sign language conveys meaning through the continuous movement of multiple articulators, including manual and non-manual channels. However, most current Transformer-based SLP models convert these multi-channel sign poses into a unified feature representation, ignoring the inherent structural correlations between channels. This paper introduces a novel approach called MCST-Transformer for skeletal sign language production. It employs multi-channel spatial attention to capture correlations across various channels within each frame, and temporal attention to learn sequential dependencies for each channel over time. Additionally, the paper explores and experiments with multiple fusion techniques to combine the spatial and temporal representations into naturalistic sign sequences. To validate the effectiveness of the proposed MCST-Transformer model and its constituent components, extensive experiments were conducted on two benchmark sign language datasets from diverse cultures. The results demonstrate that this new approach outperforms state-of-the-art models on both datasets.
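The abstract describes factorized attention over multi-channel skeletal poses: spatial attention across channels within each frame, temporal attention along each channel over time, and a fusion step combining the two. The sketch below illustrates that factorization under several assumptions that are not taken from the paper: a pose is stored as `pose[t][c]` (a feature vector for channel `c` at frame `t`), attention is single-head with identity query/key/value projections instead of learned weights, temporal attention is causally masked, and the two fusion modes (`"sum"`, `"concat"`) are hypothetical examples rather than the specific techniques the authors compare. All function names are illustrative; this is not the MCST-Transformer implementation.

```python
import math
from typing import List

Vec = List[float]
Pose = List[List[Vec]]  # assumed layout: pose[t][c] is channel c's feature vector at frame t


def softmax(scores: Vec) -> Vec:
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens: List[Vec], causal: bool = False) -> List[Vec]:
    # Single-head scaled dot-product attention with identity Q/K/V projections
    # (a simplification: a real Transformer layer learns these projections).
    d = len(tokens[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for i, q in enumerate(tokens):
        # With causal=True, token i attends only to tokens 0..i.
        keys = tokens[: i + 1] if causal else tokens
        weights = softmax([scale * sum(a * b for a, b in zip(q, k)) for k in keys])
        out.append([sum(w * k[j] for w, k in zip(weights, keys)) for j in range(d)])
    return out


def spatial_attention(pose: Pose) -> Pose:
    # Within each frame, the channels (e.g. hands, face, body) attend to one another.
    return [self_attention(frame) for frame in pose]


def temporal_attention(pose: Pose, causal: bool = True) -> Pose:
    # For each channel separately, each frame attends to that channel's earlier frames.
    num_frames, num_channels = len(pose), len(pose[0])
    out: Pose = [[[] for _ in range(num_channels)] for _ in range(num_frames)]
    for c in range(num_channels):
        attended = self_attention([pose[t][c] for t in range(num_frames)], causal)
        for t in range(num_frames):
            out[t][c] = attended[t]
    return out


def fuse(spatial: Pose, temporal: Pose, mode: str = "sum") -> Pose:
    # Hypothetical fusion options; the paper's fusion variants may differ.
    if mode not in ("sum", "concat"):
        raise ValueError(f"unknown fusion mode: {mode}")

    def combine(s: Vec, t: Vec) -> Vec:
        # "sum" keeps the feature size; "concat" doubles it.
        return [a + b for a, b in zip(s, t)] if mode == "sum" else s + t

    return [[combine(s, t) for s, t in zip(sf, tf)] for sf, tf in zip(spatial, temporal)]


if __name__ == "__main__":
    # Toy input: 4 frames, 3 channels, 2-dimensional features per channel.
    pose = [[[0.1 * t + c, 1.0 - 0.1 * c] for c in range(3)] for t in range(4)]
    fused = fuse(spatial_attention(pose), temporal_attention(pose), mode="concat")
    print(len(fused), len(fused[0]), len(fused[0][0]))  # 4 3 4
```

Factorizing attention this way keeps per-channel structure visible to the model instead of flattening all channels of a frame into one vector, which is the limitation the abstract attributes to earlier Transformer-based SLP models.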
Language
English
URI
https://aurora.ajou.ac.kr/handle/2018.oak/37104
https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85195911057&origin=inward
Type
Conference
Funding
This work was supported by the Institute of Information & communications Technology Planning & Evaluation (IITP) under the Artificial Intelligence Convergence Innovation Human Resources Development (IITP-2024-RS-2023-00255968) grant, the ITRC (Information Technology Research Center) support program (IITP-2021-0-02051) funded by the Korea government (MSIT), and the Foreign Intelligence support program funded by Shijiazhuang Science and Technology Bureau (Project No. 20240024).


Related Researcher

Chung, Tae-Sun (정태선)
Department of Software and Computer Engineering