Deep learning models with encoder-decoder architectures have become popular in automatic speech recognition (ASR) owing to their success in sequential prediction tasks. Recently, the Conformer model has substantially improved recognition accuracy. However, like Transformer models, its training relies on large amounts of data. This paper explores an efficient few-shot learning strategy. Specifically, a SpecAugment-based approach is proposed to augment the speech dataset, and a novel loss function, the anti-focal loss, is introduced to encourage fast convergence on small-scale, imbalanced data. Extensive experiments on the AISHELL-1 dataset show that our model outperforms state-of-the-art approaches given limited support data, in terms of both convergence speed and generalization ability.
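
The abstract does not specify the augmentation policy; as a minimal sketch, standard SpecAugment (Park et al., 2019) applies frequency and time masking to a log-mel spectrogram. All hyperparameter names and default values below (`num_freq_masks`, `freq_mask_param`, etc.) are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def spec_augment(spec, num_freq_masks=2, freq_mask_param=10,
                 num_time_masks=2, time_mask_param=20, rng=None):
    """Frequency and time masking on a log-mel spectrogram.

    spec: array of shape (num_mel_bins, num_frames).
    Hyperparameters are illustrative, not the paper's settings.
    """
    rng = rng or np.random.default_rng()
    out = spec.copy()
    n_mels, n_frames = out.shape

    # Frequency masking: zero out f consecutive mel bins, f ~ U[0, F].
    for _ in range(num_freq_masks):
        f = rng.integers(0, freq_mask_param + 1)
        f0 = rng.integers(0, max(1, n_mels - f))
        out[f0:f0 + f, :] = 0.0

    # Time masking: zero out t consecutive frames, t ~ U[0, T].
    for _ in range(num_time_masks):
        t = rng.integers(0, time_mask_param + 1)
        t0 = rng.integers(0, max(1, n_frames - t))
        out[:, t0:t0 + t] = 0.0

    return out

# Example: augment an 80-bin, 300-frame spectrogram.
augmented = spec_augment(np.random.randn(80, 300))
```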
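
The abstract names but does not define the anti-focal loss. The sketch below assumes a form that inverts the focal-loss modulating factor from (1 - p_t)^gamma to (1 + p_t)^gamma, so that confident predictions are emphasized rather than suppressed; the paper's exact definition may differ, and `gamma` and the mean reduction are illustrative choices.

```python
import torch
import torch.nn.functional as F

def anti_focal_loss(logits, targets, gamma=1.0):
    """Assumed anti-focal loss: -(1 + p_t)^gamma * log(p_t).

    Whereas focal loss down-weights easy examples via (1 - p_t)^gamma,
    this inverted modulation up-weights confident predictions, which
    may speed convergence on small, imbalanced data.

    logits: (batch, num_classes); targets: (batch,) class indices.
    """
    log_probs = F.log_softmax(logits, dim=-1)
    # log p_t for the target class of each example.
    log_pt = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    pt = log_pt.exp()
    loss = -((1.0 + pt) ** gamma) * log_pt
    return loss.mean()

# Example: a batch of 4 examples over 10 classes.
loss = anti_focal_loss(torch.randn(4, 10), torch.tensor([1, 0, 3, 7]))
```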