Ajou University repository

PCEE-BERT: Accelerating BERT Inference via Patient and Confident Early Exiting
  • Zhang, Zhen
  • Zhu, Wei
  • Zhang, Jinfan
  • Wang, Peng
  • Jin, Rize
  • Chung, Tae Sun
Citations (SCOPUS)
0

Publication Year
2022-01-01
Journal
Findings of the Association for Computational Linguistics: NAACL 2022
Publisher
Association for Computational Linguistics (ACL)
Citation
Findings of the Association for Computational Linguistics: NAACL 2022, pp. 327-338
Mesh Keyword
ART model; Compression methods; Confidence levels; Different speed; Entropy value; Intermediate layers; Language model; Model compression; Speed up; State of the art
All Science Classification Codes (ASJC)
Computational Theory and Mathematics; Computer Science Applications; Information Systems
Abstract
BERT and other pre-trained language models (PLMs) are ubiquitous in modern NLP. Even though PLMs are the state-of-the-art (SOTA) models for almost every NLP task (Qiu et al., 2020), their significant inference latency prohibits wider industrial usage. In this work, we propose Patient and Confident Early Exiting BERT (PCEE-BERT), an off-the-shelf sample-dependent early exiting method that can work with different PLMs and alongside popular model compression methods. With a multi-exit BERT as the backbone model, PCEE-BERT makes the early-exit decision if a sufficient number (the patience parameter) of consecutive intermediate layers are confident about their predictions, where the entropy value measures the confidence level of an intermediate layer's prediction. Experiments on the GLUE benchmark demonstrate that our method outperforms previous SOTA early exiting methods. Ablation studies show that: (a) our method performs consistently well on other PLMs, such as ALBERT and TinyBERT; (b) PCEE-BERT can achieve different speed-up ratios by adjusting the patience parameter and the confidence threshold. The code for PCEE-BERT can be found at https://github.com/michael-wzhu/PCEE-BERT.
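
To make the exit rule described in the abstract concrete, below is a minimal sketch of a patient-and-confident early-exit decision: an example exits at the first layer where a run of `patience` consecutive internal classifiers all have prediction entropy below a threshold. This is an illustrative reconstruction, not the authors' released implementation; the function names, the single-example interface, and the example threshold value are assumptions.

```python
import torch
import torch.nn.functional as F

def prediction_entropy(logits: torch.Tensor) -> torch.Tensor:
    """Shannon entropy of the softmax distribution over class logits."""
    probs = F.softmax(logits, dim=-1)
    return -(probs * torch.log(probs + 1e-12)).sum(dim=-1)

def pcee_exit_layer(per_layer_logits, patience: int = 2, threshold: float = 0.3) -> int:
    """Index of the first layer at which `patience` consecutive internal
    classifiers are all confident (entropy below `threshold`); falls back
    to the final layer if no sufficiently long confident run occurs."""
    confident_run = 0
    for i, logits in enumerate(per_layer_logits):
        if prediction_entropy(logits).item() < threshold:
            confident_run += 1
            if confident_run >= patience:
                return i  # exit here and use this layer's prediction
        else:
            confident_run = 0  # confidence must be consecutive
    return len(per_layer_logits) - 1

# Illustrative use: logits for one example from each of 12 internal classifiers.
layer_logits = [torch.randn(2) for _ in range(12)]
exit_at = pcee_exit_layer(layer_logits, patience=2, threshold=0.3)
print(f"exit at layer {exit_at}")
```

Requiring the low-entropy condition to hold over consecutive layers (patience) guards against a single overconfident intermediate classifier triggering a premature exit, and raising either the patience or the threshold trades speed-up for accuracy, as the abstract notes.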
Language
eng
URI
https://aurora.ajou.ac.kr/handle/2018.oak/36858
https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85137325271&origin=inward
Scopus EID
2-s2.0-85137325271
Journal URL
https://aclanthology.org/events/naacl-2022/#2022-findings-naacl
Type
Conference
Funding
This research was supported by the MSIT (Ministry of Science and ICT), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2021-0-02051) supervised by the IITP (Institute for Information & Communications Technology Planning & Evaluation), and by the BK21 FOUR program of the National Research Foundation of Korea funded by the Ministry of Education (NRF5199991014091).


Related Researcher

Chung, Tae-Sun (정태선)
Department of Software and Computer Engineering

File Download

  • There are no files associated with this item.