Ajou University repository

PCEE-BERT: Accelerating BERT Inference via Patient and Confident Early Exiting
  • Zhang, Zhen
  • Zhu, Wei
  • Zhang, Jinfan
  • Wang, Peng
  • Jin, Rize
  • Chung, Tae Sun
Citations (SCOPUS): 0


Simple item record (DC field: value)
dc.contributor.author: Zhang, Zhen
dc.contributor.author: Zhu, Wei
dc.contributor.author: Zhang, Jinfan
dc.contributor.author: Wang, Peng
dc.contributor.author: Jin, Rize
dc.contributor.author: Chung, Tae Sun
dc.date.issued: 2022-01-01
dc.identifier.uri: https://aurora.ajou.ac.kr/handle/2018.oak/36858
dc.identifier.uri: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85137325271&origin=inward
dc.description.abstract: BERT and other pre-trained language models (PLMs) are ubiquitous in modern NLP. Even though PLMs are the state-of-the-art (SOTA) models for almost every NLP task (Qiu et al., 2020), the significant latency during inference prohibits wider industrial usage. In this work, we propose Patient and Confident Early Exiting BERT (PCEE-BERT), an off-the-shelf sample-dependent early exiting method that can work with different PLMs and can also work along with popular model compression methods. With a multi-exit BERT as the backbone model, PCEE-BERT makes the early exiting decision if a large enough number (the patience parameter) of consecutive intermediate layers are confident about their predictions. The entropy value measures the confidence level of an intermediate layer's prediction. Experiments on the GLUE benchmark demonstrate that our method outperforms previous SOTA early exiting methods. Ablation studies show that: (a) our method performs consistently well on other PLMs, such as ALBERT and TinyBERT; (b) PCEE-BERT can achieve different speed-up ratios by adjusting the patience parameter and the confidence threshold. The code for PCEE-BERT can be found at https://github.com/michael-wzhu/PCEE-BERT. (A minimal sketch of this exiting rule appears after the metadata record below.)
dc.description.sponsorship: This research was supported by the MSIT (Ministry of Science and ICT), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2021-0-02051) supervised by the IITP (Institute of Information & Communications Technology Planning & Evaluation), and by the BK21 FOUR program of the National Research Foundation of Korea funded by the Ministry of Education (NRF5199991014091).
dc.language.iso: eng
dc.publisher: Association for Computational Linguistics (ACL)
dc.subject.mesh: ART model
dc.subject.mesh: Compression methods
dc.subject.mesh: Confidence levels
dc.subject.mesh: Different speed
dc.subject.mesh: Entropy value
dc.subject.mesh: Intermediate layers
dc.subject.mesh: Language model
dc.subject.mesh: Model compression
dc.subject.mesh: Speed up
dc.subject.mesh: State of the art
dc.title: PCEE-BERT: Accelerating BERT Inference via Patient and Confident Early Exiting
dc.type: Conference
dc.citation.conferenceDate: 2022.7.10. ~ 2022.7.15.
dc.citation.conferenceName: 2022 Findings of the Association for Computational Linguistics: NAACL 2022
dc.citation.edition: Findings of the Association for Computational Linguistics: NAACL 2022 - Findings
dc.citation.endPage: 338
dc.citation.startPage: 327
dc.citation.title: Findings of the Association for Computational Linguistics: NAACL 2022 - Findings
dc.identifier.bibliographicCitation: Findings of the Association for Computational Linguistics: NAACL 2022 - Findings, pp.327-338
dc.identifier.doi: 2-s2.0-85137325271
dc.identifier.scopusid: 2-s2.0-85137325271
dc.identifier.url: https://aclanthology.org/events/naacl-2022/#2022-findings-naacl
dc.type.other: Conference Paper
dc.subject.subarea: Computational Theory and Mathematics
dc.subject.subarea: Computer Science Applications
dc.subject.subarea: Information Systems
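
The abstract above spells out the exiting rule that gives the method its name: an example leaves the network early once enough consecutive intermediate classifiers (the patience parameter) are all confident, where confidence is measured by the entropy of each intermediate prediction. Below is a minimal, illustrative sketch of that rule in PyTorch-style Python; it assumes per-layer classifier logits are already available for one example, and all function and parameter names are hypothetical rather than taken from the authors' released code at https://github.com/michael-wzhu/PCEE-BERT.

```python
# Illustrative sketch only (not the authors' implementation): assumes the
# classifier logits at every intermediate exit are available for one example.
import torch
import torch.nn.functional as F


def prediction_entropy(logits: torch.Tensor) -> torch.Tensor:
    """Shannon entropy of the softmax distribution over classes."""
    probs = F.softmax(logits, dim=-1)
    return -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)


def patient_confident_exit(exit_logits, patience=2, entropy_threshold=0.3):
    """Return (exit_layer, predicted_label) for a single example.

    Exits as soon as `patience` consecutive intermediate layers are all
    confident (entropy below `entropy_threshold`); otherwise falls through
    to the final layer.
    """
    consecutive_confident = 0
    for layer_idx, logits in enumerate(exit_logits):
        if prediction_entropy(logits).item() < entropy_threshold:
            consecutive_confident += 1
        else:
            consecutive_confident = 0
        if consecutive_confident >= patience:
            return layer_idx, logits.argmax(dim=-1)               # early exit
    return len(exit_logits) - 1, exit_logits[-1].argmax(dim=-1)   # no early exit


# Hypothetical usage: 12 exits over a 2-class task with random logits.
logits_per_layer = [torch.randn(2) for _ in range(12)]
layer, label = patient_confident_exit(logits_per_layer)
print(f"exited at layer {layer} with prediction {label.item()}")
```

In a real multi-exit model, the layers after the chosen exit would simply not be computed, which is where the latency savings come from; as the abstract notes, the speed-up ratio can be traded off against accuracy by raising the patience value or tightening the entropy threshold.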


Related Researcher

Chung, Tae-Sun (정태선)
Department of Software and Computer Engineering


File Download

  • There are no files associated with this item.