Ajou University repository

PCEE-BERT: Accelerating BERT Inference via Patient and Confident Early Exiting
  • Zhang, Zhen
  • Zhu, Wei
  • Zhang, Jinfan
  • Wang, Peng
  • Jin, Rize
  • Chung, Tae Sun
Citations (SCOPUS): 0


Simple item record (DC field: value)
dc.contributor.author: Zhang, Zhen
dc.contributor.author: Zhu, Wei
dc.contributor.author: Zhang, Jinfan
dc.contributor.author: Wang, Peng
dc.contributor.author: Jin, Rize
dc.contributor.author: Chung, Tae Sun
dc.date.issued: 2022-01-01
dc.identifier.uri: https://aurora.ajou.ac.kr/handle/2018.oak/36858
dc.identifier.uri: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85137325271&origin=inward
dc.description.abstract: BERT and other pre-trained language models (PLMs) are ubiquitous in modern NLP. Even though PLMs are the state-of-the-art (SOTA) models for almost every NLP task (Qiu et al., 2020), the significant latency during inference prohibits wider industrial usage. In this work, we propose Patient and Confident Early Exiting BERT (PCEE-BERT), an off-the-shelf sample-dependent early exiting method that can work with different PLMs and can also work along with popular model compression methods. With a multi-exit BERT as the backbone model, PCEE-BERT makes the early exiting decision if a large enough number (the patience parameter) of consecutive intermediate layers are confident about their predictions. The entropy value measures the confidence level of an intermediate layer's prediction. Experiments on the GLUE benchmark demonstrate that our method outperforms previous SOTA early exiting methods. Ablation studies show that: (a) our method performs consistently well on other PLMs, such as ALBERT and TinyBERT; (b) PCEE-BERT can achieve different speed-up ratios by adjusting the patience parameter and the confidence threshold. The code for PCEE-BERT can be found at https://github.com/michael-wzhu/PCEE-BERT. (A minimal sketch of this exiting rule appears after the metadata record below.)
dc.description.sponsorship: This research was supported by the MSIT (Ministry of Science and ICT), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2021-0-02051) supervised by the IITP (Institute of Information & Communications Technology Planning & Evaluation), and by the BK21 FOUR program of the National Research Foundation of Korea funded by the Ministry of Education (NRF5199991014091).
dc.language.iso: eng
dc.publisher: Association for Computational Linguistics (ACL)
dc.subject.mesh: ART model
dc.subject.mesh: Compression methods
dc.subject.mesh: Confidence levels
dc.subject.mesh: Different speed
dc.subject.mesh: Entropy value
dc.subject.mesh: Intermediate layers
dc.subject.mesh: Language model
dc.subject.mesh: Model compression
dc.subject.mesh: Speed up
dc.subject.mesh: State of the art
dc.title: PCEE-BERT: Accelerating BERT Inference via Patient and Confident Early Exiting
dc.type: Conference
dc.citation.conferenceDate: 2022.7.10. ~ 2022.7.15.
dc.citation.conferenceName: 2022 Findings of the Association for Computational Linguistics: NAACL 2022
dc.citation.edition: Findings of the Association for Computational Linguistics: NAACL 2022 - Findings
dc.citation.endPage: 338
dc.citation.startPage: 327
dc.citation.title: Findings of the Association for Computational Linguistics: NAACL 2022 - Findings
dc.identifier.bibliographicCitation: Findings of the Association for Computational Linguistics: NAACL 2022 - Findings, pp.327-338
dc.identifier.doi: 2-s2.0-85137325271
dc.identifier.scopusid: 2-s2.0-85137325271
dc.identifier.url: https://aclanthology.org/events/naacl-2022/#2022-findings-naacl
dc.type.other: Conference Paper
dc.subject.subarea: Computational Theory and Mathematics
dc.subject.subarea: Computer Science Applications
dc.subject.subarea: Information Systems
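
The abstract above spells out the exiting rule that gives the method its name: an example leaves the network early once enough consecutive intermediate classifiers (the patience parameter) are all confident, where confidence is measured by the entropy of each intermediate prediction. Below is a minimal, illustrative sketch of that rule in PyTorch-style Python; it assumes per-layer classifier logits are already available for one example, and all function and parameter names are hypothetical rather than taken from the authors' released code at https://github.com/michael-wzhu/PCEE-BERT.

```python
# Illustrative sketch only (not the authors' implementation): assumes the
# classifier logits at every intermediate exit are available for one example.
import torch
import torch.nn.functional as F


def prediction_entropy(logits: torch.Tensor) -> torch.Tensor:
    """Shannon entropy of the softmax distribution over classes."""
    probs = F.softmax(logits, dim=-1)
    return -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)


def patient_confident_exit(exit_logits, patience=2, entropy_threshold=0.3):
    """Return (exit_layer, predicted_label) for a single example.

    Exits as soon as `patience` consecutive intermediate layers are all
    confident (entropy below `entropy_threshold`); otherwise falls through
    to the final layer.
    """
    consecutive_confident = 0
    for layer_idx, logits in enumerate(exit_logits):
        if prediction_entropy(logits).item() < entropy_threshold:
            consecutive_confident += 1
        else:
            consecutive_confident = 0
        if consecutive_confident >= patience:
            return layer_idx, logits.argmax(dim=-1)               # early exit
    return len(exit_logits) - 1, exit_logits[-1].argmax(dim=-1)   # no early exit


# Hypothetical usage: 12 exits over a 2-class task with random logits.
logits_per_layer = [torch.randn(2) for _ in range(12)]
layer, label = patient_confident_exit(logits_per_layer)
print(f"exited at layer {layer} with prediction {label.item()}")
```

In a real multi-exit model, the layers after the chosen exit would simply not be computed, which is where the latency savings come from; as the abstract notes, the speed-up ratio can be traded off against accuracy by raising the patience value or tightening the entropy threshold.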


Related Researcher

Chung, Tae-Sun (정태선)
Department of Software and Computer Engineering


File Download

  • There are no files associated with this item.