Citation Export
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Zhang, Zhen | - |
dc.contributor.author | Zhu, Wei | - |
dc.contributor.author | Zhang, Jinfan | - |
dc.contributor.author | Wang, Peng | - |
dc.contributor.author | Jin, Rize | - |
dc.contributor.author | Chung, Tae Sun | - |
dc.date.issued | 2022-01-01 | - |
dc.identifier.uri | https://aurora.ajou.ac.kr/handle/2018.oak/36858 | - |
dc.identifier.uri | https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85137325271&origin=inward | - |
dc.description.abstract | BERT and other pre-trained language models (PLMs) are ubiquitous in modern NLP. Even though PLMs are the state-of-the-art (SOTA) models for almost every NLP task (Qiu et al., 2020), the significant latency during inference prohibits wider industrial usage. In this work, we propose Patient and Confident Early Exiting BERT (PCEE-BERT), an off-the-shelf sample-dependent early exiting method that can work with different PLMs and can also work along with popular model compression methods. With a multi-exit BERT as the backbone model, PCEE-BERT makes the early exiting decision if a sufficient number (the patience parameter) of consecutive intermediate layers are confident about their predictions. The entropy value measures the confidence level of an intermediate layer's prediction. Experiments on the GLUE benchmark demonstrate that our method outperforms previous SOTA early exiting methods. Ablation studies show that: (a) our method performs consistently well on other PLMs, such as ALBERT and TinyBERT; (b) PCEE-BERT can achieve different speed-up ratios by adjusting the patience parameter and the confidence threshold. The code for PCEE-BERT can be found at https://github.com/michael-wzhu/PCEE-BERT. | - |
dc.description.sponsorship | This research was supported by the MSIT (Ministry of Science and ICT), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2021-0-02051) supervised by the IITP (Institute for Information & Communications Technology Planning & Evaluation) and the BK21 FOUR program of the National Research Foundation of Korea funded by the Ministry of Education (NRF5199991014091). | - |
dc.language.iso | eng | - |
dc.publisher | Association for Computational Linguistics (ACL) | - |
dc.subject.mesh | ART model | - |
dc.subject.mesh | Compression methods | - |
dc.subject.mesh | Confidence levels | - |
dc.subject.mesh | Different speed | - |
dc.subject.mesh | Entropy value | - |
dc.subject.mesh | Intermediate layers | - |
dc.subject.mesh | Language model | - |
dc.subject.mesh | Model compression | - |
dc.subject.mesh | Speed up | - |
dc.subject.mesh | State of the art | - |
dc.title | PCEE-BERT: Accelerating BERT Inference via Patient and Confident Early Exiting | - |
dc.type | Conference | - |
dc.citation.conferenceDate | 2022.7.10. ~ 2022.7.15. | - |
dc.citation.conferenceName | 2022 Findings of the Association for Computational Linguistics: NAACL 2022 | - |
dc.citation.edition | Findings of the Association for Computational Linguistics: NAACL 2022 - Findings | - |
dc.citation.endPage | 338 | - |
dc.citation.startPage | 327 | - |
dc.citation.title | Findings of the Association for Computational Linguistics: NAACL 2022 - Findings | - |
dc.identifier.bibliographicCitation | Findings of the Association for Computational Linguistics: NAACL 2022 - Findings, pp.327-338 | - |
dc.identifier.doi | 2-s2.0-85137325271 | - |
dc.identifier.scopusid | 2-s2.0-85137325271 | - |
dc.identifier.url | https://aclanthology.org/events/naacl-2022/#2022-findings-naacl | - |
dc.type.other | Conference Paper | - |
dc.subject.subarea | Computational Theory and Mathematics | - |
dc.subject.subarea | Computer Science Applications | - |
dc.subject.subarea | Information Systems | - |
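
The abstract above describes an algorithmic exit rule: compute the entropy of each intermediate layer's prediction as its confidence signal, and stop forwarding once a run of `patience` consecutive layers is confident. The PyTorch sketch below illustrates only that control flow under stated assumptions; the toy `TinyMultiExitEncoder` module, layer sizes, entropy threshold, and patience value are illustrative and are not the authors' released implementation (see https://github.com/michael-wzhu/PCEE-BERT for that).

```python
# Minimal sketch of the patient-and-confident early-exit rule described in the
# abstract. Backbone, sizes, threshold, and patience are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyMultiExitEncoder(nn.Module):
    """Toy stand-in for a multi-exit BERT: every layer has its own exit head."""

    def __init__(self, hidden=64, num_layers=6, num_labels=2):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU())
            for _ in range(num_layers)
        )
        self.exits = nn.ModuleList(
            nn.Linear(hidden, num_labels) for _ in range(num_layers)
        )

    def forward_with_early_exit(self, x, entropy_threshold=0.3, patience=2):
        """Exit once `patience` consecutive layers are confident, i.e. the
        entropy of their exit predictions stays below `entropy_threshold`."""
        confident_streak = 0
        logits = None
        for depth, (layer, exit_head) in enumerate(zip(self.layers, self.exits), 1):
            x = layer(x)
            logits = exit_head(x)
            probs = F.softmax(logits, dim=-1)
            # Entropy of the exit distribution: low entropy = confident layer.
            entropy = -(probs * probs.clamp_min(1e-12).log()).sum(-1).mean()
            confident_streak = confident_streak + 1 if entropy < entropy_threshold else 0
            if confident_streak >= patience:
                return logits, depth  # early exit: remaining layers are skipped
        return logits, len(self.layers)  # no early exit: used every layer


if __name__ == "__main__":
    model = TinyMultiExitEncoder().eval()
    with torch.no_grad():
        logits, exit_layer = model.forward_with_early_exit(torch.randn(1, 64))
    print(f"exited at layer {exit_layer}, prediction {logits.argmax(-1).item()}")
```

With randomly initialized weights the chosen exit layer is arbitrary; the point of the sketch is only the mechanism the abstract names: per-layer entropy as the confidence measure and a streak counter implementing patience, with the threshold and patience together controlling the speed-up ratio.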