Citation Export
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Lee, Byounghan | - |
dc.contributor.author | Kim, Jaesik | - |
dc.contributor.author | Park, Junekyu | - |
dc.contributor.author | Sohn, Kyung Ah | - |
dc.date.issued | 2023-01-01 | - |
dc.identifier.uri | https://aurora.ajou.ac.kr/handle/2018.oak/37007 | - |
dc.identifier.uri | https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85159852257&origin=inward | - |
dc.description.abstract | Unsupervised out-of-domain (OOD) detection is the task of discriminating whether given samples are in-domain or not, without categorical labels for the in-domain instances. Since no labels are available to train a classifier, previous work on unsupervised OOD detection has adopted the one-class classification (OCC) approach, which assumes that the training samples come from a single domain. However, in-domain instances in many real-world applications can have a heterogeneous distribution (i.e., spanning multiple domains or multiple classes), in which case OCC methods have difficulty reflecting the categorical information of the domain properly. To tackle this issue, we propose a two-stage framework that leverages latent categorical information to improve representation learning for textual OOD detection. In the first stage, we train a transformer-based sentence encoder for pseudo labeling with a contrastive loss and a cluster loss. The second stage is pseudo-label learning, in which the model is re-trained with the pseudo-labels obtained in the first stage. Empirical results on three datasets show that our two-stage framework significantly outperforms baseline models in more challenging scenarios. | - |
dc.description.sponsorship | This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. NRF-2022R1A2C1007434), and also by the Institute of Information and Communications Technology Planning and Evaluation (IITP) grant funded by the Korea Government (MSIT) (Artificial Intelligence Innovation Hub) under Grant 2021-0-02068. | - |
dc.language.iso | eng | - |
dc.publisher | Association for Computational Linguistics (ACL) | - |
dc.subject.mesh | Classification approach | - |
dc.subject.mesh | Domain detections | - |
dc.subject.mesh | Heterogeneous distributions | - |
dc.subject.mesh | Labelings | - |
dc.subject.mesh | Multiple class | - |
dc.subject.mesh | Multiple domains | - |
dc.subject.mesh | One-class Classification | - |
dc.subject.mesh | Real-world | - |
dc.subject.mesh | Single domains | - |
dc.subject.mesh | Training sample | - |
dc.title | Improving Unsupervised Out-of-domain Detection through Pseudo Labeling and Learning | - |
dc.type | Conference | - |
dc.citation.conferenceDate | 2023.5.2. ~ 2023.5.6. | - |
dc.citation.conferenceName | 17th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2023 - Findings of EACL 2023 | - |
dc.citation.edition | EACL 2023 - 17th Conference of the European Chapter of the Association for Computational Linguistics, Findings of EACL 2023 | - |
dc.citation.endPage | 1011 | - |
dc.citation.startPage | 1001 | - |
dc.citation.title | EACL 2023 - 17th Conference of the European Chapter of the Association for Computational Linguistics, Findings of EACL 2023 | - |
dc.identifier.bibliographicCitation | EACL 2023 - 17th Conference of the European Chapter of the Association for Computational Linguistics, Findings of EACL 2023, pp.1001-1011 | - |
dc.identifier.scopusid | 2-s2.0-85159852257 | - |
dc.identifier.url | https://aclanthology.org/events/eacl-2023/#2023findings-eacl | - |
dc.type.other | Conference Paper | - |
dc.subject.subarea | Computational Theory and Mathematics | - |
dc.subject.subarea | Software | - |
dc.subject.subarea | Linguistics and Language | - |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
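The two-stage pipeline summarized in the abstract can be sketched in miniature. This is purely illustrative and not the paper's implementation: the `encode` function is a toy stand-in for the transformer sentence encoder (which the paper trains with contrastive and cluster losses), the k-means step stands in for stage-1 pseudo labeling, and the OOD score is a simple nearest-centroid distance rather than the paper's re-trained model. All function names and data here are hypothetical.

```python
import math
import random

def encode(text):
    # Toy stand-in for the transformer-based sentence encoder: crude
    # bag-of-character features, L2-normalized. Purely illustrative.
    vec = [0.0] * 8
    for ch in text.lower():
        vec[ord(ch) % 8] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def dist2(a, b):
    # Squared Euclidean distance between two equal-length vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20, seed=0):
    # Minimal k-means: stands in for stage-1 clustering that produces
    # pseudo-labels from the latent categorical structure of the data.
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist2(p, centroids[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = [sum(xs) / len(members) for xs in zip(*members)]
    return centroids, labels

# Stage 1: embed in-domain texts and cluster them into pseudo-classes.
train_texts = ["book a flight to paris", "flight times to london",
               "play some jazz music", "play my rock playlist"]
embeddings = [encode(t) for t in train_texts]
centroids, pseudo_labels = kmeans(embeddings, k=2)

# Stage 2 (stand-in): the paper re-trains the encoder on the pseudo-labels;
# here we simply score OOD-ness as distance to the nearest pseudo-class
# centroid, where a larger score suggests an out-of-domain sample.
def ood_score(text):
    return min(dist2(encode(text), c) for c in centroids)
```

The key idea the sketch preserves is the ordering: pseudo-labels are derived first from the unlabeled in-domain data, and only then does a supervised-style step consume them.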