Ajou University repository

Improving Unsupervised Out-of-domain Detection through Pseudo Labeling and Learning

DC Field | Value | Language
dc.contributor.author | Lee, Byounghan | -
dc.contributor.author | Kim, Jaesik | -
dc.contributor.author | Park, Junekyu | -
dc.contributor.author | Sohn, Kyung Ah | -
dc.date.issued | 2023-01-01 | -
dc.identifier.uri | https://aurora.ajou.ac.kr/handle/2018.oak/37007 | -
dc.identifier.uri | https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85159852257&origin=inward | -
dc.description.abstract | Unsupervised out-of-domain (OOD) detection is a task aimed at discriminating whether given samples are from the in-domain or not, without the categorical labels of in-domain instances. Unlike supervised OOD, as there are no labels for training a classifier, previous works on unsupervised OOD detection adopted the one-class classification (OCC) approach, assuming that the training samples come from a single domain. However, in-domain instances in many real world applications can have a heterogeneous distribution (i.e., across multiple domains or multiple classes). In this case, OCC methods have difficulty in reflecting the categorical information of the domain properly. To tackle this issue, we propose a two-stage framework that leverages the latent categorical information to improve representation learning for textual OOD detection. In the first stage, we train a transformer-based sentence encoder for pseudo labeling by contrastive loss and cluster loss. The second stage is pseudo label learning in which the model is re-trained with pseudo-labels obtained in the first stage. The empirical results on the three datasets show that our two-stage framework significantly outperforms baseline models in more challenging scenarios. | -
dc.description.sponsorship | This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. NRF-2022R1A2C1007434), and also by the Institute of Information and Communications Technology Planning and Evaluation (IITP) grant funded by the Korea Government (MSIT) (Artificial Intelligence Innovation Hub) under Grant 2021-0-02068. | -
dc.language.iso | eng | -
dc.publisher | Association for Computational Linguistics (ACL) | -
dc.subject.mesh | Classification approach | -
dc.subject.mesh | Domain detections | -
dc.subject.mesh | Heterogeneous distributions | -
dc.subject.mesh | Labelings | -
dc.subject.mesh | Multiple class | -
dc.subject.mesh | Multiple domains | -
dc.subject.mesh | One-class Classification | -
dc.subject.mesh | Real-world | -
dc.subject.mesh | Single domains | -
dc.subject.mesh | Training sample | -
dc.title | Improving Unsupervised Out-of-domain Detection through Pseudo Labeling and Learning | -
dc.type | Conference | -
dc.citation.conferenceDate | 2023.5.2. ~ 2023.5.6. | -
dc.citation.conferenceName | 17th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2023 - Findings of EACL 2023 | -
dc.citation.edition | EACL 2023 - 17th Conference of the European Chapter of the Association for Computational Linguistics, Findings of EACL 2023 | -
dc.citation.endPage | 1011 | -
dc.citation.startPage | 1001 | -
dc.citation.title | EACL 2023 - 17th Conference of the European Chapter of the Association for Computational Linguistics, Findings of EACL 2023 | -
dc.identifier.bibliographicCitation | EACL 2023 - 17th Conference of the European Chapter of the Association for Computational Linguistics, Findings of EACL 2023, pp.1001-1011 | -
dc.identifier.doi | 2-s2.0-85159852257 | -
dc.identifier.scopusid | 2-s2.0-85159852257 | -
dc.identifier.url | https://aclanthology.org/events/eacl-2023/#2023findings-eacl | -
dc.type.other | Conference Paper | -
dc.subject.subarea | Computational Theory and Mathematics | -
dc.subject.subarea | Software | -
dc.subject.subarea | Linguistics and Language | -
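
The two-stage idea described in the abstract (pseudo labeling, then learning from the pseudo labels) can be illustrated with a minimal sketch. This is not the authors' implementation. It makes several simplifying assumptions: toy 2-D points stand in for sentence embeddings, plain k-means stands in for the transformer encoder trained with contrastive and cluster losses, and a nearest-class-mean model stands in for the re-trained second-stage model. All function names (`kmeans`, `assign`, `fit_class_means`, `ood_score`) are hypothetical.

```python
import math

def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points]

def kmeans(points, k, iters=20):
    """Stage 1 stand-in (assumption): cluster embeddings to get pseudo labels."""
    # Deterministic farthest-point initialisation, starting from the first point.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    for _ in range(iters):
        labels = assign(points, centroids)
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(axis) / len(members)
                                     for axis in zip(*members))
    return assign(points, centroids)

def fit_class_means(points, labels):
    """Stage 2 stand-in (assumption): learn one prototype per pseudo label."""
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    return {lab: tuple(sum(axis) / len(ps) for axis in zip(*ps))
            for lab, ps in groups.items()}

def ood_score(x, class_means):
    """Distance to the closest pseudo-class prototype; larger means more OOD."""
    return min(math.dist(x, m) for m in class_means.values())

# Toy in-domain data with two latent classes and no labels.
embeddings = [(0.0, 0.1), (0.2, -0.1), (-0.1, 0.0),
              (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
pseudo_labels = kmeans(embeddings, k=2)
class_means = fit_class_means(embeddings, pseudo_labels)
near = ood_score((0.1, 0.0), class_means)   # close to one latent class
far = ood_score((2.5, 2.5), class_means)    # between the two classes
```

In this toy setting a one-class model with a single centre near (2.5, 2.5) would treat the point between the two groups as typical, whereas the per-pseudo-class prototypes give it a higher score than a point lying inside either group.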

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher

Sohn, Kyung-Ah (손경아)
Department of Software and Computer Engineering