Ajou University repository

ETFT: Equiangular Tight Frame Transformer for Imbalanced Semantic Segmentation

Publication Year
2024-11-01
Publisher
Multidisciplinary Digital Publishing Institute (MDPI)
Citation
Sensors, Vol. 24
Keyword
class imbalance; neural collapse; semantic segmentation; transformer
Mesh Keyword
Attention mechanisms; Class imbalance; Discriminability; Equiangular tight frames; Frame structure; Input image; Neural collapse; Property; Semantic segmentation; Transformer
All Science Classification Codes (ASJC)
Analytical Chemistry; Information Systems; Atomic and Molecular Physics, and Optics; Biochemistry; Instrumentation; Electrical and Electronic Engineering
Abstract
Semantic segmentation often suffers from class imbalance, where the label ratio for each class in the dataset is not uniform. Recent studies have addressed the issue of class imbalance in semantic segmentation by leveraging the neural collapse phenomenon in conjunction with an Equiangular Tight Frame (ETF). While the use of ETF aids in enhancing the discriminability of minor classes, class correlation is another crucial factor that must be taken into account. However, managing the balance between class correlation and discrimination through neural collapse remains challenging, as these properties inherently conflict with one another. Moreover, this control is established during the training stage, resulting in a fixed classifier. There is no guarantee that this classifier will consistently perform well with different input images. To address this problem, we propose an Equiangular Tight Frame Transformer (ETFT), a transformer-based model that jointly processes the features and classifier using ETF structure, and dynamically generates the classifier as a function of the input for imbalanced semantic segmentation. Specifically, the classifier initialized with the ETF structure is jointly processed with the input patch tokens during the attention process. As a result, the transformed patch tokens, aided by the ETF structure, achieve discriminability between classes while preserving contextual correlation. The classifier, initially structured as an ETF, is adjusted to incorporate the correlation information, benefiting from the attention mechanism. Furthermore, the learned classifier is combined with the fixed ETF classifier, leveraging the advantages of both. Extensive experiments demonstrate that the proposed method outperforms state-of-the-art methods for imbalanced semantic segmentation on both the ADE20K and Cityscapes datasets.
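To make the ETF structure named in the abstract concrete, here is a minimal sketch of the standard simplex ETF construction used in the neural collapse literature. It is not the authors' implementation. The function name `simplex_etf`, the example sizes, and the random orthonormal basis are assumptions chosen for illustration only.

```python
import numpy as np


def simplex_etf(num_classes: int, dim: int, seed: int = 0) -> np.ndarray:
    """Return a (dim, num_classes) matrix whose columns form a simplex ETF.

    Hypothetical helper for illustration; requires dim >= num_classes.
    """
    if num_classes < 2:
        raise ValueError("need at least two classes")
    if dim < num_classes:
        raise ValueError("dim must be >= num_classes for this construction")

    rng = np.random.default_rng(seed)
    # Reduced QR gives a (dim, num_classes) matrix with orthonormal columns.
    basis, _ = np.linalg.qr(rng.standard_normal((dim, num_classes)))

    k = num_classes
    # Centering matrix I - (1/k) * 1 1^T removes the common mean direction.
    centering = np.eye(k) - np.ones((k, k)) / k
    # Scaling by sqrt(k / (k - 1)) makes every column unit-norm.
    return np.sqrt(k / (k - 1)) * basis @ centering


# Example with assumed sizes: 5 classes embedded in 16 dimensions.
etf = simplex_etf(num_classes=5, dim=16)
gram = etf.T @ etf  # pairwise cosine similarities between class vectors
```

In the resulting Gram matrix every diagonal entry is 1 and every off-diagonal entry is -1/(K-1), so all class vectors are equally and maximally separated. That equal separation is the discriminability property the abstract attributes to the ETF, and it is also why a fixed ETF alone carries no class correlation information.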
ISSN
1424-8220
Language
eng
URI
https://dspace.ajou.ac.kr/dev/handle/2018.oak/34593
DOI
https://doi.org/10.3390/s24216913
Type
Article
Funding
This work was supported in part by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education under Grant 2022R1F1A1065702; and in part by the Institute of Information & communications Technology Planning & Evaluation (IITP) under the Artificial Intelligence Convergence Innovation Human Resources Development (IITP-2024-RS-2023-00255968) grant funded by the Korea government (MSIT).
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher

Heo, Yong Seok (허용석)
Department of Electrical and Computer Engineering