Ajou University repository

Channel Propagation Networks for Refreshable Vision Transformer
Citations (SCOPUS)
0

Publication Year
2025
Journal
Proceedings - 2025 IEEE Winter Conference on Applications of Computer Vision, WACV 2025
Publisher
Institute of Electrical and Electronics Engineers Inc.
Citation
Proceedings - 2025 IEEE Winter Conference on Applications of Computer Vision, WACV 2025, pp.1353-1362
Mesh Keyword
Channel dimension; Channel propagation; Classification accuracy; Multiple layers; New channels; Performance; Propagation method; Signal information; Transformer modeling; Visual recognition
All Science Classification Codes (ASJC)
Artificial Intelligence; Computer Science Applications; Computer Vision and Pattern Recognition; Human-Computer Interaction; Modeling and Simulation; Radiology, Nuclear Medicine and Imaging
Abstract
In this paper, we introduce the Channel Propagation method, which systematically increases the channels of the Vision Transformer. Skip connections are widely acknowledged as a propagation approach that stabilizes performance in Vision Transformers. However, skip connections can give rise to over-smoothing, in which similar features are represented across multiple layers. To address this issue, our Channel Propagation approach retains the current signal while propagating location-specific signals in a newly introduced channel dimension, thereby preserving the identity representation while adding patch-wise, location-specific supervision. Incorporating this approach into Vision Transformers mitigates over-smoothing and improves performance on visual recognition tasks. Our experiments confirm that the proposed method is effective across various visual recognition tasks; in particular, it considerably increases classification accuracy for both plain and hierarchical Vision Transformer architectures on the ImageNet dataset.
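The abstract does not spell out the mechanism, but the core idea, keeping the identity channels intact while propagating a patch-wise, location-specific signal in newly added channels instead of summing it back through a skip connection, can be sketched roughly in PyTorch. The snippet below is only an illustrative reading of the abstract, not the paper's implementation; the class name ChannelPropagationBlock, the extra_dim parameter, and the to_extra projection are all hypothetical.

```python
import torch
import torch.nn as nn

class ChannelPropagationBlock(nn.Module):
    """Illustrative sketch: rather than the usual additive skip connection
    x + f(x), keep the identity signal untouched and append the block's
    patch-wise output as new channels. All names and shapes here are
    assumptions for illustration, not the paper's formulation."""

    def __init__(self, dim, extra_dim, num_heads=4):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # hypothetical projection producing the channels to propagate
        self.to_extra = nn.Linear(dim, extra_dim)

    def forward(self, x):
        # x: (batch, num_patches, dim)
        h = self.norm(x)
        h, _ = self.attn(h, h, h)
        extra = self.to_extra(h)  # patch-wise, location-specific signal
        # identity channels pass through unchanged; new channels are
        # concatenated, so later layers see both the original input and
        # a fresh signal instead of a smoothed sum
        return torch.cat([x, extra], dim=-1)  # (batch, num_patches, dim + extra_dim)

if __name__ == "__main__":
    block = ChannelPropagationBlock(dim=64, extra_dim=16)
    tokens = torch.randn(2, 196, 64)  # e.g. 14x14 image patches
    print(block(tokens).shape)  # torch.Size([2, 196, 80])
```

Note that concatenation grows the channel dimension, so stacking such blocks would require widening each subsequent block's input; how the paper manages channel growth across layers is not described in the abstract.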
Language
eng
URI
https://aurora.ajou.ac.kr/handle/2018.oak/38563
https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=105003636993&origin=inward
DOI
https://doi.org/10.1109/wacv61041.2025.00139
Journal URL
http://ieeexplore.ieee.org/xpl/mostRecentIssue.jsp?punumber=10943266
Type
Conference Paper
Funding
This paper was supported in part by the Electronics and Telecommunications Research Institute (ETRI) grant funded by the Korean Government (Fundamental Technology Research for Human-Centric Autonomous Intelligent Systems) under Grant 24ZB1200; in part by the Artificial Intelligence Convergence Innovation Human Resources Development program under Grant IITP-2024-RS-2023-00255968; in part by the Artificial Intelligence Innovation Hub under Grant RS-2021-II212068; and in part by the NRF grant funded by the Korea Government (MSIT) under Grant RS-2024-00356486.


Related Researcher

Ryu, Jongbin (유종빈)
Department of Software and Computer Engineering

File Download

  • There are no files associated with this item.