Ajou University repository

Enriching Local Patterns with Multi-Token Attention for Broad-Sight Neural Networks
Citations (SCOPUS)
0

Publication Year
2025-01-01
Journal
Proceedings - 2025 IEEE Winter Conference on Applications of Computer Vision, WACV 2025
Publisher
Institute of Electrical and Electronics Engineers Inc.
Citation
Proceedings - 2025 IEEE Winter Conference on Applications of Computer Vision, WACV 2025, pp.8270-8279
Keyword
bias-and-variance-error; multi-token attention pooling; over-concentration
Mesh Keyword
Bias and variance; Bias-and-variance-error; Learn+; Local patterns; Multi tokens; Multi-token attention pooling; Neural-networks; Over-concentration; Variance error; Visual pattern
All Science Classification Codes (ASJC)
Artificial Intelligence; Computer Science Applications; Computer Vision and Pattern Recognition; Human-Computer Interaction; Modeling and Simulation; Radiology, Nuclear Medicine and Imaging
Abstract
In neural networks, recognizing visual patterns is challenging because global average pooling disregards local patterns and relies solely on over-concentrated activations. Global average pooling forces the network to learn objects regardless of their location, so features tend to be activated only in specific regions. To support this claim, we provide a novel analysis, backed by extensive experiments, of the problems that over-concentration causes in networks. We analyze over-concentration through the problems arising from feature variance and from dead neurons that are never activated. Based on our analysis, we introduce a multi-token attention pooling layer to alleviate the over-concentration problem. Our attention pooling layer captures broad-sight local patterns by learning multiple tokens with the proposed distillation algorithm, which resolves the high-bias and high-variance errors of the learned multi-tokens; this is crucial when aggregating local patterns with multiple tokens. Our method applies to various vision tasks and network architectures such as CNNs, ViT, and MLP-Mixer. The proposed method improves baselines with few extra resources, and a network employing our pooling method performs favorably against state-of-the-art networks. We open-source the code at https://github.com/Lab-LVM/imagenet-models.
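Illustrative sketch
The snippet below is a minimal sketch of the general idea in the abstract: pooling local features with several learned query tokens instead of a single global average. It is not the authors' released implementation (see the GitHub link above); the class name, token count, head count, classifier head, and input shape are assumptions made for illustration, and the proposed distillation algorithm for the tokens is omitted.

import torch
import torch.nn as nn


class MultiTokenAttentionPooling(nn.Module):
    """Pools spatial features with several learned query tokens instead of a single global average."""

    def __init__(self, dim: int, num_tokens: int = 4, num_heads: int = 8, num_classes: int = 1000):
        super().__init__()
        # Learnable tokens; each one is meant to attend to a different kind of local pattern.
        self.tokens = nn.Parameter(torch.randn(1, num_tokens, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, N, dim) local features from a CNN, ViT, or MLP-Mixer backbone
        # (flatten an H x W feature map into N = H*W positions first).
        q = self.tokens.expand(x.size(0), -1, -1)   # (B, T, dim) query tokens
        pooled, _ = self.attn(q, x, x)              # each token attends over all spatial positions
        pooled = self.norm(pooled).mean(dim=1)      # merge the T token summaries into one descriptor
        return self.head(pooled)


if __name__ == "__main__":
    feats = torch.randn(2, 196, 768)                # e.g. 14 x 14 patch features with dim 768
    layer = MultiTokenAttentionPooling(dim=768)
    print(layer(feats).shape)                       # torch.Size([2, 1000])

Such a layer would replace the global-average-pooling step before the classifier head; the official repository should be consulted for the actual architecture and training procedure.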
Language
eng
URI
https://aurora.ajou.ac.kr/handle/2018.oak/38562
https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=105003625128&origin=inward
DOI
https://doi.org/10.1109/wacv61041.2025.00802
Journal URL
http://ieeexplore.ieee.org/xpl/mostRecentIssue.jsp?punumber=10943266
Type
Conference Paper
Funding
This paper was supported in part by the ETRI Grant funded by the Korean Government (Fundamental Technology Research for Human-Centric Autonomous Intelligent Systems) under Grant 24ZB1200, the Artificial Intelligence Innovation Hub (RS-2021-II212068), Artificial Intelligence Convergence Innovation Human Resources Development (IITP-2024-RS-2023-00255968), and the NRF Grant (RS-2024-00356486).


Related Researcher

Ryu, Jongbin (유종빈)
Department of Software and Computer Engineering

File Download

  • There are no files associated with this item.