Ajou University repository

Gramian Attention Heads are Strong yet Efficient Vision Learners
Citations (SCOPUS): 3

Simple Item Record

dc.contributor.author: Ryu, Jongbin
dc.contributor.author: Han, Dongyoon
dc.contributor.author: Lim, Jongwoo
dc.date.issued: 2023-01-01
dc.identifier.issn: 1550-5499
dc.identifier.uri: https://aurora.ajou.ac.kr/handle/2018.oak/36949
dc.identifier.uri: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85188239686&origin=inward
dc.description.abstract: We introduce a novel architecture design that enhances expressiveness by incorporating multiple head classifiers (i.e., classification heads) instead of relying on channel expansion or additional building blocks. Our approach employs attention-based aggregation, utilizing pairwise feature similarity to enhance multiple lightweight heads with minimal resource overhead. We compute the Gramian matrices to reinforce class tokens in an attention layer for each head. This enables the heads to learn more discriminative representations, enhancing their aggregation capabilities. Furthermore, we propose a learning algorithm that encourages heads to complement each other by reducing correlation for aggregation. Our models eventually surpass state-of-the-art CNNs and ViTs regarding the accuracy-throughput trade-off on ImageNet-1K and deliver remarkable performance across various downstream tasks, such as COCO object instance segmentation, ADE20k semantic segmentation, and fine-grained visual classification datasets. The effectiveness of our framework is substantiated by practical experimental results and further underpinned by a generalization error bound. We release the code publicly at: https://github.com/Lab-LVM/imagenet-models
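The abstract names two mechanisms: a Gramian (pairwise token similarity) matrix that reinforces a per-head class token inside an attention layer, and a training term that reduces correlation between heads so they complement each other. Below is a minimal, unofficial PyTorch sketch of that idea; all class and function names, shapes, and the exact attention and loss formulations are assumptions for illustration, not the authors' implementation (which is released at https://github.com/Lab-LVM/imagenet-models).

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class GramianAttentionHead(nn.Module):
        """One lightweight classification head (hypothetical sketch).

        A Gramian (pairwise token similarity) matrix reweights the spatial
        tokens; a learnable class token then attends over the result.
        """
        def __init__(self, dim, num_classes):
            super().__init__()
            self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
            self.proj = nn.Linear(dim, dim)
            self.fc = nn.Linear(dim, num_classes)

        def forward(self, x):
            # x: (B, N, C) spatial tokens from a shared backbone
            gram = torch.bmm(x, x.transpose(1, 2))           # (B, N, N) pairwise similarity
            x = torch.bmm(F.softmax(gram, dim=-1), x)        # reinforce tokens via the Gramian
            cls = self.cls_token.expand(x.size(0), -1, -1)   # (B, 1, C)
            attn = F.softmax(
                torch.bmm(cls, self.proj(x).transpose(1, 2)) / x.size(-1) ** 0.5,
                dim=-1,
            )                                                # class token attends to tokens
            return self.fc(torch.bmm(attn, x).squeeze(1))    # per-head logits (B, num_classes)

    def decorrelation_loss(head_outputs):
        # Hypothetical auxiliary term: penalize pairwise correlation between
        # L2-normalized head outputs so the heads complement each other.
        loss = head_outputs[0].new_zeros(())
        feats = [F.normalize(h, dim=-1) for h in head_outputs]
        for i in range(len(feats)):
            for j in range(i + 1, len(feats)):
                loss = loss + (feats[i] * feats[j]).sum(-1).abs().mean()
        return loss

    # Usage: several heads over one backbone's tokens. The simple average
    # below stands in for the paper's attention-based aggregation.
    heads = nn.ModuleList(GramianAttentionHead(256, 1000) for _ in range(4))
    tokens = torch.randn(2, 49, 256)          # e.g. a flattened 7x7 feature map
    logits = [h(tokens) for h in heads]
    prediction = torch.stack(logits).mean(0)  # aggregate head predictions
    aux = decorrelation_loss(logits)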
dc.language.iso: eng
dc.publisher: Institute of Electrical and Electronics Engineers Inc.
dc.subject.mesh: Architecture designs
dc.subject.mesh: Building blocks
dc.subject.mesh: Channel expansions
dc.subject.mesh: Gramians
dc.subject.mesh: Learn+
dc.subject.mesh: matrix
dc.subject.mesh: Novel architecture
dc.subject.mesh: Performance
dc.subject.mesh: State of the art
dc.subject.mesh: Trade off
dc.title: Gramian Attention Heads are Strong yet Efficient Vision Learners
dc.type: Conference
dc.citation.conferenceDate: 2023.10.2. ~ 2023.10.6.
dc.citation.conferenceName: 2023 IEEE/CVF International Conference on Computer Vision, ICCV 2023
dc.citation.edition: Proceedings - 2023 IEEE/CVF International Conference on Computer Vision, ICCV 2023
dc.citation.startPage: 5818
dc.citation.endPage: 5828
dc.citation.title: Proceedings of the IEEE International Conference on Computer Vision
dc.identifier.bibliographicCitation: Proceedings of the IEEE International Conference on Computer Vision, pp. 5818-5828
dc.identifier.doi: 10.1109/iccv51070.2023.00537
dc.identifier.scopusid: 2-s2.0-85188239686
dc.identifier.url: http://ieeexplore.ieee.org/xpl/conhome.jsp?punumber=1000149
dc.type.other: Conference Paper
dc.description.isoa: true
dc.subject.subarea: Software
dc.subject.subarea: Computer Vision and Pattern Recognition

Related Researcher

Ryu, Jongbin (유종빈)
Department of Software and Computer Engineering

File Download

  • There are no files associated with this item.