Gramian Attention Heads are Strong yet Efficient Vision Learners
- URL: http://arxiv.org/abs/2310.16483v1
- Date: Wed, 25 Oct 2023 09:08:58 GMT
- Title: Gramian Attention Heads are Strong yet Efficient Vision Learners
- Authors: Jongbin Ryu, Dongyoon Han, Jongwoo Lim
- Abstract summary: We introduce a novel architecture design that enhances expressiveness by incorporating multiple head classifiers (i.e., classification heads).
Our approach employs attention-based aggregation, utilizing pairwise feature similarity to enhance multiple lightweight heads with minimal resource overhead.
Our models eventually surpass state-of-the-art CNNs and ViTs regarding the accuracy-throughput trade-off on ImageNet-1K.
- Score: 26.79263390835444
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce a novel architecture design that enhances expressiveness by
incorporating multiple head classifiers (i.e., classification heads) instead of
relying on channel expansion or additional building blocks. Our approach
employs attention-based aggregation, utilizing pairwise feature similarity to
enhance multiple lightweight heads with minimal resource overhead. We compute
the Gramian matrices to reinforce class tokens in an attention layer for each
head. This enables the heads to learn more discriminative representations,
enhancing their aggregation capabilities. Furthermore, we propose a learning
algorithm that encourages heads to complement each other by reducing
correlation for aggregation. Our models eventually surpass state-of-the-art
CNNs and ViTs regarding the accuracy-throughput trade-off on ImageNet-1K and
deliver remarkable performance across various downstream tasks, such as COCO
object instance segmentation, ADE20k semantic segmentation, and fine-grained
visual classification datasets. The effectiveness of our framework is
substantiated by practical experimental results and further underpinned by
a generalization error bound. We release the code publicly at:
https://github.com/Lab-LVM/imagenet-models.
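The two ideas named in the abstract, pairwise feature similarity via a Gramian matrix and decorrelated heads that are then aggregated, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the repository linked above for that); the function names, tensor shapes, the normalization by the number of positions, the squared-correlation penalty, and the plain averaging of heads are all assumptions chosen for illustration.

```python
import numpy as np


def gram_matrix(features):
    """Pairwise channel similarity G = F F^T / N.

    features: array of shape (C, N), C channels over N spatial positions.
    Dividing by N is an illustrative normalization choice (assumption).
    """
    _, n = features.shape
    return features @ features.T / n


def head_correlation_penalty(head_logits):
    """Mean squared off-diagonal Pearson correlation between heads.

    head_logits: array of shape (H, K), one logit vector per head.
    A hypothetical stand-in for a decorrelation objective: it is near 0
    when heads are uncorrelated and 1 when they are perfectly correlated.
    """
    corr = np.corrcoef(head_logits)            # (H, H), rows are variables
    h = corr.shape[0]
    off_diag = corr[~np.eye(h, dtype=bool)]    # drop the diagonal of ones
    return float(np.mean(off_diag ** 2))


def aggregate_heads(head_logits):
    """Average the per-head logits into one prediction (assumed rule)."""
    return head_logits.mean(axis=0)


rng = np.random.default_rng(0)
features = rng.standard_normal((8, 49))   # 8 channels, 7x7 positions
G = gram_matrix(features)                 # (8, 8) similarity matrix

logits = rng.standard_normal((3, 10))     # 3 heads, 10 classes
penalty = head_correlation_penalty(logits)
aggregated = aggregate_heads(logits)      # (10,) combined logits
```

The Gramian here is symmetric and positive semi-definite by construction, which is what makes it usable as a similarity signal; how the paper injects it into the class-token attention is not reproduced in this sketch.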
Related papers
- Discriminative Anchor Learning for Efficient Multi-view Clustering [59.11406089896875]
We propose discriminative anchor learning for multi-view clustering (DALMC).
We learn discriminative view-specific feature representations according to the original dataset.
We build anchors from different views based on these representations, which increase the quality of the shared anchor graph.
arXiv Detail & Related papers (2024-09-25T13:11:17Z)
- Convolutional Fine-Grained Classification with Self-Supervised Target Relation Regularization [34.8793946023412]
This paper introduces a novel target coding scheme: dynamic target relation graphs (DTRG).
Online computation of class-level feature centers is designed to generate cross-category distance in the representation space.
The proposed target graphs can alleviate data sparsity and imbalance in representation learning.
arXiv Detail & Related papers (2022-08-03T11:51:53Z)
- CAD: Co-Adapting Discriminative Features for Improved Few-Shot Classification [11.894289991529496]
Few-shot classification is a challenging problem that aims to learn a model that can adapt to unseen classes given a few labeled samples.
Recent approaches pre-train a feature extractor, and then fine-tune for episodic meta-learning.
We propose a strategy to cross-attend and re-weight discriminative features for few-shot classification.
arXiv Detail & Related papers (2022-03-25T06:14:51Z)
- GraphCoCo: Graph Complementary Contrastive Learning [65.89743197355722]
Graph Contrastive Learning (GCL) has shown promising performance in graph representation learning (GRL) without the supervision of manual annotations.
This paper proposes an effective graph complementary contrastive learning approach named GraphCoCo to tackle the above issue.
arXiv Detail & Related papers (2022-03-24T02:58:36Z)
- Clustering by Maximizing Mutual Information Across Views [62.21716612888669]
We propose a novel framework for image clustering that incorporates joint representation learning and clustering.
Our method significantly outperforms state-of-the-art single-stage clustering methods across a variety of image datasets.
arXiv Detail & Related papers (2021-07-24T15:36:49Z)
- Revisiting Contrastive Methods for Unsupervised Learning of Visual Representations [78.12377360145078]
Contrastive self-supervised learning has outperformed supervised pretraining on many downstream tasks like segmentation and object detection.
In this paper, we first study how biases in the dataset affect existing methods.
We show that current contrastive approaches work surprisingly well across: (i) object- versus scene-centric, (ii) uniform versus long-tailed and (iii) general versus domain-specific datasets.
arXiv Detail & Related papers (2021-06-10T17:59:13Z)
- Adversarial Feature Augmentation and Normalization for Visual Recognition [109.6834687220478]
Recent advances in computer vision take advantage of adversarial data augmentation to ameliorate the generalization ability of classification models.
Here, we present an effective and efficient alternative that advocates adversarial augmentation on intermediate feature embeddings.
We validate the proposed approach across diverse visual recognition tasks with representative backbone networks.
arXiv Detail & Related papers (2021-03-22T20:36:34Z)
- Group-Wise Semantic Mining for Weakly Supervised Semantic Segmentation [49.90178055521207]
This work addresses weakly supervised semantic segmentation (WSSS), with the goal of bridging the gap between image-level annotations and pixel-level segmentation.
We formulate WSSS as a novel group-wise learning task that explicitly models semantic dependencies in a group of images to estimate more reliable pseudo ground-truths.
In particular, we devise a graph neural network (GNN) for group-wise semantic mining, wherein input images are represented as graph nodes.
arXiv Detail & Related papers (2020-12-09T12:40:13Z)
- GATCluster: Self-Supervised Gaussian-Attention Network for Image Clustering [9.722607434532883]
We propose a self-supervised clustering network for image clustering (GATCluster).
Rather than extracting intermediate features first and then performing traditional clustering, GATCluster directly outputs semantic cluster labels without further post-processing.
We develop a two-step learning algorithm that is memory-efficient for clustering large-size images.
arXiv Detail & Related papers (2020-02-27T00:57:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.