NOAH: Learning Pairwise Object Category Attentions for Image
Classification
- URL: http://arxiv.org/abs/2402.02377v1
- Date: Sun, 4 Feb 2024 07:19:40 GMT
- Title: NOAH: Learning Pairwise Object Category Attentions for Image
Classification
- Authors: Chao Li, Aojun Zhou, Anbang Yao
- Abstract summary: Non-glObal Attentive Head (NOAH) is a new head design built on a form of dot-product attention called pairwise object category attention (POCA)
As a drop-in design, NOAH can be easily used to replace existing heads of various types of DNNs.
- Score: 26.077836657775403
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A modern deep neural network (DNN) for image classification tasks typically
consists of two parts: a backbone for feature extraction, and a head for
feature encoding and class prediction. We observe that the head structures of
mainstream DNNs adopt a similar feature encoding pipeline, exploiting global
feature dependencies while disregarding local ones. In this paper, we revisit
the feature encoding problem, and propose Non-glObal Attentive Head (NOAH) that
relies on a new form of dot-product attention called pairwise object category
attention (POCA), efficiently exploiting spatially dense category-specific
attentions to augment classification performance. NOAH introduces a neat
combination of feature split, transform and merge operations to learn POCAs at
local to global scales. As a drop-in design, NOAH can be easily used to replace
existing heads of various types of DNNs, improving classification performance
while maintaining similar model efficiency. We validate the effectiveness of
NOAH on ImageNet classification benchmark with 25 DNN architectures spanning
convolutional neural networks, vision transformers and multi-layer perceptrons.
In general, NOAH is able to significantly improve the performance of
lightweight DNNs, e.g., showing 3.14%|5.3%|1.9% top-1 accuracy improvements
for MobileNetV2 (0.5x)|DeiT-Tiny (0.5x)|gMLP-Tiny (0.5x). NOAH also generalizes
well when applied to medium-size and large-size DNNs. We further show that NOAH
retains its efficacy on other popular multi-class and multi-label image
classification benchmarks as well as in different training regimes, e.g.,
showing 3.6%|1.1% mAP improvements for the large ResNet101|ViT-Large models on the MS-COCO
dataset. Project page: https://github.com/OSVAI/NOAH.
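The paper defines POCA precisely and releases reference code at the project page; the PyTorch sketch below is only a rough illustration of the head-level idea: split the backbone feature map into channel groups, transform each group into spatially dense, category-specific attention maps that reweight per-category evidence, and merge the per-group predictions into the final logits. All module and variable names are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class POCAHeadSketch(nn.Module):
    """Illustrative sketch of a NOAH-style head (not the official implementation).

    The backbone feature map is split into channel groups; each group produces
    per-category attention logits and per-category evidence, and their spatially
    weighted combination gives group-wise class scores that are merged by averaging.
    """

    def __init__(self, in_channels: int, num_classes: int, num_groups: int = 4):
        super().__init__()
        assert in_channels % num_groups == 0
        self.num_groups = num_groups
        group_channels = in_channels // num_groups
        # One pair of 1x1 convs per group: one branch for attention logits,
        # one branch for class evidence, both with num_classes output maps.
        self.att_convs = nn.ModuleList(
            [nn.Conv2d(group_channels, num_classes, kernel_size=1) for _ in range(num_groups)]
        )
        self.val_convs = nn.ModuleList(
            [nn.Conv2d(group_channels, num_classes, kernel_size=1) for _ in range(num_groups)]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) backbone feature map.
        groups = torch.chunk(x, self.num_groups, dim=1)           # split
        logits = []
        for g, att_conv, val_conv in zip(groups, self.att_convs, self.val_convs):
            att = att_conv(g).flatten(2).softmax(dim=-1)           # (B, K, H*W) spatial attention
            val = val_conv(g).flatten(2)                           # (B, K, H*W) class evidence
            logits.append((att * val).sum(dim=-1))                 # (B, K)  transform
        return torch.stack(logits, dim=0).mean(dim=0)              # merge

# Example: replace a global-average-pooling + linear head of a CNN.
head = POCAHeadSketch(in_channels=1280, num_classes=1000, num_groups=4)
scores = head(torch.randn(2, 1280, 7, 7))    # (2, 1000)
```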
Related papers
- DCNN: Dual Cross-current Neural Networks Realized Using An Interactive Deep Learning Discriminator for Fine-grained Objects [48.65846477275723]
This study proposes novel dual cross-current neural networks (DCNN) to improve the accuracy of fine-grained image classification.
The main design features of the weakly supervised DCNN backbone are (a) extracting heterogeneous data, (b) keeping the feature map resolution unchanged, (c) expanding the receptive field, and (d) fusing global representations with local features.
arXiv Detail & Related papers (2024-05-07T07:51:28Z)
- Deep Neural Network Models Trained With A Fixed Random Classifier Transfer Better Across Domains [23.10912424714101]
The recently discovered neural collapse (NC) phenomenon states that the last-layer weights of deep neural networks converge to the so-called equiangular tight frame (ETF) simplex at the terminal phase of training.
Inspired by NC properties, we explore in this paper the transferability of DNN models trained with their last-layer weights fixed according to an ETF.
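As a rough illustration of what "last-layer weights fixed according to an ETF" can mean in code, the sketch below builds a simplex equiangular tight frame and registers it as a frozen classifier; the construction follows the standard simplex-ETF definition, while the class and function names are assumptions rather than the paper's recipe.

```python
import torch
import torch.nn as nn

def simplex_etf(num_classes: int, feat_dim: int) -> torch.Tensor:
    """Return a (num_classes, feat_dim) simplex-ETF weight matrix.

    Requires feat_dim >= num_classes; rows are equal-norm vectors with
    identical pairwise angles.
    """
    assert feat_dim >= num_classes
    K = num_classes
    U = torch.linalg.qr(torch.randn(feat_dim, K))[0]          # (d, K) orthonormal columns
    M = U @ (torch.eye(K) - torch.ones(K, K) / K)             # center the frame
    return ((K / (K - 1)) ** 0.5 * M).t()                     # standard scaling, (K, d)

class FixedETFClassifier(nn.Module):
    """Linear classifier whose weights are a frozen simplex ETF (never trained)."""

    def __init__(self, feat_dim: int, num_classes: int):
        super().__init__()
        # register_buffer keeps the weights out of the optimizer entirely.
        self.register_buffer("weight", simplex_etf(num_classes, feat_dim))

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return features @ self.weight.t()                     # (B, K) logits

# Only the backbone producing the features would be trained; the classifier stays fixed.
clf = FixedETFClassifier(feat_dim=512, num_classes=100)
logits = clf(torch.randn(4, 512))
```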
arXiv Detail & Related papers (2024-02-28T15:52:30Z)
- On Feature Learning in Neural Networks with Global Convergence Guarantees [49.870593940818715]
We study the optimization of wide neural networks (NNs) via gradient flow (GF).
We show that when the input dimension is no less than the size of the training set, the training loss converges to zero at a linear rate under GF.
We also show empirically that, unlike in the Neural Tangent Kernel (NTK) regime, our multi-layer model exhibits feature learning and can achieve better generalization performance than its NTK counterpart.
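For context, gradient flow is the continuous-time limit of gradient descent, and a linear rate means exponential decay of the loss over time; written out, with a generic constant $c$ standing in for the paper's problem-dependent rate:

```latex
% Gradient flow (GF) on the training loss L(\theta):
\dot{\theta}(t) = -\nabla_{\theta} L\bigl(\theta(t)\bigr),
\qquad
L\bigl(\theta(t)\bigr) \le e^{-c\,t}\, L\bigl(\theta(0)\bigr)
\quad \text{for some constant } c > 0.
```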
arXiv Detail & Related papers (2022-04-22T15:56:43Z)
- Rethinking Nearest Neighbors for Visual Classification [56.00783095670361]
k-NN is a lazy learning method that aggregates the distances between a test image and its top-k neighbors in the training set.
We adopt k-NN with pre-trained visual representations produced by either supervised or self-supervised methods in two steps.
Via extensive experiments on a wide range of classification tasks, our study reveals the generality and flexibility of k-NN integration.
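A minimal sketch of this two-step recipe, with an ImageNet ResNet-50 standing in for the supervised or self-supervised encoders studied in the paper (the backbone choice, the value of k, and the synthetic data are assumptions for illustration):

```python
import torch
import torchvision
from sklearn.neighbors import KNeighborsClassifier

# Step 1: extract features with a frozen pre-trained backbone
# (classification layer removed; the paper's exact encoders may differ).
backbone = torchvision.models.resnet50(weights="IMAGENET1K_V2")
backbone.fc = torch.nn.Identity()
backbone.eval()

@torch.no_grad()
def extract(images: torch.Tensor) -> torch.Tensor:
    feats = backbone(images)                                   # (N, 2048)
    return torch.nn.functional.normalize(feats, dim=1)

# Step 2: classify a test image by aggregating its top-k training neighbors.
train_images = torch.randn(100, 3, 224, 224)                  # synthetic stand-in data
train_labels = torch.randint(0, 10, (100,))
test_images = torch.randn(5, 3, 224, 224)

knn = KNeighborsClassifier(n_neighbors=20, metric="cosine")
knn.fit(extract(train_images).numpy(), train_labels.numpy())
pred = knn.predict(extract(test_images).numpy())
```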
arXiv Detail & Related papers (2021-12-15T20:15:01Z)
- Hybrid Graph Neural Networks for Few-Shot Learning [85.93495480949079]
Graph neural networks (GNNs) have been used to tackle the few-shot learning problem.
Under the inductive setting, existing GNN-based methods are less competitive.
We propose a novel hybrid GNN model consisting of two GNNs, an instance GNN and a prototype GNN.
arXiv Detail & Related papers (2021-12-13T10:20:15Z)
- From Stars to Subgraphs: Uplifting Any GNN with Local Structure Awareness [23.279464786779787]
We introduce a general framework for uplifting any message-passing neural network (MPNN) to be more expressive.
The framework is strictly more powerful than the 1- and 2-WL tests, and no less powerful than 3-WL.
Our method sets new state-of-the-art performance by large margins for several well-known graph ML tasks.
arXiv Detail & Related papers (2021-10-07T19:08:08Z)
- Calibrating Class Activation Maps for Long-Tailed Visual Recognition [60.77124328049557]
We present two effective modifications of CNNs to improve network learning from long-tailed distributions.
First, we present a Class Activation Map Calibration (CAMC) module to improve the learning and prediction of network classifiers.
Second, we investigate the use of normalized classifiers for representation learning in long-tailed problems.
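As an illustration of the normalized-classifier idea (a common cosine-classifier formulation; the paper's exact variant may differ), logits are computed from L2-normalized features and L2-normalized class weights, so head-class weight norms cannot dominate tail classes:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NormalizedClassifier(nn.Module):
    """Cosine-style classifier: both features and class weights are L2-normalized."""

    def __init__(self, feat_dim: int, num_classes: int, scale: float = 16.0):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_classes, feat_dim) * 0.01)
        self.scale = scale    # temperature on the cosine similarities; a tunable assumption

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        w = F.normalize(self.weight, dim=1)
        f = F.normalize(features, dim=1)
        return self.scale * f @ w.t()    # logits depend only on angles, not weight norms

logits = NormalizedClassifier(feat_dim=512, num_classes=1000)(torch.randn(8, 512))
```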
arXiv Detail & Related papers (2021-08-29T05:45:03Z)
- Strengthening the Training of Convolutional Neural Networks By Using Walsh Matrix [0.0]
We modify the training and structure of DNNs to increase classification performance.
A minimum distance network (MDN) following the last layer of the convolutional neural network (CNN) is used as the classifier.
Across different application areas, higher classification performance was obtained using the DivFE with fewer nodes.
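One plausible reading of this design, stated here as an assumption since the summary does not spell out the details, is that each class is assigned a row of a Walsh/Hadamard matrix as its target vector, and the minimum distance network picks the class whose target code is closest to the network output:

```python
import numpy as np
from scipy.linalg import hadamard

num_classes, code_dim = 10, 16                    # code_dim must be a power of two
walsh = hadamard(code_dim).astype(np.float32)     # rows are orthogonal +/-1 codes
class_codes = walsh[:num_classes]                 # one target code per class

def min_distance_predict(outputs: np.ndarray) -> np.ndarray:
    """Classify each network output vector by its nearest class code (Euclidean)."""
    # outputs: (N, code_dim) activations of the layer preceding the classifier.
    dists = np.linalg.norm(outputs[:, None, :] - class_codes[None, :, :], axis=-1)
    return dists.argmin(axis=1)

preds = min_distance_predict(np.random.randn(4, code_dim).astype(np.float32))
```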
arXiv Detail & Related papers (2021-03-31T18:06:11Z)
- Patch Based Classification of Remote Sensing Data: A Comparison of 2D-CNN, SVM and NN Classifiers [0.0]
We compare the performance of patch-based SVM and NN classifiers with that of a deep learning algorithm comprising a 2D-CNN and fully connected layers.
Results on both datasets suggest the effectiveness of the patch-based SVM and NN.
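A minimal sketch of a patch-based SVM pipeline of this kind (patch size, band count, and the synthetic scene are assumptions for illustration):

```python
import numpy as np
from sklearn.svm import SVC

def extract_patches(image: np.ndarray, labels: np.ndarray, size: int = 5):
    """Cut size x size patches around each labelled pixel of a (H, W, bands) image."""
    r = size // 2
    X, y = [], []
    for i in range(r, image.shape[0] - r):
        for j in range(r, image.shape[1] - r):
            X.append(image[i - r:i + r + 1, j - r:j + r + 1].ravel())
            y.append(labels[i, j])
    return np.array(X), np.array(y)

# Synthetic stand-in for a remote-sensing scene with 4 spectral bands.
scene = np.random.rand(64, 64, 4).astype(np.float32)
ground_truth = np.random.randint(0, 3, size=(64, 64))

X, y = extract_patches(scene, ground_truth)
svm = SVC(kernel="rbf").fit(X[:2000], y[:2000])     # train on a subset of patches
accuracy = svm.score(X[2000:3000], y[2000:3000])    # evaluate on held-out patches
```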
arXiv Detail & Related papers (2020-06-21T11:07:37Z)
- Towards Deeper Graph Neural Networks with Differentiable Group Normalization [61.20639338417576]
Graph neural networks (GNNs) learn the representation of a node by aggregating its neighbors.
Over-smoothing is one of the key issues which limit the performance of GNNs as the number of layers increases.
We introduce two over-smoothing metrics and a novel technique, differentiable group normalization (DGN).
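A hedged sketch of the DGN idea, simplified relative to the paper: nodes are softly assigned to groups by a learned linear map, each group's embeddings are normalized separately, and the normalized terms are added back to the input with a small skip weight (all names and hyperparameters below are assumptions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DifferentiableGroupNormSketch(nn.Module):
    """Softly cluster node embeddings into groups, then normalize group by group."""

    def __init__(self, hidden_dim: int, num_groups: int, skip_weight: float = 0.005):
        super().__init__()
        self.assign = nn.Linear(hidden_dim, num_groups)          # soft cluster assignment
        self.norms = nn.ModuleList(
            [nn.BatchNorm1d(hidden_dim) for _ in range(num_groups)]
        )
        self.skip_weight = skip_weight

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (num_nodes, hidden_dim) node embeddings from a GNN layer.
        s = F.softmax(self.assign(h), dim=1)                      # (N, G) memberships
        out = h.clone()
        for g, norm in enumerate(self.norms):
            # Weight each node's embedding by its membership in group g,
            # normalize within the group, and add back as a residual term.
            out = out + self.skip_weight * norm(s[:, g:g + 1] * h)
        return out

normed = DifferentiableGroupNormSketch(hidden_dim=64, num_groups=4)(torch.randn(100, 64))
```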
arXiv Detail & Related papers (2020-06-12T07:18:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.