Data Augmentation Vision Transformer for Fine-grained Image
Classification
- URL: http://arxiv.org/abs/2211.12879v2
- Date: Thu, 24 Nov 2022 08:40:56 GMT
- Title: Data Augmentation Vision Transformer for Fine-grained Image
Classification
- Authors: Chao Hu, Liqiang Zhu, Weibin Qiu, and Weijie Wu
- Abstract summary: We propose a data augmentation vision transformer (DAVT).
We also propose a hierarchical attention selection (HAS) method, which improves the ability to learn discriminative markers between levels.
Experimental results show that the accuracy of this method on two general datasets, CUB-200-2011 and Stanford Dogs, is better than that of existing mainstream methods.
- Score: 1.6211899643913996
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, the vision transformer (ViT) has made breakthroughs in image
recognition. Its multi-head self-attention (MSA) mechanism can extract discriminative
labeling information from different pixel blocks to improve image classification
accuracy. However, the classification marks in its deep layers tend to ignore
local features between layers. In addition, the embedding layer splits the image
into fixed-size pixel blocks, which inevitably introduces additional image
noise into the network input. To this end, we study a data augmentation vision
transformer (DAVT) and propose a data augmentation method based on
attention cropping, which uses attention weights as the guide to crop images
and improve the ability of the network to learn critical features. Secondly, we
also propose a hierarchical attention selection (HAS) method, which improves
the ability to learn discriminative markers between levels by filtering
and fusing labels between levels. Experimental results show that the accuracy
of this method on two general datasets, CUB-200-2011 and Stanford Dogs, is
better than that of existing mainstream methods, and is 1.4% and
1.6% higher than the original ViT, respectively.
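The abstract names two ideas, attention-guided cropping and attention-based selection between levels, without giving their exact rules. The following is a minimal NumPy sketch of one plausible reading, not the authors' implementation. It assumes a per-patch attention map is already available, crops the tightest box around patches whose attention reaches a fraction of the maximum, and keeps the k highest-attention tokens from a layer. The function names and the `threshold_ratio` and `k` parameters are hypothetical.

```python
import numpy as np


def attention_crop_box(attn, patch_size, threshold_ratio=0.5):
    """Pixel box (top, left, bottom, right) around high-attention patches.

    attn is a 2-D grid of non-negative per-patch attention weights.
    The thresholding rule is an assumption, not taken from the paper.
    """
    attn = np.asarray(attn, dtype=float)
    # Mark patches whose attention reaches a fraction of the maximum.
    mask = attn >= threshold_ratio * attn.max()
    rows = np.where(mask.any(axis=1))[0]
    cols = np.where(mask.any(axis=0))[0]
    # Convert patch indices to pixel coordinates (end-exclusive).
    top, bottom = rows[0] * patch_size, (rows[-1] + 1) * patch_size
    left, right = cols[0] * patch_size, (cols[-1] + 1) * patch_size
    return int(top), int(left), int(bottom), int(right)


def attention_crop(image, attn, patch_size, threshold_ratio=0.5):
    """Crop an H x W x C image to the high-attention box."""
    top, left, bottom, right = attention_crop_box(attn, patch_size, threshold_ratio)
    return image[top:bottom, left:right]


def select_top_tokens(tokens, cls_attn, k):
    """Keep the k tokens with the highest attention, in original order.

    tokens has shape (N, D); cls_attn has shape (N,). Top-k selection
    is an assumed stand-in for the filtering step described in the abstract.
    """
    idx = np.argsort(cls_attn)[::-1][:k]
    return tokens[np.sort(idx)]


# Usage: a 64x64 image split into a 4x4 grid of 16-pixel patches.
image = np.zeros((64, 64, 3))
attn = np.zeros((4, 4))
attn[1:3, 2:4] = 1.0  # pretend the object sits in these patches
cropped = attention_crop(image, attn, patch_size=16)
print(cropped.shape)
```

In a training pipeline, the cropped region would typically be resized back to the network input size and fed through the model as an additional augmented sample; that step is omitted here.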
Related papers
- Nested-TNT: Hierarchical Vision Transformers with Multi-Scale Feature Processing [7.202931445597172]
Transformer has been applied in the field of computer vision due to its excellent performance in natural language processing.
In this paper, we introduce the nested algorithm and apply the Nested-TNT to image classification tasks.
The experiment confirms that the proposed model achieves better classification performance than ViT and TNT, exceeding them by 2.25% and 1.1% on CIFAR10 and by 2.78% and 0.25% on FLOWERS102, respectively.
arXiv Detail & Related papers (2024-04-20T17:56:14Z)
- Additional Look into GAN-based Augmentation for Deep Learning COVID-19
Image Classification [57.1795052451257]
We study the dependence of the GAN-based augmentation performance on dataset size with a focus on small samples.
We train StyleGAN2-ADA with both sets and then, after validating the quality of the generated images, use the trained GANs as one of the augmentation approaches in multi-class classification problems.
The GAN-based augmentation approach is found to be comparable with classical augmentation in the case of medium and large datasets but underperforms in the case of smaller datasets.
arXiv Detail & Related papers (2024-01-26T08:28:13Z)
- A Survey of Graph and Attention Based Hyperspectral Image Classification
Methods for Remote Sensing Data [5.1901440366375855]
The use of Deep Learning techniques for classification in Hyperspectral Imaging (HSI) is rapidly growing.
Recent methods have also explored the usage of Graph Convolution Networks and their unique ability to use node features in prediction.
arXiv Detail & Related papers (2023-10-16T00:42:25Z)
- Fine-grained Recognition with Learnable Semantic Data Augmentation [68.48892326854494]
Fine-grained image recognition is a longstanding computer vision challenge.
We propose diversifying the training data at the feature-level to alleviate the discriminative region loss problem.
Our method significantly improves the generalization performance on several popular classification networks.
arXiv Detail & Related papers (2023-09-01T11:15:50Z)
- Performance of GAN-based augmentation for deep learning COVID-19 image
classification [57.1795052451257]
The biggest challenge in the application of deep learning to the medical domain is the availability of training data.
Data augmentation is a typical methodology used in machine learning when confronted with a limited data set.
In this work, a StyleGAN2-ADA model of Generative Adversarial Networks is trained on the limited COVID-19 chest X-ray image set.
arXiv Detail & Related papers (2023-04-18T15:39:58Z)
- TransHP: Image Classification with Hierarchical Prompting [27.049504972041834]
This paper explores a hierarchical prompting mechanism for the hierarchical image classification (HIC) task.
We think it well imitates human visual recognition, i.e., humans may use the ancestor class as a prompt to draw focus on subtle differences among descendant classes.
arXiv Detail & Related papers (2023-04-13T10:37:41Z)
- Weakly Supervised Change Detection Using Guided Anisotropic Difusion [97.43170678509478]
We propose original ideas that help us to leverage such datasets in the context of change detection.
First, we propose the guided anisotropic diffusion (GAD) algorithm, which improves semantic segmentation results.
We then show its potential in two weakly-supervised learning strategies tailored for change detection.
arXiv Detail & Related papers (2021-12-31T10:03:47Z)
- Weakly-supervised Generative Adversarial Networks for medical image
classification [1.479639149658596]
We propose a novel medical image classification algorithm called Weakly-Supervised Generative Adversarial Networks (WSGAN).
WSGAN only uses a small number of real images without labels to generate fake images or mask images to enlarge the sample size of the training set.
We show that WSGAN can obtain relatively high learning performance by using few labeled and unlabeled data.
arXiv Detail & Related papers (2021-11-29T15:38:48Z)
- DSNet: A Dual-Stream Framework for Weakly-Supervised Gigapixel Pathology
Image Analysis [78.78181964748144]
We present a novel weakly-supervised framework for classifying whole slide images (WSIs).
WSIs are commonly processed by patch-wise classification with patch-level labels.
With image-level labels only, patch-wise classification would be sub-optimal due to inconsistency between the patch appearance and image-level label.
arXiv Detail & Related papers (2021-09-13T09:10:43Z)
- Mask guided attention for fine-grained patchy image classification [22.91753200323264]
A mask guided attention (MGA) method for fine-grained patchy image classification is presented.
We verify the effectiveness of our method on three publicly available patchy image datasets.
Our ablation study shows that MGA improves the accuracy by 2.25% and 2% on the SoyCultivarVein and BtfPIS datasets.
arXiv Detail & Related papers (2021-02-04T17:54:50Z)
- Attention-Aware Noisy Label Learning for Image Classification [97.26664962498887]
Deep convolutional neural networks (CNNs) learned on large-scale labeled samples have achieved remarkable progress in computer vision.
The cheapest way to obtain a large body of labeled visual data is to crawl from websites with user-supplied labels, such as Flickr.
This paper proposes the attention-aware noisy label learning approach to improve the discriminative capability of the network trained on datasets with potential label noise.
arXiv Detail & Related papers (2020-09-30T15:45:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.