Mask guided attention for fine-grained patchy image classification
- URL: http://arxiv.org/abs/2102.02771v1
- Date: Thu, 4 Feb 2021 17:54:50 GMT
- Title: Mask guided attention for fine-grained patchy image classification
- Authors: Jun Wang, Xiaohan Yu, Yongsheng Gao
- Abstract summary: mask guided attention (MGA) method for fine-grained patchy image classification is presented.
We verify the effectiveness of our method on three publicly available patchy image datasets.
Our ablation study shows that MGA improves the accuracy by 2.25% and 2% on the SoyCultivarVein and BtfPIS datasets.
- Score: 22.91753200323264
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we present a novel mask guided attention (MGA) method for
fine-grained patchy image classification. The key challenge of fine-grained
patchy image classification lies in two folds, ultra-fine-grained
inter-category variances among objects and very few data available for
training. This motivates us to consider employing more useful supervision
signal to train a discriminative model within limited training samples.
Specifically, the proposed MGA integrates a pre-trained semantic segmentation
model that produces auxiliary supervision signal, i.e., patchy attention mask,
enabling a discriminative representation learning. The patchy attention mask
drives the classifier to filter out the insignificant parts of images (e.g.,
common features between different categories), which enhances the robustness of
MGA for the fine-grained patchy image classification. We verify the
effectiveness of our method on three publicly available patchy image datasets.
Experimental results demonstrate that our MGA method achieves superior
performance on three datasets compared with the state-of-the-art methods. In
addition, our ablation study shows that MGA improves the accuracy by 2.25% and
2% on the SoyCultivarVein and BtfPIS datasets, indicating its practicality
towards solving the fine-grained patchy image classification.
Related papers
- Enhance Image Classification via Inter-Class Image Mixup with Diffusion Model [80.61157097223058]
A prevalent strategy to bolster image classification performance is through augmenting the training set with synthetic images generated by T2I models.
In this study, we scrutinize the shortcomings of both current generative and conventional data augmentation techniques.
We introduce an innovative inter-class data augmentation method known as Diff-Mix, which enriches the dataset by performing image translations between classes.
arXiv Detail & Related papers (2024-03-28T17:23:45Z) - Fine-grained Recognition with Learnable Semantic Data Augmentation [68.48892326854494]
Fine-grained image recognition is a longstanding computer vision challenge.
We propose diversifying the training data at the feature-level to alleviate the discriminative region loss problem.
Our method significantly improves the generalization performance on several popular classification networks.
arXiv Detail & Related papers (2023-09-01T11:15:50Z) - Learning to Mask and Permute Visual Tokens for Vision Transformer
Pre-Training [59.923672191632065]
We propose a new self-supervised pre-training approach, named Masked and Permuted Vision Transformer (MaPeT)
MaPeT employs autoregressive and permuted predictions to capture intra-patch dependencies.
Our results demonstrate that MaPeT achieves competitive performance on ImageNet.
arXiv Detail & Related papers (2023-06-12T18:12:19Z) - High-fidelity Pseudo-labels for Boosting Weakly-Supervised Segmentation [17.804090651425955]
Image-level weakly-supervised segmentation (WSSS) reduces the usually vast data annotation cost by surrogate segmentation masks during training.
Our work is based on two techniques for improving CAMs; importance sampling, which is a substitute for GAP, and the feature similarity loss.
We reformulate both techniques based on binomial posteriors of multiple independent binary problems.
This has two benefits; their performance is improved and they become more general, resulting in an add-on method that can boost virtually any WSSS method.
arXiv Detail & Related papers (2023-04-05T17:43:57Z) - HybridMIM: A Hybrid Masked Image Modeling Framework for 3D Medical Image
Segmentation [29.15746532186427]
HybridMIM is a novel hybrid self-supervised learning method based on masked image modeling for 3D medical image segmentation.
We learn the semantic information of medical images at three levels, including:1) partial region prediction to reconstruct key contents of the 3D image, which largely reduces the pre-training time burden.
The proposed framework is versatile to support both CNN and transformer as encoder backbones, and also enables to pre-train decoders for image segmentation.
arXiv Detail & Related papers (2023-03-18T04:43:12Z) - Data Augmentation Vision Transformer for Fine-grained Image
Classification [1.6211899643913996]
We propose a data augmentation vision transformer (DAVT) based on data augmentation.
We also propose a hierarchical attention selection (HAS) method, which improves the ability of discriminative markers between levels of learning.
Experimental results show that the accuracy of this method on the two general datasets, CUB-200-2011, and Stanford Dogs, is better than the existing mainstream methods.
arXiv Detail & Related papers (2022-11-23T11:34:11Z) - New wrapper method based on normalized mutual information for dimension
reduction and classification of hyperspectral images [0.0]
We propose a new wrapper method based on normalized mutual information (NMI) and error probability (PE)
Experiments have been performed on two challenging hyperspectral benchmarks datasets captured by the NASA's Airborne Visible/Infrared Imaging Spectrometer Sensor (AVIRIS)
arXiv Detail & Related papers (2022-10-25T21:17:11Z) - mc-BEiT: Multi-choice Discretization for Image BERT Pre-training [52.04866462439979]
Image BERT pre-training with masked image modeling (MIM) is a popular practice to cope with self-supervised representation learning.
We introduce an improved BERT-style image pre-training method, namely mc-BEiT, which performs MIM proxy tasks towards eased and refined multi-choice training objectives.
arXiv Detail & Related papers (2022-03-29T09:08:18Z) - Few-Shot Fine-Grained Action Recognition via Bidirectional Attention and
Contrastive Meta-Learning [51.03781020616402]
Fine-grained action recognition is attracting increasing attention due to the emerging demand of specific action understanding in real-world applications.
We propose a few-shot fine-grained action recognition problem, aiming to recognize novel fine-grained actions with only few samples given for each class.
Although progress has been made in coarse-grained actions, existing few-shot recognition methods encounter two issues handling fine-grained actions.
arXiv Detail & Related papers (2021-08-15T02:21:01Z) - Attention Model Enhanced Network for Classification of Breast Cancer
Image [54.83246945407568]
AMEN is formulated in a multi-branch fashion with pixel-wised attention model and classification submodular.
To focus more on subtle detail information, the sample image is enhanced by the pixel-wised attention map generated from former branch.
Experiments conducted on three benchmark datasets demonstrate the superiority of the proposed method under various scenarios.
arXiv Detail & Related papers (2020-10-07T08:44:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.