Fine-grained Recognition with Learnable Semantic Data Augmentation
- URL: http://arxiv.org/abs/2309.00399v1
- Date: Fri, 1 Sep 2023 11:15:50 GMT
- Title: Fine-grained Recognition with Learnable Semantic Data Augmentation
- Authors: Yifan Pu, Yizeng Han, Yulin Wang, Junlan Feng, Chao Deng, Gao Huang
- Abstract summary: Fine-grained image recognition is a longstanding computer vision challenge.
We propose diversifying the training data at the feature-level to alleviate the discriminative region loss problem.
Our method significantly improves the generalization performance on several popular classification networks.
- Score: 68.48892326854494
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Fine-grained image recognition is a longstanding computer vision challenge
that focuses on differentiating objects belonging to multiple subordinate
categories within the same meta-category. Since images belonging to the same
meta-category usually share similar visual appearances, mining discriminative
visual cues is the key to distinguishing fine-grained categories. Although
commonly used image-level data augmentation techniques have achieved great
success in generic image classification problems, they are rarely applied in
fine-grained scenarios, because their random editing-region behavior is prone
to destroy the discriminative visual cues residing in the subtle regions. In
this paper, we propose diversifying the training data at the feature-level to
alleviate the discriminative region loss problem. Specifically, we produce
diversified augmented samples by translating image features along semantically
meaningful directions. The semantic directions are estimated with a covariance
prediction network, which predicts a sample-wise covariance matrix to adapt to
the large intra-class variation inherent in fine-grained images. Furthermore,
the covariance prediction network is jointly optimized with the classification
network in a meta-learning manner to alleviate the degenerate solution problem.
Experiments on four competitive fine-grained recognition benchmarks
(CUB-200-2011, Stanford Cars, FGVC-Aircraft, NABirds) demonstrate that our
method significantly improves the generalization performance on several popular
classification networks (e.g., ResNets, DenseNets, EfficientNets, RegNets and
ViT). Combined with a recently proposed method, our semantic data augmentation
approach achieves state-of-the-art performance on the CUB-200-2011 dataset. The
source code will be released.
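Below is a minimal PyTorch sketch of the core idea described in the abstract: deep features are translated along random semantic directions drawn from a sample-wise covariance predicted by a small network, and the augmented features are fed to the classifier. It is illustrative only, not the authors' released implementation; for brevity the covariance is parameterised as diagonal and the meta-learning optimization of the covariance predictor is omitted.

```python
import torch
import torch.nn as nn

class CovariancePredictor(nn.Module):
    """Predicts a sample-wise (diagonal) covariance over feature dimensions.

    Simplification: the paper predicts a sample-wise covariance matrix; a
    diagonal parameterisation keeps this sketch short.
    """
    def __init__(self, feat_dim: int, hidden_dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim),
            nn.ReLU(inplace=True),
            nn.Linear(hidden_dim, feat_dim),
            nn.Softplus(),  # variances must be non-negative
        )

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return self.net(features)  # (B, D) per-dimension variances


def semantic_augment(features: torch.Tensor,
                     cov_predictor: CovariancePredictor,
                     strength: float = 0.5) -> torch.Tensor:
    """Translate features along random semantic directions.

    Directions are drawn from N(0, strength * Sigma_i), where Sigma_i is the
    covariance predicted for sample i, so the augmentation adapts to the
    intra-class variation of each sample.
    """
    var = cov_predictor(features)                        # (B, D)
    noise = torch.randn_like(features) * (strength * var).sqrt()
    return features + noise


# Usage sketch: plug in between a backbone and its linear classifier.
backbone_dim, num_classes, batch = 2048, 200, 8
features = torch.randn(batch, backbone_dim)              # e.g. ResNet-50 pooled features
labels = torch.randint(0, num_classes, (batch,))

cov_net = CovariancePredictor(backbone_dim)
classifier = nn.Linear(backbone_dim, num_classes)

augmented = semantic_augment(features, cov_net)
loss = nn.CrossEntropyLoss()(classifier(augmented), labels)
loss.backward()  # in the paper, cov_net is additionally updated via meta-learning
```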
Related papers
- Adaptive Face Recognition Using Adversarial Information Network [57.29464116557734]
Face recognition models often degenerate when training data are different from testing data.
We propose a novel adversarial information network (AIN) to address it.
arXiv Detail & Related papers (2023-05-23T02:14:11Z)
- High-fidelity Pseudo-labels for Boosting Weakly-Supervised Segmentation [17.804090651425955]
Image-level weakly-supervised segmentation (WSSS) reduces the usually vast data annotation cost by using surrogate segmentation masks during training.
Our work is based on two techniques for improving CAMs: importance sampling, which is a substitute for GAP, and the feature similarity loss.
We reformulate both techniques based on binomial posteriors of multiple independent binary problems.
This has two benefits: their performance is improved and they become more general, resulting in an add-on method that can boost virtually any WSSS method.
arXiv Detail & Related papers (2023-04-05T17:43:57Z)
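The two components named in the entry above, importance sampling as a substitute for GAP and per-class binomial posteriors, can be approximated in a few lines. The sketch below is a rough illustration under simplifying assumptions (a softmax-normalized CAM as the sampling distribution, one sample per class, sigmoid posteriors per class), not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def importance_sample_pool(cams: torch.Tensor) -> torch.Tensor:
    """Replace global average pooling with importance sampling over the CAM.

    cams: (B, C, H, W) class activation maps. For each image and class, a
    spatial location is sampled with probability proportional to the softmaxed
    activation, and the activation at that location becomes the class score.
    """
    b, c, h, w = cams.shape
    flat = cams.view(b * c, h * w)
    probs = F.softmax(flat, dim=-1)                 # spatial sampling distribution
    idx = torch.multinomial(probs, num_samples=1)   # (B*C, 1) sampled locations
    scores = flat.gather(1, idx).view(b, c)
    return scores


# Image-level multi-label supervision with per-class binomial (sigmoid) posteriors.
cams = torch.randn(4, 20, 32, 32, requires_grad=True)  # hypothetical CAMs
image_labels = torch.randint(0, 2, (4, 20)).float()    # multi-hot image-level labels

logits = importance_sample_pool(cams)
loss = F.binary_cross_entropy_with_logits(logits, image_labels)  # independent binary problems
loss.backward()
```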
- Learning Discriminative Representations for Multi-Label Image Recognition [13.13795708478267]
We propose a unified deep network to learn discriminative features for the multi-label task.
By regularizing the whole network with the proposed loss, the performance of the well-known ResNet-101 is improved significantly.
arXiv Detail & Related papers (2021-07-23T12:10:46Z)
- Semantic Distribution-aware Contrastive Adaptation for Semantic Segmentation [50.621269117524925]
Domain adaptive semantic segmentation refers to making predictions on a certain target domain with only annotations of a specific source domain.
We present a semantic distribution-aware contrastive adaptation algorithm that enables pixel-wise representation alignment.
We evaluate SDCA on multiple benchmarks, achieving considerable improvements over existing algorithms.
arXiv Detail & Related papers (2021-05-11T13:21:25Z)
- Exploiting Invariance in Training Deep Neural Networks [4.169130102668252]
Inspired by two basic mechanisms in animal visual systems, we introduce a feature transform technique that imposes invariance properties in the training of deep neural networks.
The resulting algorithm requires less parameter tuning, trains well with an initial learning rate of 1.0, and easily generalizes to different tasks.
Tested on ImageNet, MS COCO, and Cityscapes datasets, our proposed technique requires fewer iterations to train, surpasses all baselines by a large margin, seamlessly works on both small and large batch size training, and applies to different computer vision tasks of image classification, object detection, and semantic segmentation.
arXiv Detail & Related papers (2021-03-30T19:18:31Z)
- Adversarial Feature Augmentation and Normalization for Visual Recognition [109.6834687220478]
Recent advances in computer vision take advantage of adversarial data augmentation to ameliorate the generalization ability of classification models.
Here, we present an effective and efficient alternative that advocates adversarial augmentation on intermediate feature embeddings.
We validate the proposed approach across diverse visual recognition tasks with representative backbone networks.
arXiv Detail & Related papers (2021-03-22T20:36:34Z)
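A minimal sketch of the general idea behind the adversarial feature augmentation entry above: perturb intermediate feature embeddings in the loss-ascending direction and train on the perturbed features. It is reduced here to a single FGSM-style step in feature space; the paper's actual augmentation and normalization scheme is more elaborate, and the head and dimension names are hypothetical.

```python
import torch
import torch.nn as nn

def adversarial_feature_augment(features: torch.Tensor,
                                classifier: nn.Module,
                                labels: torch.Tensor,
                                epsilon: float = 0.1) -> torch.Tensor:
    """Perturb intermediate feature embeddings in the loss-ascending direction.

    One FGSM-style step on the features (rather than on input pixels); the
    perturbed features serve as additional adversarially augmented training data.
    """
    features = features.detach().requires_grad_(True)
    loss = nn.CrossEntropyLoss()(classifier(features), labels)
    grad, = torch.autograd.grad(loss, features)
    return (features + epsilon * grad.sign()).detach()


# Usage sketch with a linear head on pooled backbone features.
feat_dim, num_classes = 512, 10
head = nn.Linear(feat_dim, num_classes)
feats = torch.randn(16, feat_dim)
labels = torch.randint(0, num_classes, (16,))

adv_feats = adversarial_feature_augment(feats, head, labels)
train_loss = nn.CrossEntropyLoss()(head(adv_feats), labels)
train_loss.backward()
```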
- Context-aware Attentional Pooling (CAP) for Fine-grained Visual Classification [2.963101656293054]
Deep convolutional neural networks (CNNs) have shown a strong ability in mining discriminative object pose and parts information for image recognition.
We propose a novel context-aware attentional pooling (CAP) that effectively captures subtle changes via sub-pixel gradients.
We evaluate our approach using six state-of-the-art (SotA) backbone networks and eight benchmark datasets.
arXiv Detail & Related papers (2021-01-17T10:15:02Z)
- Diversity inducing Information Bottleneck in Model Ensembles [73.80615604822435]
In this paper, we target the problem of generating effective ensembles of neural networks by encouraging diversity in prediction.
We explicitly optimize a diversity inducing adversarial loss for learning latent variables and thereby obtain diversity in the output predictions necessary for modeling multi-modal data.
Compared to the most competitive baselines, we show significant improvements in classification accuracy, under a shift in the data distribution.
arXiv Detail & Related papers (2020-03-10T03:10:41Z)
- Image Fine-grained Inpainting [89.17316318927621]
We present a one-stage model that utilizes dense combinations of dilated convolutions to obtain larger and more effective receptive fields.
To better train this efficient generator, in addition to the frequently-used VGG feature matching loss, we design a novel self-guided regression loss.
We also employ a discriminator with local and global branches to ensure local-global contents consistency.
arXiv Detail & Related papers (2020-02-07T03:45:25Z)
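The densely combined dilated convolutions mentioned in the inpainting entry above can be illustrated with a small PyTorch block. The dilation rates, channel widths, and exact dense topology here are assumptions for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn

class DenseDilatedBlock(nn.Module):
    """Dense combination of dilated convolutions to enlarge the receptive field.

    Each branch uses a different dilation rate and sees the concatenation of
    the block input and all previous branch outputs (dense connectivity).
    """
    def __init__(self, channels: int, dilations=(1, 2, 4, 8)):
        super().__init__()
        self.branches = nn.ModuleList()
        in_ch = channels
        for d in dilations:
            self.branches.append(nn.Sequential(
                nn.Conv2d(in_ch, channels, kernel_size=3, padding=d, dilation=d),
                nn.ReLU(inplace=True),
            ))
            in_ch += channels  # dense: the next branch also sees this output

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = [x]
        for branch in self.branches:
            feats.append(branch(torch.cat(feats, dim=1)))
        return feats[-1]


# Example: a 64-channel feature map passes through the block with its size preserved.
block = DenseDilatedBlock(64)
out = block(torch.randn(1, 64, 128, 128))
print(out.shape)  # torch.Size([1, 64, 128, 128])
```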