Adversarial Feature Augmentation and Normalization for Visual
Recognition
- URL: http://arxiv.org/abs/2103.12171v1
- Date: Mon, 22 Mar 2021 20:36:34 GMT
- Title: Adversarial Feature Augmentation and Normalization for Visual
Recognition
- Authors: Tianlong Chen, Yu Cheng, Zhe Gan, Jianfeng Wang, Lijuan Wang,
Zhangyang Wang, Jingjing Liu
- Abstract summary: Recent advances in computer vision take advantage of adversarial data augmentation to ameliorate the generalization ability of classification models.
Here, we present an effective and efficient alternative that advocates adversarial augmentation on intermediate feature embeddings.
We validate the proposed approach across diverse visual recognition tasks with representative backbone networks.
- Score: 109.6834687220478
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in computer vision take advantage of adversarial data
augmentation to ameliorate the generalization ability of classification models.
Here, we present an effective and efficient alternative that advocates
adversarial augmentation on intermediate feature embeddings, instead of relying
on computationally-expensive pixel-level perturbations. We propose Adversarial
Feature Augmentation and Normalization (A-FAN), which (i) first augments visual
recognition models with adversarial features that integrate flexible scales of
perturbation strengths, (ii) then extracts adversarial feature statistics from
batch normalization, and re-injects them into clean features through feature
normalization. We validate the proposed approach across diverse visual
recognition tasks with representative backbone networks, including ResNets and
EfficientNets for classification, Faster-RCNN for detection, and Deeplab V3+
for segmentation. Extensive experiments show that A-FAN yields consistent
generalization improvement over strong baselines across various datasets for
classification, detection and segmentation tasks, such as CIFAR-10, CIFAR-100,
ImageNet, Pascal VOC2007, Pascal VOC2012, COCO2017, and Cityscapes.
Comprehensive ablation studies and detailed analyses also demonstrate that
adding perturbations to specific modules and layers of
classification/detection/segmentation backbones yields optimal performance.
Codes and pre-trained models will be made available at:
https://github.com/VITA-Group/CV_A-FAN.
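The normalization step described in the abstract (extracting adversarial feature statistics and re-injecting them into clean features) can be illustrated with a minimal sketch. This is not the authors' implementation (see the repository above for that); it simply shows the AdaIN-style idea of channel-wise re-scaling clean features with the mean and standard deviation of adversarial features, using NumPy and an assumed (batch, channels) feature layout:

```python
import numpy as np

def feature_norm_reinject(clean, adv, eps=1e-5):
    """Re-inject adversarial feature statistics into clean features.

    Illustrative sketch only: whiten the clean features channel-wise,
    then re-scale them with the mean/std of the adversarial features.
    clean, adv: arrays of shape (batch, channels).
    """
    mu_c, sd_c = clean.mean(axis=0), clean.std(axis=0)
    mu_a, sd_a = adv.mean(axis=0), adv.std(axis=0)
    normed = (clean - mu_c) / (sd_c + eps)   # zero-mean, unit-std per channel
    return normed * sd_a + mu_a              # carries adversarial statistics

rng = np.random.default_rng(0)
clean = rng.normal(0.0, 1.0, size=(64, 8))   # clean feature batch
adv = rng.normal(0.5, 2.0, size=(64, 8))     # hypothetical adversarial features
mixed = feature_norm_reinject(clean, adv)
print(np.allclose(mixed.mean(axis=0), adv.mean(axis=0)))  # True
```

After the transform, the clean batch matches the adversarial batch's per-channel mean and (up to the `eps` term) standard deviation, which is the sense in which adversarial statistics are "re-injected" through feature normalization.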
Related papers
- CAVE: Classifying Abnormalities in Video Capsule Endoscopy [0.1937002985471497]
In this study, we explore an ensemble-based approach to improve classification accuracy in complex image datasets.
We leverage the unique feature-extraction capabilities of each model to enhance the overall accuracy.
Experimental evaluations demonstrate that the ensemble achieves higher accuracy and robustness across challenging and imbalanced classes.
arXiv Detail & Related papers (2024-10-26T17:25:08Z)
- Enhancing Fine-Grained Visual Recognition in the Low-Data Regime Through Feature Magnitude Regularization [23.78498670529746]
We introduce a regularization technique to ensure that the magnitudes of the extracted features are evenly distributed.
Despite its apparent simplicity, our approach has demonstrated significant performance improvements across various fine-grained visual recognition datasets.
arXiv Detail & Related papers (2024-09-03T07:32:46Z)
- Neural Clustering based Visual Representation Learning [61.72646814537163]
Clustering is one of the most classic approaches in machine learning and data analysis.
We propose feature extraction with clustering (FEC), which views feature extraction as a process of selecting representatives from data.
FEC alternates between grouping pixels into individual clusters to abstract representatives and updating the deep features of pixels with current representatives.
arXiv Detail & Related papers (2024-03-26T06:04:50Z)
- Unveiling Backbone Effects in CLIP: Exploring Representational Synergies and Variances [49.631908848868505]
Contrastive Language-Image Pretraining (CLIP) stands out as a prominent method for image representation learning.
We investigate the differences in CLIP performance among various neural architectures.
We propose a simple, yet effective approach to combine predictions from multiple backbones, leading to a notable performance boost of up to 6.34%.
arXiv Detail & Related papers (2023-12-22T03:01:41Z)
- Fine-grained Recognition with Learnable Semantic Data Augmentation [68.48892326854494]
Fine-grained image recognition is a longstanding computer vision challenge.
We propose diversifying the training data at the feature-level to alleviate the discriminative region loss problem.
Our method significantly improves the generalization performance on several popular classification networks.
arXiv Detail & Related papers (2023-09-01T11:15:50Z)
- Regularization Through Simultaneous Learning: A Case Study on Plant Classification [0.0]
This paper introduces Simultaneous Learning, a regularization approach drawing on principles of Transfer Learning and Multi-task Learning.
We leverage auxiliary datasets with the target dataset, the UFOP-HVD, to facilitate simultaneous classification guided by a customized loss function.
Remarkably, our approach demonstrates superior performance over models without regularization.
arXiv Detail & Related papers (2023-05-22T19:44:57Z)
- Self-Supervised Hypergraph Transformer for Recommender Systems [25.07482350586435]
Self-Supervised Hypergraph Transformer (SHT) is proposed.
A cross-view generative self-supervised learning component is proposed for data augmentation over the user-item interaction graph.
arXiv Detail & Related papers (2022-07-28T18:40:30Z)
- Calibrating Class Activation Maps for Long-Tailed Visual Recognition [60.77124328049557]
We present two effective modifications of CNNs to improve network learning from long-tailed distribution.
First, we present a Class Activation Map (CAMC) module to improve the learning and prediction of network classifiers.
Second, we investigate the use of normalized classifiers for representation learning in long-tailed problems.
arXiv Detail & Related papers (2021-08-29T05:45:03Z)
- Boosting the Generalization Capability in Cross-Domain Few-shot Learning via Noise-enhanced Supervised Autoencoder [23.860842627883187]
We teach the model to capture broader variations of the feature distributions with a novel noise-enhanced supervised autoencoder (NSAE)
NSAE trains the model by jointly reconstructing inputs and predicting the labels of inputs as well as their reconstructed pairs.
We also take advantage of NSAE structure and propose a two-step fine-tuning procedure that achieves better adaption and improves classification performance in the target domain.
arXiv Detail & Related papers (2021-08-11T04:45:56Z)
- No Fear of Heterogeneity: Classifier Calibration for Federated Learning with Non-IID Data [78.69828864672978]
A central challenge in training classification models in the real-world federated system is learning with non-IID data.
We propose a novel and simple algorithm called Classifier Calibration with Virtual Representations (CCVR), which adjusts the classifier using virtual representations sampled from an approximated Gaussian mixture model.
Experimental results demonstrate that CCVR achieves state-of-the-art performance on popular federated learning benchmarks including CIFAR-10, CIFAR-100, and CINIC-10.
arXiv Detail & Related papers (2021-06-09T12:02:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.