Shape-Texture Debiased Neural Network Training
- URL: http://arxiv.org/abs/2010.05981v2
- Date: Tue, 30 Mar 2021 19:16:30 GMT
- Title: Shape-Texture Debiased Neural Network Training
- Authors: Yingwei Li, Qihang Yu, Mingxing Tan, Jieru Mei, Peng Tang, Wei Shen,
Alan Yuille, Cihang Xie
- Abstract summary: Convolutional Neural Networks are often biased towards either texture or shape, depending on the training dataset.
We develop an algorithm for shape-texture debiased learning.
Experiments show that our method successfully improves model performance on several image recognition benchmarks.
- Score: 50.6178024087048
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Shape and texture are two prominent and complementary cues for recognizing
objects. Nonetheless, Convolutional Neural Networks are often biased towards
either texture or shape, depending on the training dataset. Our ablation shows
that such bias degrades model performance. Motivated by this observation, we
develop a simple algorithm for shape-texture debiased learning. To prevent
models from attending exclusively to a single cue in representation learning,
we augment the training data with images whose shape and texture information
conflict (e.g., an image with the shape of a chimpanzee but the texture of a
lemon) and, most importantly, provide the corresponding supervision from shape
and texture simultaneously.
Experiments show that our method successfully improves model performance on
several image recognition benchmarks and adversarial robustness. For example,
when trained on ImageNet, it helps ResNet-152 achieve substantial improvements
on ImageNet (+1.2%), ImageNet-A (+5.2%), ImageNet-C (+8.3%) and
Stylized-ImageNet (+11.1%), and in defending against the FGSM adversarial
attack on ImageNet (+14.4%). Our method is also compatible with other
advanced data augmentation strategies, e.g., Mixup and CutMix. The code is
available here: https://github.com/LiYingwei/ShapeTextureDebiasedTraining.
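To make the recipe concrete, below is a minimal PyTorch-style sketch of the dual supervision described in the abstract. The `stylize` helper is hypothetical: it stands in for the paper's cue-conflict image generator (e.g., a style-transfer network), and the equal weighting `alpha=0.5` is an assumption, not the paper's tuned value.

```python
import torch
import torch.nn.functional as F

def debiased_loss(model, images, labels, stylize, alpha=0.5):
    """Shape-texture debiased supervision on cue-conflict images (sketch)."""
    # Pair each image with a randomly chosen texture source from the batch.
    perm = torch.randperm(images.size(0), device=images.device)
    # Hypothetical helper: renders the shape (content) of `images` with the
    # texture (style) of `images[perm]`.
    conflict = stylize(content=images, style=images[perm])

    logits = model(conflict)
    shape_targets = labels          # supervise the shape cue
    texture_targets = labels[perm]  # supervise the texture cue
    # Provide both supervisions simultaneously.
    return alpha * F.cross_entropy(logits, shape_targets) + \
           (1 - alpha) * F.cross_entropy(logits, texture_targets)
```

In effect this is a soft-label scheme: each cue-conflict image carries probability mass on both its shape class and its texture class, so the model cannot minimize the loss by attending to one cue alone.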
Related papers
- CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data [40.88256210436378]
We present a novel weakly supervised pre-training of vision models on web-scale image-text data.
The proposed method reframes pre-training on image-text data as a classification task.
It achieves a remarkable 2.7x acceleration in training speed compared to contrastive learning on web-scale data.
arXiv Detail & Related papers (2024-04-24T05:13:28Z)
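As a rough illustration of CatLIP's reframing of image-text pre-training as classification, the sketch below trains a vision encoder with a multi-label objective over caption-derived concepts. The vocabulary size, the `caption_to_targets` helper, and the BCE objective are illustrative assumptions, not the paper's exact pipeline.

```python
import torch
import torch.nn as nn

V = 10_000  # assumed size of the caption-derived concept vocabulary

def caption_to_targets(noun_ids: list[int]) -> torch.Tensor:
    """Multi-hot target built from the nouns extracted from one caption."""
    targets = torch.zeros(V)
    targets[noun_ids] = 1.0
    return targets

# Binary cross-entropy over concepts replaces the contrastive objective,
# so no text tower, large negative batch, or similarity matrix is needed.
criterion = nn.BCEWithLogitsLoss()

def step(vision_encoder, images, noun_id_lists):
    logits = vision_encoder(images)  # (B, V) classification head
    targets = torch.stack([caption_to_targets(n) for n in noun_id_lists])
    return criterion(logits, targets)
```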
- ImageNet-D: Benchmarking Neural Network Robustness on Diffusion Synthetic Object [78.58860252442045]
We introduce generative models as a data source for hard images that benchmark deep models' robustness.
We are able to generate images with more diversified backgrounds, textures, and materials than any prior work; we term this benchmark ImageNet-D.
Our work suggests that diffusion models can be an effective source to test vision models.
arXiv Detail & Related papers (2024-03-27T17:23:39Z)
- EfficientTrain: Exploring Generalized Curriculum Learning for Training Visual Backbones [80.662250618795]
This paper presents a new curriculum learning approach for the efficient training of visual backbones (e.g., vision Transformers).
As an off-the-shelf method, it reduces the wall-time training cost of a wide variety of popular models by >1.5x on ImageNet-1K/22K without sacrificing accuracy.
arXiv Detail & Related papers (2022-11-17T17:38:55Z)
- Decoupled Mixup for Generalized Visual Recognition [71.13734761715472]
We propose a novel "Decoupled-Mixup" method to train CNN models for visual recognition.
Our method decouples each image into discriminative and noise-prone regions, and then heterogeneously combines these regions to train CNN models.
Experimental results show that our method generalizes well to test data composed of unseen contexts.
arXiv Detail & Related papers (2022-10-26T15:21:39Z)
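A loose sketch of the Decoupled-Mixup idea summarized above: split each image into a discriminative and a noise-prone region, then mix the two kinds of regions across images. The saliency proxy used here (input-gradient magnitude) and the 50% split are assumptions; the paper's actual decoupling may differ.

```python
import torch
import torch.nn.functional as F

def saliency_mask(model, x, y, keep=0.5):
    """1 where a pixel is (proxy-)discriminative, 0 where noise-prone."""
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad, = torch.autograd.grad(loss, x)
    score = grad.abs().mean(dim=1, keepdim=True)              # (B,1,H,W)
    thresh = score.flatten(1).quantile(1 - keep, dim=1).view(-1, 1, 1, 1)
    return (score >= thresh).float()

def decoupled_mix(model, x, y):
    """Combine discriminative regions of x with noise-prone regions of x[perm]."""
    mask = saliency_mask(model, x, y)
    perm = torch.randperm(x.size(0), device=x.device)
    mixed = mask * x + (1 - mask) * x[perm]
    return mixed, y, y[perm]   # train with both labels, mixup-style
```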
- Adaptive Convolutional Dictionary Network for CT Metal Artifact Reduction [62.691996239590125]
We propose an adaptive convolutional dictionary network (ACDNet) for metal artifact reduction.
Our ACDNet can automatically learn the prior for artifact-free CT images via training data and adaptively adjust the representation kernels for each input CT image.
Our method inherits the clear interpretability of model-based methods and maintains the powerful representation ability of learning-based methods.
arXiv Detail & Related papers (2022-05-16T06:49:36Z)
- Identical Image Retrieval using Deep Learning [0.0]
We use the BigTransfer (BiT) model, which is itself a state-of-the-art model.
We extract the key features and fit a K-Nearest-Neighbor model to retrieve the nearest neighbors.
Our model finds similar images, which are hard to retrieve through text queries, with low inference time.
arXiv Detail & Related papers (2022-05-10T13:34:41Z)
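A minimal sketch of the retrieval pipeline summarized above. A torchvision ResNet-50 stands in here for the BigTransfer (BiT) backbone, and K=5 plus the dummy tensors are placeholders for illustration.

```python
import torch
from torchvision.models import resnet50, ResNet50_Weights
from sklearn.neighbors import NearestNeighbors

backbone = resnet50(weights=ResNet50_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()   # strip classifier, keep 2048-d features
backbone.eval()

@torch.no_grad()
def embed(images: torch.Tensor) -> torch.Tensor:
    """images: (N, 3, 224, 224), ImageNet-normalized -> (N, 2048) features."""
    return backbone(images)

gallery = torch.rand(100, 3, 224, 224)  # placeholder image corpus
queries = torch.rand(4, 3, 224, 224)    # placeholder query images

index = NearestNeighbors(n_neighbors=5).fit(embed(gallery).numpy())
dists, ids = index.kneighbors(embed(queries).numpy())  # nearest gallery rows
```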
- Robust Contrastive Learning Using Negative Samples with Diminished Semantics [23.38896719740166]
We show that by generating carefully designed negative samples, contrastive learning can learn more robust representations.
We develop two methods, texture-based and patch-based augmentations, to generate negative samples.
We also analyze our method and the generated texture-based samples, showing that texture features are indispensable in classifying particular ImageNet classes.
arXiv Detail & Related papers (2021-10-27T05:38:00Z)
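In the spirit of the patch-based augmentation summarized above, here is one way to build a "diminished semantics" negative: shuffling patches preserves local texture statistics while destroying global shape, so the result can serve as a non-semantic negative in contrastive learning. The 4x4 grid is an assumption.

```python
import torch

def patch_shuffle(x: torch.Tensor, grid: int = 4) -> torch.Tensor:
    """x: (C, H, W) with H, W divisible by `grid` -> patch-shuffled copy."""
    c, h, w = x.shape
    ph, pw = h // grid, w // grid
    # Cut the image into grid*grid non-overlapping patches.
    patches = (x.unfold(1, ph, ph).unfold(2, pw, pw)   # (C, g, g, ph, pw)
                 .reshape(c, grid * grid, ph, pw))
    # Permute patch positions: texture survives, shape does not.
    patches = patches[:, torch.randperm(grid * grid)]
    # Stitch the shuffled patches back into an image.
    patches = patches.reshape(c, grid, grid, ph, pw)
    return patches.permute(0, 1, 3, 2, 4).reshape(c, h, w)
```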
- Automated Cleanup of the ImageNet Dataset by Model Consensus, Explainability and Confident Learning [0.0]
ImageNet has been the backbone of various convolutional neural networks (CNNs) trained on ILSVRC12.
This paper describes automated applications based on model consensus, explainability and confident learning to correct labeling mistakes.
ImageNet-Clean improves model performance by 2-2.4% for SqueezeNet and EfficientNet-B0 models.
arXiv Detail & Related papers (2021-03-30T13:16:35Z)
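A toy sketch of label cleanup by model consensus, one of the ingredients summarized above: if several independently trained models all confidently predict the same label and it disagrees with the annotation, the sample is flagged for review. The confidence threshold and unanimity rule are assumptions.

```python
import torch

@torch.no_grad()
def flag_mislabeled(models, images, labels, min_conf=0.9):
    """Return a boolean mask of samples whose labels the ensemble disputes."""
    votes = []
    for m in models:
        probs = m(images).softmax(dim=1)
        conf, pred = probs.max(dim=1)
        # Abstain (-1) when the model is not confident enough.
        votes.append(torch.where(conf >= min_conf, pred,
                                 torch.full_like(pred, -1)))
    votes = torch.stack(votes)                       # (n_models, B)
    consensus = votes[0]
    unanimous = (votes == consensus.unsqueeze(0)).all(dim=0) & (consensus >= 0)
    return unanimous & (consensus != labels)         # True = candidate mislabel
```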
- Increasing the Robustness of Semantic Segmentation Models with Painting-by-Numbers [39.95214171175713]
We build on an insight from image classification: output quality can be improved by increasing the network's bias towards object shapes.
Our basic idea is to alpha-blend a portion of the RGB training images with fake images, where each class label is given a fixed, randomly chosen color.
We demonstrate the effectiveness of our training schema for DeepLabv3+ with various network backbones, MobileNet-V2, ResNets, and Xception, and evaluate it on the Cityscapes dataset.
arXiv Detail & Related papers (2020-10-12T07:42:39Z)
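The alpha-blending step described above is simple enough to sketch directly; the class count and blending factor below are placeholders.

```python
import torch

NUM_CLASSES = 19                             # e.g., Cityscapes
CLASS_COLORS = torch.rand(NUM_CLASSES, 3)    # one fixed random color per class

def paint_by_numbers(image: torch.Tensor, label_map: torch.Tensor,
                     alpha: float = 0.5) -> torch.Tensor:
    """image: (3, H, W) in [0,1]; label_map: (H, W) int64 -> blended image."""
    # Render the ground-truth segmentation as a flat-colored "fake" image.
    fake = CLASS_COLORS[label_map].permute(2, 0, 1)   # (3, H, W)
    # Blend the fake image into the RGB image to push the network towards
    # shape cues rather than texture.
    return (1 - alpha) * image + alpha * fake
```

Blending only a portion of the training images, as the summary notes, keeps the network anchored to real textures as well; how that portion is sampled is left out of this sketch.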
- Informative Dropout for Robust Representation Learning: A Shape-bias Perspective [84.30946377024297]
We propose a lightweight, model-agnostic method, namely Informative Dropout (InfoDrop), to improve interpretability and reduce texture bias.
Specifically, we discriminate texture from shape based on local self-information in an image, and adopt a Dropout-like algorithm to decorrelate the model output from the local texture.
arXiv Detail & Related papers (2020-08-10T16:52:24Z)
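A simplified sketch in the spirit of InfoDrop as summarized above: patches that closely resemble their neighbors are highly predictable (low self-information, typically texture), and activations there are dropped preferentially. The Gaussian patch similarity, median threshold, and drop rate are assumptions; the paper's formulation differs in detail.

```python
import torch
import torch.nn.functional as F

def self_information_map(image: torch.Tensor, patch: int = 3, temp: float = 0.1):
    """Per-pixel 'surprise': low where a patch resembles its neighbors (texture)."""
    gray = image.mean(dim=1, keepdim=True)                    # (B,1,H,W)
    b, _, h, w = gray.shape
    patches = F.unfold(gray, patch, padding=patch // 2)       # (B, patch*patch, H*W)
    patches = patches.transpose(1, 2).reshape(b, h, w, -1)
    # Average Gaussian similarity to the four axis-aligned neighbor patches.
    sim = sum(
        torch.exp(-((patches - torch.roll(patches, (dy, dx), dims=(1, 2))) ** 2)
                  .mean(-1) / temp)
        for dy, dx in ((0, 1), (1, 0), (0, -1), (-1, 0))
    )
    return -torch.log(sim / 4 + 1e-6)                         # (B, H, W)

def info_drop(features: torch.Tensor, image: torch.Tensor, drop_rate: float = 0.3):
    """Zero activations preferentially in low self-information (texture) regions."""
    info = self_information_map(image).unsqueeze(1)           # (B,1,H,W)
    info = F.interpolate(info, size=features.shape[-2:], mode="bilinear",
                         align_corners=False)
    median = info.flatten(1).median(dim=1).values.view(-1, 1, 1, 1)
    low_info = info < median                                  # texture-like half
    drop = low_info & (torch.rand_like(info) < drop_rate)
    return features * (~drop).float()
```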