Leveraging background augmentations to encourage semantic focus in
self-supervised contrastive learning
- URL: http://arxiv.org/abs/2103.12719v1
- Date: Tue, 23 Mar 2021 17:39:16 GMT
- Title: Leveraging background augmentations to encourage semantic focus in
self-supervised contrastive learning
- Authors: Chaitanya K. Ryali, David J. Schwab, Ari S. Morcos
- Abstract summary: "Background augmentations" encourage models to focus on semantically-relevant content by discouraging them from focusing on image backgrounds.
Background augmentations lead to substantial improvements (+1-2% on ImageNet-1k) in performance across a spectrum of state-of-the-art self-supervised methods.
- Score: 16.93045612956149
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Unsupervised representation learning is an important challenge in computer
vision, with self-supervised learning methods recently closing the gap to
supervised representation learning. An important ingredient in high-performing
self-supervised methods is the use of data augmentation by training models to
place different augmented views of the same image nearby in embedding space.
However, commonly used augmentation pipelines treat images holistically,
disregarding the semantic relevance of parts of an image (e.g., a subject vs. a
background), which can lead to the learning of spurious correlations. Our work
addresses this problem by investigating a class of simple, yet highly effective
"background augmentations", which encourage models to focus on
semantically-relevant content by discouraging them from focusing on image
backgrounds. Background augmentations lead to substantial improvements (+1-2%
on ImageNet-1k) in performance across a spectrum of state-of-the-art
self-supervised methods (MoCov2, BYOL, SwAV) on a variety of tasks, allowing us
to reach within 0.3% of supervised performance. We also demonstrate that
background augmentations improve robustness to a number of out-of-distribution
settings, including natural adversarial examples, the backgrounds challenge,
adversarial attacks, and ReaL ImageNet.
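To make the idea concrete, below is a minimal, illustrative sketch (assumptions mine, not the authors' released code) of how a background augmentation could be slotted into a standard contrastive pipeline: the subject is kept while the background is replaced, and the two resulting views are pulled together by an InfoNCE-style loss. The function names, the mask source (e.g., an off-the-shelf saliency or segmentation model), and the hyperparameters are placeholders for illustration only.

```python
# Minimal sketch (assumptions mine, not the authors' released code) of a
# "background swap" style augmentation feeding a standard contrastive loss.
# Assumed inputs: float image tensors in [0, 1] and a binary foreground mask
# per image, e.g. from an off-the-shelf saliency or segmentation model.
import torch
import torch.nn.functional as F


def background_swap(image, fg_mask, bg_image):
    """Keep the subject of `image` and composite it onto `bg_image`.

    image, bg_image: (..., 3, H, W) float tensors in [0, 1]
    fg_mask:         (..., H, W) binary tensor, 1 = subject, 0 = background
    """
    mask = fg_mask.unsqueeze(-3).to(image.dtype)        # add channel dim for broadcasting
    return mask * image + (1.0 - mask) * bg_image       # subject kept, background replaced


def info_nce(z1, z2, temperature=0.1):
    """Plain InfoNCE / NT-Xent-style objective: matching views are positives,
    every other image in the batch serves as a negative."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature                     # (B, B) cosine similarities
    targets = torch.arange(z1.size(0), device=z1.device)   # positives sit on the diagonal
    return F.cross_entropy(logits, targets)


# Usage sketch; `encoder`, `augment`, `masks`, `backgrounds` are placeholders:
#   view_a = augment(images)                                        # ordinary augmented view
#   view_b = augment(background_swap(images, masks, backgrounds))   # background-swapped view
#   loss = info_nce(encoder(view_a), encoder(view_b))
```

The intended effect, as described in the abstract, is that the embedding can no longer rely on background cues to match the two views, pushing it toward the subject; the same background-swapped views could equally feed BYOL- or SwAV-style objectives in place of the InfoNCE loss above.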
Related papers
- Enhance Image Classification via Inter-Class Image Mixup with Diffusion Model [80.61157097223058]
A prevalent strategy for bolstering image classification performance is to augment the training set with synthetic images generated by text-to-image (T2I) models.
In this study, we scrutinize the shortcomings of both current generative and conventional data augmentation techniques.
We introduce an innovative inter-class data augmentation method known as Diff-Mix, which enriches the dataset by performing image translations between classes.
arXiv Detail & Related papers (2024-03-28T17:23:45Z)
- Attention-Guided Masked Autoencoders For Learning Image Representations [16.257915216763692]
Masked autoencoders (MAEs) have established themselves as a powerful method for unsupervised pre-training for computer vision tasks.
We propose to inform the reconstruction process through an attention-guided loss function.
Our evaluations show that our pre-trained models learn better latent representations than the vanilla MAE.
arXiv Detail & Related papers (2024-02-23T08:11:25Z)
- Heuristic Vision Pre-Training with Self-Supervised and Supervised Multi-Task Learning [0.0]
We propose a novel pre-training framework that adopts both self-supervised and supervised visual pretext tasks in a multi-task manner.
Results show that our pre-trained models can match or surpass state-of-the-art (SOTA) results on multiple visual tasks.
arXiv Detail & Related papers (2023-10-11T14:06:04Z)
- Free-ATM: Exploring Unsupervised Learning on Diffusion-Generated Images with Free Attention Masks [64.67735676127208]
Text-to-image diffusion models have shown great potential for benefiting image recognition.
Although promising, unsupervised learning on diffusion-generated images remains underexplored.
We introduce customized solutions that fully exploit these free attention masks.
arXiv Detail & Related papers (2023-08-13T10:07:46Z)
- CoDo: Contrastive Learning with Downstream Background Invariance for Detection [10.608660802917214]
We propose a novel object-level self-supervised learning method, called Contrastive learning with Downstream background invariance (CoDo).
The pretext task is converted to focus on instance location modeling for various backgrounds, especially for downstream datasets.
Experiments on MSCOCO demonstrate that the proposed CoDo with a common backbone (ResNet50-FPN) yields strong transfer learning results for object detection.
arXiv Detail & Related papers (2022-05-10T01:26:15Z)
- Semantic-Aware Generation for Self-Supervised Visual Representation Learning [116.5814634936371]
We advocate for Semantic-aware Generation (SaGe), which encourages richer semantics, rather than details, to be preserved in the generated image.
SaGe complements the target network with view-specific features and thus alleviates the semantic degradation brought by intensive data augmentations.
We execute SaGe on ImageNet-1K and evaluate the pre-trained models on five downstream tasks including nearest neighbor test, linear classification, and fine-scaled image recognition.
arXiv Detail & Related papers (2021-11-25T16:46:13Z)
- Object-aware Contrastive Learning for Debiased Scene Representation [74.30741492814327]
We develop a novel object-aware contrastive learning framework that localizes objects in a self-supervised manner.
We also introduce two data augmentations based on ContraCAM, object-aware random crop and background mixup, which reduce contextual and background biases during contrastive self-supervised learning.
arXiv Detail & Related papers (2021-07-30T19:24:07Z)
- Heterogeneous Contrastive Learning: Encoding Spatial Information for Compact Visual Representations [183.03278932562438]
This paper presents an effective approach that adds spatial information to the encoding stage to alleviate the learning inconsistency between the contrastive objective and strong data augmentation operations.
We show that our approach achieves higher efficiency in visual representations and thus delivers a key message to inspire future research on self-supervised visual representation learning.
arXiv Detail & Related papers (2020-11-19T16:26:25Z)
- Hard Negative Mixing for Contrastive Learning [29.91220669060252]
We argue that an important aspect of contrastive learning, i.e., the effect of hard negatives, has so far been neglected.
We propose hard negative mixing strategies at the feature level that can be computed on the fly with minimal computational overhead (a brief sketch follows this list).
arXiv Detail & Related papers (2020-10-02T14:34:58Z)
- Distilling Localization for Self-Supervised Representation Learning [82.79808902674282]
Contrastive learning has revolutionized unsupervised representation learning.
Current contrastive models are ineffective at localizing the foreground object.
We propose a data-driven approach for learning invariance to backgrounds.
arXiv Detail & Related papers (2020-04-14T16:29:42Z)
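As referenced in the Hard Negative Mixing entry above, a rough feature-level sketch looks as follows (assumptions mine; the paper's exact selection and mixing recipe may differ): for each query, synthesize extra negatives by convexly combining its hardest existing negatives directly in embedding space, so no extra images need to pass through the encoder.

```python
# Rough, illustrative sketch of feature-level hard negative mixing
# (not the paper's exact recipe; names and defaults are assumptions).
import torch
import torch.nn.functional as F


def mix_hard_negatives(query, negatives, n_hard=16, n_synth=8):
    """query:     (D,) L2-normalized embedding of the anchor view
    negatives: (N, D) L2-normalized embeddings from the negative queue
    Returns (n_synth, D) synthetic negatives built from the hardest ones."""
    sims = negatives @ query                            # similarity of each negative to the query
    k = min(n_hard, negatives.size(0))
    hard = negatives[sims.topk(k).indices]              # keep only the hardest negatives
    i = torch.randint(k, (n_synth,))
    j = torch.randint(k, (n_synth,))
    alpha = torch.rand(n_synth, 1)                      # random convex mixing weights
    mixed = alpha * hard[i] + (1.0 - alpha) * hard[j]   # mix pairs of hard negatives
    return F.normalize(mixed, dim=1)                    # project back onto the unit hypersphere
```

The synthetic negatives would simply be appended to the negative set before computing the contrastive loss, which is why the overhead stays small: the mixing happens on already-computed features.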
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.