Analysis of Spatial augmentation in Self-supervised models in the purview of training and test distributions
- URL: http://arxiv.org/abs/2409.18228v1
- Date: Thu, 26 Sep 2024 19:18:36 GMT
- Title: Analysis of Spatial augmentation in Self-supervised models in the purview of training and test distributions
- Authors: Abhishek Jha, Tinne Tuytelaars
- Abstract summary: We present an empirical study of typical spatial augmentation techniques used in self-supervised representation learning methods.
Our contributions are: (a) we dissociate random cropping into two separate augmentations, overlap and patch, and provide a detailed analysis of the effect of overlap area and patch size on downstream-task accuracy.
We offer an insight into why cutout augmentation does not yield good representations, as reported in earlier literature.
- Score: 38.77816582772029
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we present an empirical study of typical spatial augmentation techniques used in self-supervised representation learning methods (both contrastive and non-contrastive), namely random crop and cutout. Our contributions are: (a) we dissociate random cropping into two separate augmentations, overlap and patch, and provide a detailed analysis of the effect of overlap area and patch size on the accuracy of downstream tasks. (b) We offer an insight into why cutout augmentation does not yield good representations, as reported in earlier literature. Finally, based on these analyses, (c) we propose a distance-based margin on the invariance loss for learning scene-centric representations for downstream tasks on object-centric distributions, showing that a margin as simple as one proportional to the pixel distance between the two spatial views in scene-centric images can improve the learned representation. Our study furthers the understanding of spatial augmentations and of the effect of the domain gap between the training augmentations and the test distribution.
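The distance-based margin of contribution (c) is only described at a high level in the abstract. As a rough illustration, the following is a minimal PyTorch sketch of an invariance (alignment) loss whose slack grows with the pixel distance between the centers of the two crops; the function name, the `alpha` scale, and the hinge formulation are assumptions for the sketch, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def margin_invariance_loss(z1, z2, centers1, centers2, alpha=0.01):
    """Invariance loss with a distance-based margin (illustrative sketch).

    Instead of pulling the two view embeddings fully together, allow a
    slack (margin) proportional to the pixel distance between the crop
    centers, so distant crops of a scene-centric image are not forced
    to have identical representations.

    z1, z2:             (B, D) L2-normalized embeddings of the two views
    centers1, centers2: (B, 2) pixel coordinates of each crop's center
    alpha:              scale mapping pixel distance to margin (assumed)
    """
    # Pixel distance between the two spatial views, per sample
    pix_dist = torch.norm(centers1.float() - centers2.float(), dim=1)  # (B,)
    margin = alpha * pix_dist                                          # (B,)

    # Embedding distance between the two views
    emb_dist = torch.norm(z1 - z2, dim=1)                              # (B,)

    # Penalize only the part of the embedding distance above the margin
    return F.relu(emb_dist - margin).mean()
```

When the two crops coincide (zero pixel distance), this reduces to a standard alignment loss; as the crops move apart, the hinge lets the embeddings drift proportionally before any penalty applies.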
Related papers
- Measuring Representational Shifts in Continual Learning: A Linear Transformation Perspective [12.769918589649299]
In continual learning scenarios, catastrophic forgetting of previously learned tasks is a critical issue. We provide the first theoretical analysis of representation forgetting and use this analysis to better understand the behavior of continual learning.
arXiv Detail & Related papers (2025-05-27T10:04:00Z) - Impact of Regularization on Calibration and Robustness: from the Representation Space Perspective [16.123727386404312]
Recent studies have shown that regularization techniques using soft labels enhance image classification accuracy and improve model calibration and robustness against adversarial attacks.
In this paper, we offer a novel explanation from the perspective of the representation space.
Our investigation first reveals that the decision regions in the representation space form cone-like shapes around the origin after training regardless of the presence of regularization.
arXiv Detail & Related papers (2024-10-05T02:09:03Z) - RadOcc: Learning Cross-Modality Occupancy Knowledge through Rendering
Assisted Distillation [50.35403070279804]
3D occupancy prediction is an emerging task that aims to estimate the occupancy states and semantics of 3D scenes using multi-view images.
We propose RadOcc, a Rendering assisted distillation paradigm for 3D Occupancy prediction.
arXiv Detail & Related papers (2023-12-19T03:39:56Z) - From Patches to Objects: Exploiting Spatial Reasoning for Better Visual
Representations [2.363388546004777]
We propose a novel auxiliary pretraining method that is based on spatial reasoning.
Our proposed method takes advantage of a more flexible formulation of contrastive learning by introducing spatial reasoning as an auxiliary task for discriminative self-supervised methods.
arXiv Detail & Related papers (2023-05-21T07:46:46Z) - Learning to search for and detect objects in foveal images using deep
learning [3.655021726150368]
This study employs a fixation prediction model that emulates human objective-guided attention of searching for a given class in an image.
The foveated pictures at each fixation point are then classified to determine whether the target is present or absent in the scene.
We present a novel dual task model capable of performing fixation prediction and detection simultaneously, allowing knowledge transfer between the two tasks.
arXiv Detail & Related papers (2023-04-12T09:50:25Z) - Augmentation Invariance and Adaptive Sampling in Semantic Segmentation
of Agricultural Aerial Images [16.101248613062292]
We investigate the problem of semantic segmentation for agricultural aerial imagery.
The existing methods used for this task are designed without considering two characteristics of the aerial data.
We propose a solution based on two ideas: (i) we use a set of suitable augmentations together with a consistency loss to guide the model to learn semantic representations that are invariant to the photometric and geometric shifts typical of the top-down perspective.
With an extensive set of experiments conducted on the Agriculture-Vision dataset, we demonstrate that our proposed strategies improve the performance of the current state-of-the-art method.
arXiv Detail & Related papers (2022-04-17T10:19:07Z) - Learning Where to Learn in Cross-View Self-Supervised Learning [54.14989750044489]
Self-supervised learning (SSL) has made enormous progress and largely narrowed the gap with supervised methods.
Current methods simply adopt uniform aggregation of pixels for embedding.
We present a new approach, Learning Where to Learn (LEWEL), to adaptively aggregate spatial information of features.
arXiv Detail & Related papers (2022-03-28T17:02:42Z) - Ranking Distance Calibration for Cross-Domain Few-Shot Learning [91.22458739205766]
Recent progress in few-shot learning promotes a more realistic cross-domain setting.
Due to the domain gap and disjoint label spaces between source and target datasets, their shared knowledge is extremely limited.
We employ a re-ranking process for calibrating a target distance matrix by discovering the reciprocal k-nearest neighbours within the task.
arXiv Detail & Related papers (2021-12-01T03:36:58Z) - PANet: Perspective-Aware Network with Dynamic Receptive Fields and
Self-Distilling Supervision for Crowd Counting [63.84828478688975]
We propose a novel perspective-aware approach called PANet to address the perspective problem.
Based on the observation that the size of the objects varies greatly in one image due to the perspective effect, we propose the dynamic receptive fields (DRF) framework.
The framework is able to adjust the receptive field by the dilated convolution parameters according to the input image, which helps the model to extract more discriminative features for each local region.
arXiv Detail & Related papers (2021-10-31T04:43:05Z) - Object-aware Contrastive Learning for Debiased Scene Representation [74.30741492814327]
We develop a novel object-aware contrastive learning framework that localizes objects in a self-supervised manner.
We also introduce two data augmentations based on ContraCAM, object-aware random crop and background mixup, which reduce contextual and background biases during contrastive self-supervised learning.
arXiv Detail & Related papers (2021-07-30T19:24:07Z) - Align Yourself: Self-supervised Pre-training for Fine-grained
Recognition via Saliency Alignment [34.38172454910976]
Cross-view Saliency Alignment (CVSA) is a contrastive learning framework that first crops and swaps saliency regions of images as a novel view generation and then guides the model to localize on the foreground object via a cross-view alignment loss.
Experiments on four popular fine-grained classification benchmarks show that CVSA significantly improves the learned representation.
arXiv Detail & Related papers (2021-06-30T02:56:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.