Self-Supervised Learning from Non-Object Centric Images with a Geometric
Transformation Sensitive Architecture
- URL: http://arxiv.org/abs/2304.08014v7
- Date: Wed, 17 May 2023 01:50:25 GMT
- Title: Self-Supervised Learning from Non-Object Centric Images with a Geometric
Transformation Sensitive Architecture
- Authors: Taeho Kim, Jong-Min Lee
- Abstract summary: We propose a Geometric Transformation Sensitive Architecture (GTSA) designed to remain sensitive to geometric transformations.
Our method encourages the student to be sensitive by predicting rotation and using targets that vary with those transformations.
Our approach demonstrates improved performance when using non-object-centric images as pretraining data.
- Score: 7.825153552141346
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Most invariance-based self-supervised methods rely on single
object-centric images (e.g., ImageNet images) for pretraining, learning
features that are invariant to geometric transformations. However, when images are not
object-centric, the semantics of the image can be significantly altered due to
cropping. Furthermore, as the model becomes insensitive to geometric
transformations, it may struggle to capture location information. For this
reason, we propose a Geometric Transformation Sensitive Architecture designed
to be sensitive to geometric transformations, specifically focusing on
four-fold rotation, random crop, and multi-crop. Our method encourages the
student to be sensitive by predicting rotation and using targets that vary with
those transformations through pooling and rotating the teacher feature map.
Additionally, we use a patch correspondence loss to encourage correspondence
between patches with similar features. This captures long-term dependencies
in a more appropriate way than the local-to-global correspondence that
emerges when a model learns to be insensitive to multi-crop. Our approach demonstrates improved
performance when using non-object-centric images as pretraining data compared
to other methods that train the model to be insensitive to geometric
transformations. We surpass the DINO [Caron et al., 2021b] baseline in tasks
including image classification, semantic segmentation, detection, and instance
segmentation, with improvements of 4.9 Top-1 Acc, 3.3 mIoU, 3.4 $AP^b$, and
2.7 $AP^m$. Code and pretrained models are publicly available at:
https://github.com/bok3948/GTSA
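The four-fold rotation prediction mentioned in the abstract can be illustrated with a minimal pretext-task sketch. This is a generic RotNet-style data-preparation step, not the authors' GTSA implementation (which additionally pools and rotates the teacher feature map); the function name and shapes are illustrative assumptions.

```python
import numpy as np

def make_rotation_pretext_batch(images, rng):
    # Assign each image a label k in {0, 1, 2, 3}, corresponding to a
    # counter-clockwise rotation of k * 90 degrees. A student network
    # would then be trained to classify k from the rotated image.
    labels = rng.integers(0, 4, size=len(images))
    rotated = np.stack([np.rot90(img, int(k)) for img, k in zip(images, labels)])
    return rotated, labels

# Illustrative usage on a random batch of 8 "images" of size 32x32.
rng = np.random.default_rng(0)
batch = rng.random((8, 32, 32))
rotated, labels = make_rotation_pretext_batch(batch, rng)
```

Because the rotation is an exact permutation of pixels, applying the inverse rotation (`np.rot90` with `-k`) recovers the original image, which makes the pretext labels noiseless supervision.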
Related papers
- MGAug: Multimodal Geometric Augmentation in Latent Spaces of Image
Deformations [2.711740183729759]
We propose a novel model that generates augmenting transformations in a multimodal latent space of geometric deformations.
Experimental results show that our proposed approach outperforms all baselines by significantly improved prediction accuracy.
arXiv Detail & Related papers (2023-12-20T21:30:55Z) - Variable Radiance Field for Real-Life Category-Specific Reconstruction
from Single Image [27.290232027686237]
We present a novel framework that can reconstruct category-specific objects from a single image without known camera parameters.
We parameterize the geometry and appearance of the object using a multi-scale global feature extractor.
We also propose a contrastive learning-based pretraining strategy to improve the feature extractor.
arXiv Detail & Related papers (2023-06-08T12:12:02Z) - Learning Transformations To Reduce the Geometric Shift in Object
Detection [60.20931827772482]
We tackle geometric shifts emerging from variations in the image capture process.
We introduce a self-training approach that learns a set of geometric transformations to minimize these shifts.
We evaluate our method on two different shifts, i.e., a camera's field of view (FoV) change and a viewpoint change.
arXiv Detail & Related papers (2023-01-13T11:55:30Z) - RecRecNet: Rectangling Rectified Wide-Angle Images by Thin-Plate Spline
Model and DoF-based Curriculum Learning [62.86400614141706]
We propose a new learning model, the Rectangling Rectification Network (RecRecNet).
Our model can flexibly warp the source structure to the target domain and achieves an end-to-end unsupervised deformation.
Experiments show the superiority of our solution over the compared methods on both quantitative and qualitative evaluations.
arXiv Detail & Related papers (2023-01-04T15:12:57Z) - Prediction of Geometric Transformation on Cardiac MRI via Convolutional
Neural Network [13.01021780124613]
We propose to learn features in medical images by training ConvNets to recognize the geometric transformation applied to images.
We present a simple self-supervised task that can easily predict the geometric transformation.
arXiv Detail & Related papers (2022-11-12T11:29:14Z) - Adapting the Mean Teacher for keypoint-based lung registration under
geometric domain shifts [75.51482952586773]
Deep neural networks generally require large amounts of labeled training data and are vulnerable to domain shifts between training and test data.
We present a novel approach to geometric domain adaptation for image registration, adapting a model from a labeled source to an unlabeled target domain.
Our method consistently improves on the baseline model by 50%/47% and even matches the accuracy of models trained on target data.
arXiv Detail & Related papers (2022-07-01T12:16:42Z) - TransformNet: Self-supervised representation learning through predicting
geometric transformations [0.8098097078441623]
We describe an unsupervised semantic feature learning approach for recognizing the geometric transformation applied to the input data.
The basic concept of our approach is that without awareness of the objects in an image, one cannot quantitatively predict the geometric transformation that was applied to it.
arXiv Detail & Related papers (2022-02-08T22:41:01Z) - DeepI2P: Image-to-Point Cloud Registration via Deep Classification [71.3121124994105]
DeepI2P is a novel approach for cross-modality registration between an image and a point cloud.
Our method estimates the relative rigid transformation between the coordinate frames of the camera and Lidar.
We circumvent the difficulty by converting the registration problem into a classification and inverse camera projection optimization problem.
arXiv Detail & Related papers (2021-04-08T04:27:32Z) - Self-supervised Geometric Perception [96.89966337518854]
Self-supervised geometric perception is a framework to learn a feature descriptor for correspondence matching without any ground-truth geometric model labels.
We show that SGP achieves state-of-the-art performance that is on-par or superior to the supervised oracles trained using ground-truth labels.
arXiv Detail & Related papers (2021-03-04T15:34:43Z) - Improving Few-shot Learning by Spatially-aware Matching and
CrossTransformer [116.46533207849619]
We study the impact of scale and location mismatch in the few-shot learning scenario.
We propose a novel Spatially-aware Matching scheme to effectively perform matching across multiple scales and locations.
arXiv Detail & Related papers (2020-01-06T14:10:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.