Self-Supervised Learning from Non-Object Centric Images with a Geometric
Transformation Sensitive Architecture
- URL: http://arxiv.org/abs/2304.08014v7
- Date: Wed, 17 May 2023 01:50:25 GMT
- Title: Self-Supervised Learning from Non-Object Centric Images with a Geometric
Transformation Sensitive Architecture
- Authors: Taeho Kim, Jong-Min Lee
- Abstract summary: We propose a Geometric Transformation Sensitive Architecture (GTSA) designed to remain sensitive to geometric transformations.
Our method encourages the student to be sensitive by predicting rotation and using targets that vary with those transformations.
Our approach demonstrates improved performance when using non-object-centric images as pretraining data.
- Score: 7.825153552141346
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Most invariance-based self-supervised methods rely on single
object-centric images (e.g., ImageNet images) for pretraining, learning
features that are invariant to geometric transformations. However, when images are not
object-centric, the semantics of the image can be significantly altered due to
cropping. Furthermore, as the model becomes insensitive to geometric
transformations, it may struggle to capture location information. For this
reason, we propose a Geometric Transformation Sensitive Architecture designed
to be sensitive to geometric transformations, specifically focusing on
four-fold rotation, random crop, and multi-crop. Our method encourages the
student to be sensitive by predicting rotation and using targets that vary with
those transformations through pooling and rotating the teacher feature map.
Additionally, we use a patch correspondence loss to encourage correspondence
between patches with similar features. This captures long-term dependencies
in a more appropriate way than the local-to-global correspondence that
emerges when a model learns to be insensitive to multi-crop. Our approach demonstrates improved
performance when using non-object-centric images as pretraining data compared
to other methods that train the model to be insensitive to geometric
transformations. We surpass the DINO [Caron et al., 2021b] baseline in tasks
including image classification, semantic segmentation, detection, and instance
segmentation, with improvements of 4.9 Top-1 Acc, 3.3 mIoU, 3.4 $AP^b$, and
2.7 $AP^m$. Code and pretrained models are publicly available at:
https://github.com/bok3948/GTSA
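The four-fold rotation prediction mentioned in the abstract can be illustrated with a minimal pretext-task sketch. This is a generic RotNet-style data-preparation step, not the authors' GTSA implementation (which additionally pools and rotates the teacher feature map); the function name and shapes are illustrative assumptions.

```python
import numpy as np

def make_rotation_pretext_batch(images, rng):
    # Assign each image a label k in {0, 1, 2, 3}, corresponding to a
    # counter-clockwise rotation of k * 90 degrees. A student network
    # would then be trained to classify k from the rotated image.
    labels = rng.integers(0, 4, size=len(images))
    rotated = np.stack([np.rot90(img, int(k)) for img, k in zip(images, labels)])
    return rotated, labels

# Illustrative usage on a random batch of 8 "images" of size 32x32.
rng = np.random.default_rng(0)
batch = rng.random((8, 32, 32))
rotated, labels = make_rotation_pretext_batch(batch, rng)
```

Because the rotation is an exact permutation of pixels, applying the inverse rotation (`np.rot90` with `-k`) recovers the original image, which makes the pretext labels noiseless supervision.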
Related papers
- MGAug: Multimodal Geometric Augmentation in Latent Spaces of Image
Deformations [2.711740183729759]
We propose a novel model that generates augmenting transformations in a multimodal latent space of geometric deformations.
Experimental results show that our proposed approach outperforms all baselines by significantly improved prediction accuracy.
arXiv Detail & Related papers (2023-12-20T21:30:55Z) - Variable Radiance Field for Real-Life Category-Specific Reconstruction
from Single Image [27.290232027686237]
We present a novel framework that can reconstruct category-specific objects from a single image without known camera parameters.
We parameterize the geometry and appearance of the object using a multi-scale global feature extractor.
We also propose a contrastive learning-based pretraining strategy to improve the feature extractor.
arXiv Detail & Related papers (2023-06-08T12:12:02Z) - Learning Transformations To Reduce the Geometric Shift in Object
Detection [60.20931827772482]
We tackle geometric shifts emerging from variations in the image capture process.
We introduce a self-training approach that learns a set of geometric transformations to minimize these shifts.
We evaluate our method on two different shifts, i.e., a camera's field of view (FoV) change and a viewpoint change.
arXiv Detail & Related papers (2023-01-13T11:55:30Z) - RecRecNet: Rectangling Rectified Wide-Angle Images by Thin-Plate Spline
Model and DoF-based Curriculum Learning [62.86400614141706]
We propose a new learning model, the Rectangling Rectification Network (RecRecNet).
Our model can flexibly warp the source structure to the target domain and achieves an end-to-end unsupervised deformation.
Experiments show the superiority of our solution over the compared methods on both quantitative and qualitative evaluations.
arXiv Detail & Related papers (2023-01-04T15:12:57Z) - Prediction of Geometric Transformation on Cardiac MRI via Convolutional
Neural Network [13.01021780124613]
We propose to learn features in medical images by training ConvNets to recognize the geometric transformation applied to images.
We present a simple self-supervised task that can easily predict the geometric transformation.
arXiv Detail & Related papers (2022-11-12T11:29:14Z) - Adapting the Mean Teacher for keypoint-based lung registration under
geometric domain shifts [75.51482952586773]
Deep neural networks generally require large amounts of labeled training data and are vulnerable to domain shifts between training and test data.
We present a novel approach to geometric domain adaptation for image registration, adapting a model from a labeled source to an unlabeled target domain.
Our method consistently improves on the baseline model by 50%/47% and even matches the accuracy of models trained on target data.
arXiv Detail & Related papers (2022-07-01T12:16:42Z) - TransformNet: Self-supervised representation learning through predicting
geometric transformations [0.8098097078441623]
We describe an unsupervised semantic feature learning approach for recognizing the geometric transformation applied to the input data.
The basic concept of our approach is that without awareness of the objects in an image, one cannot quantitatively predict the geometric transformation that was applied to it.
arXiv Detail & Related papers (2022-02-08T22:41:01Z) - DeepI2P: Image-to-Point Cloud Registration via Deep Classification [71.3121124994105]
DeepI2P is a novel approach for cross-modality registration between an image and a point cloud.
Our method estimates the relative rigid transformation between the coordinate frames of the camera and Lidar.
We circumvent the difficulty by converting the registration problem into a classification and inverse camera projection optimization problem.
arXiv Detail & Related papers (2021-04-08T04:27:32Z) - Self-supervised Geometric Perception [96.89966337518854]
Self-supervised geometric perception is a framework to learn a feature descriptor for correspondence matching without any ground-truth geometric model labels.
We show that SGP achieves state-of-the-art performance that is on-par or superior to the supervised oracles trained using ground-truth labels.
arXiv Detail & Related papers (2021-03-04T15:34:43Z) - Improving Few-shot Learning by Spatially-aware Matching and
CrossTransformer [116.46533207849619]
We study the impact of scale and location mismatch in the few-shot learning scenario.
We propose a novel Spatially-aware Matching scheme to effectively perform matching across multiple scales and locations.
arXiv Detail & Related papers (2020-01-06T14:10:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.