Learning Transformations To Reduce the Geometric Shift in Object Detection
- URL: http://arxiv.org/abs/2301.05496v1
- Date: Fri, 13 Jan 2023 11:55:30 GMT
- Title: Learning Transformations To Reduce the Geometric Shift in Object Detection
- Authors: Vidit Vidit, Martin Engilberge, Mathieu Salzmann
- Abstract summary: We tackle geometric shifts emerging from variations in the image capture process.
We introduce a self-training approach that learns a set of geometric transformations to minimize these shifts.
We evaluate our method on two different shifts, i.e., a camera's field of view (FoV) change and a viewpoint change.
- Score: 60.20931827772482
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The performance of modern object detectors drops when the test distribution
differs from the training one. Most of the methods that address this focus on
object appearance changes caused by, e.g., different illumination conditions,
or gaps between synthetic and real images. Here, by contrast, we tackle
geometric shifts emerging from variations in the image capture process, or due
to the constraints of the environment causing differences in the apparent
geometry of the content itself. We introduce a self-training approach that
learns a set of geometric transformations to minimize these shifts without
leveraging any labeled data in the new domain, nor any information about the
cameras. We evaluate our method on two different shifts, i.e., a camera's field
of view (FoV) change and a viewpoint change. Our results evidence that learning
geometric transformations helps detectors to perform better in the target
domains.
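Neither the abstract nor the summary above spells out the mechanics, so the following is a minimal, hypothetical sketch of the general recipe it describes: a source-trained detector pseudo-labels the unlabeled target images (self-training), and the parameters of a differentiable geometric warp are optimized so that the detection loss on the warped images drops. Everything concrete below is an assumption made for illustration: a COCO-pretrained torchvision Faster R-CNN stands in for the source-trained detector, a single global affine warp stands in for the richer transformation set learned in the paper, and the score threshold and optimizer settings are arbitrary.

```python
# Illustrative sketch only -- not the authors' implementation.
import torch
import torch.nn.functional as F
import torchvision

def warp_image(img, theta):
    """Warp a (3, H, W) image with a 2x3 sampling matrix (normalized coordinates)."""
    grid = F.affine_grid(theta.unsqueeze(0), [1, *img.shape], align_corners=True)
    return F.grid_sample(img.unsqueeze(0), grid, align_corners=True).squeeze(0)

def warp_boxes(boxes, theta, h, w):
    """Map xyxy pixel boxes of the original image into the warped image."""
    A = torch.cat([theta, theta.new_tensor([[0.0, 0.0, 1.0]])], dim=0)
    fwd = torch.linalg.inv(A)[:2]                 # affine_grid maps output->input, so the
    x1, y1, x2, y2 = boxes.unbind(-1)             # forward (input->output) map is its inverse
    corners = torch.stack([x1, y1, x2, y1, x1, y2, x2, y2], -1).reshape(-1, 4, 2)
    scale = boxes.new_tensor([(w - 1) / 2.0, (h - 1) / 2.0])
    c = corners / scale - 1.0                     # pixels -> normalized [-1, 1]
    c = c @ fwd[:, :2].T + fwd[:, 2]
    c = (c + 1.0) * scale                         # back to pixels
    return torch.cat([c.amin(1), c.amax(1)], dim=-1)

# A COCO-pretrained detector stands in for the detector trained on the source domain.
detector = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
for p in detector.parameters():
    p.requires_grad_(False)                       # only the warp is learned in this sketch

theta = torch.nn.Parameter(torch.tensor([[1.0, 0.0, 0.0],
                                         [0.0, 1.0, 0.0]]))   # identity initialization
optimizer = torch.optim.Adam([theta], lr=1e-3)

def adapt_step(target_image):                     # target_image: (3, H, W) float in [0, 1]
    h, w = target_image.shape[-2:]
    detector.eval()
    with torch.no_grad():                         # teacher pass: pseudo-labels on the raw,
        pred = detector([target_image])[0]        # unlabeled target-domain image
    keep = pred["scores"] > 0.8                   # arbitrary confidence threshold
    if keep.sum() == 0:
        return
    boxes, labels = pred["boxes"][keep], pred["labels"][keep]

    warped = warp_image(target_image, theta)      # student pass on the warped image, with
    targets = [{"boxes": warp_boxes(boxes, theta, h, w).detach(),   # pseudo-boxes mapped
                "labels": labels}]                                  # into the same warp
    detector.train()
    loss = sum(detector([warped], targets).values())
    optimizer.zero_grad()
    loss.backward()                               # gradients reach theta through grid_sample
    optimizer.step()
```

The only point of the sketch is that the warp parameters receive gradients through the warped pixels themselves, so no target-domain labels or camera metadata are needed; the paper's FoV and viewpoint experiments use a more expressive transformation family than a single affine warp.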
Related papers
- Robust Scene Change Detection Using Visual Foundation Models and Cross-Attention Mechanisms [27.882122236282054]
We present a novel method for scene change detection that leverages the robust feature extraction capabilities of a visual foundation model, DINOv2.
We evaluate our approach on two benchmark datasets, VL-CMU-CD and PSCD, along with their viewpoint-varied versions.
Our experiments demonstrate significant improvements in F1-score, particularly in scenarios involving geometric changes between image pairs.
arXiv Detail & Related papers (2024-09-25T11:55:27Z)
- Self-Supervised Learning from Non-Object Centric Images with a Geometric Transformation Sensitive Architecture [7.825153552141346]
We propose a Geometric Transformation Sensitive Architecture, designed so that the learned representations remain sensitive to geometric transformations.
Our method encourages the student to be sensitive by predicting rotation and using targets that vary with those transformations.
Our approach demonstrates improved performance when using non-object-centric images as pretraining data.
arXiv Detail & Related papers (2023-04-17T06:32:37Z)
- Geometric-aware Pretraining for Vision-centric 3D Object Detection [77.7979088689944]
We propose a novel geometric-aware pretraining framework called GAPretrain.
GAPretrain serves as a plug-and-play solution that can be flexibly applied to multiple state-of-the-art detectors.
We achieve 46.2 mAP and 55.5 NDS on the nuScenes val set using the BEVFormer method, with a gain of 2.7 and 2.1 points, respectively.
arXiv Detail & Related papers (2023-04-06T14:33:05Z)
- RecRecNet: Rectangling Rectified Wide-Angle Images by Thin-Plate Spline Model and DoF-based Curriculum Learning [62.86400614141706]
We propose a new learning model, the Rectangling Rectification Network (RecRecNet).
Our model can flexibly warp the source structure to the target domain and achieves an end-to-end unsupervised deformation.
Experiments show the superiority of our solution over the compared methods in both quantitative and qualitative evaluations.
arXiv Detail & Related papers (2023-01-04T15:12:57Z)
- Self-Pair: Synthesizing Changes from Single Source for Object Change Detection in Remote Sensing Imagery [6.586756080460231]
We train a change detector using two spatially unrelated images with corresponding semantic labels, such as buildings.
We show that manipulating the source image as an after-image is crucial to the performance of change detection.
Our method outperforms existing methods based on single-temporal supervision.
arXiv Detail & Related papers (2022-12-20T13:26:42Z)
- A Light Touch Approach to Teaching Transformers Multi-view Geometry [80.35521056416242]
We propose a "light touch" approach to guiding visual Transformers to learn multiple-view geometry.
We achieve this by using epipolar lines to guide the Transformer's cross-attention maps (see the epipolar-guidance sketch after this list).
Unlike previous methods, our proposal does not require any camera pose information at test-time.
arXiv Detail & Related papers (2022-11-28T07:54:06Z)
- The Change You Want to See [91.3755431537592]
Given two images of the same scene, being able to automatically detect the changes in them has practical applications in a variety of domains.
We tackle the change detection problem with the goal of detecting "object-level" changes in an image pair despite differences in their viewpoint and illumination.
arXiv Detail & Related papers (2022-09-28T18:10:09Z)
- PIT: Position-Invariant Transform for Cross-FoV Domain Adaptation [53.428312630479816]
We observe that the Field of View (FoV) gap induces noticeable instance appearance differences between the source and target domains.
Motivated by these observations, we propose the Position-Invariant Transform (PIT) to better align images in different domains (see the angular-resampling sketch after this list).
arXiv Detail & Related papers (2021-08-16T15:16:47Z)
- Image-to-image Transformation with Auxiliary Condition [0.0]
We propose to introduce label information about the subjects, e.g., the pose and type of objects, into the training of CycleGAN, leading it to learn label-wise transformation models.
We evaluate our proposed method, called Label-CycleGAN, through experiments on digit image transformation from SVHN to MNIST and on surveillance-camera image transformation from simulated to real images.
arXiv Detail & Related papers (2021-06-25T15:33:11Z)
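To make the epipolar guidance in the "A Light Touch Approach to Teaching Transformers Multi-view Geometry" entry concrete, here is a minimal, hypothetical sketch: during training, a query at pixel x in image A has its cross-attention over image B pushed toward the epipolar line l = F x given by a fundamental matrix F between the two views. The soft-mask temperature and the KL form below are illustrative assumptions, not the paper's exact loss; consistent with the abstract, none of this (and no pose information) is needed at test time.

```python
# Illustrative sketch of epipolar-line guidance for cross-attention (assumptions flagged above).
import torch
import torch.nn.functional as F

def epipolar_distance_map(F_mat, query_xy, h, w):
    """Pixel distance from every image-B location to the epipolar line of one image-A
    query point. F_mat: (3, 3) fundamental matrix; query_xy: (x, y) pixel coordinates."""
    x = torch.tensor([query_xy[0], query_xy[1], 1.0])
    a, b, c = F_mat @ x                                    # line ax + by + c = 0 in image B
    ys, xs = torch.meshgrid(torch.arange(h, dtype=torch.float32),
                            torch.arange(w, dtype=torch.float32), indexing="ij")
    return (a * xs + b * ys + c).abs() / torch.sqrt(a * a + b * b + 1e-8)

def epipolar_attention_loss(attn, F_mat, query_xy, h, w, temperature=5.0):
    """attn: (h*w,) cross-attention weights of one query over image-B positions (sums to 1).
    Encourages the attention mass to concentrate near the query's epipolar line."""
    dist = epipolar_distance_map(F_mat, query_xy, h, w).reshape(-1)
    target = torch.softmax(-dist / temperature, dim=0)     # soft mask peaked on the line
    return F.kl_div(attn.clamp_min(1e-8).log(), target, reduction="sum")
```

A weighted sum of this term with the main training objective is enough to inject the geometric prior; at inference the attention layers run unchanged.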
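Similarly, for the PIT entry, the sketch below shows one generic way to realize an FoV-normalizing, position-dependent resampling: pixels are remapped so that equal steps in the output correspond to equal viewing angles, which reduces the stretching that a wide FoV introduces near the image borders. The focal length is an assumed input and the per-axis formulation is a simplification, not necessarily PIT's exact transform.

```python
# Illustrative angular resampling in the spirit of a position-invariant transform.
import math
import torch
import torch.nn.functional as F

def angular_resample(img, focal):
    """img: (3, H, W) tensor; focal: focal length in pixels (assumed known or estimated)."""
    _, h, w = img.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    max_ax = math.atan(cx / focal)                   # half horizontal viewing angle
    max_ay = math.atan(cy / focal)                   # half vertical viewing angle
    tx = torch.linspace(-max_ax, max_ax, w)          # output column i covers angle tx[i];
    ty = torch.linspace(-max_ay, max_ay, h)          # invert with tan() to find the input
    u = cx + focal * torch.tan(tx)                   # pixel that this angle looks at
    v = cy + focal * torch.tan(ty)
    gx, gy = u / cx - 1.0, v / cy - 1.0              # to normalized [-1, 1] coordinates
    grid = torch.stack(torch.meshgrid(gy, gx, indexing="ij"), dim=-1)[..., [1, 0]]
    return F.grid_sample(img.unsqueeze(0), grid.unsqueeze(0),
                         align_corners=True).squeeze(0)

# Usage idea: run the detector on angular_resample(image, focal) in both domains so that an
# object's pixel footprint depends less on where it sits in a wide- or narrow-FoV image.
```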