A Lightweight Domain Adaptive Absolute Pose Regressor Using Barlow Twins Objective
- URL: http://arxiv.org/abs/2211.10963v1
- Date: Sun, 20 Nov 2022 12:18:53 GMT
- Title: A Lightweight Domain Adaptive Absolute Pose Regressor Using Barlow Twins Objective
- Authors: Praveen Kumar Rajendran, Quoc-Vinh Lai-Dang, Luiz Felipe Vecchietti, Dongsoo Har
- Abstract summary: In this paper, a domain adaptive training framework for absolute pose regression is introduced.
In the proposed framework, the scene image is augmented for different domains by using generative methods to train parallel branches.
The results demonstrate that, even while using roughly 24 times fewer FLOPs, 12 times fewer activations, and 5 times fewer parameters than MS-Transformer, our approach outperforms all the CNN-based architectures.
- Score: 0.6193838300896449
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Identifying the camera pose for a given image is a challenging problem with
applications in robotics, autonomous vehicles, and augmented/virtual reality.
Lately, learning-based methods have been shown to be effective for absolute camera
pose estimation. However, these methods are not accurate when generalizing to
different domains. In this paper, a domain adaptive training framework for
absolute pose regression is introduced. In the proposed framework, the scene
image is augmented for different domains by using generative methods to train
parallel branches using the Barlow Twins objective. The parallel branches leverage
a lightweight CNN-based absolute pose regressor architecture. Further, the
efficacy of incorporating spatial and channel-wise attention in the regression
head for rotation prediction is investigated. Our method is evaluated with two
datasets, Cambridge Landmarks and 7Scenes. The results demonstrate that, even
while using roughly 24 times fewer FLOPs, 12 times fewer activations, and 5
times fewer parameters than MS-Transformer, our approach outperforms all the
CNN-based architectures and achieves performance comparable to
transformer-based architectures. Our method ranks 2nd and 4th on the
Cambridge Landmarks and 7Scenes datasets, respectively. In addition, for
augmented domains not encountered during training, our approach significantly
outperforms the MS-Transformer. Furthermore, it is shown that our domain
adaptive framework achieves better performance than the single-branch model
trained with the identical CNN backbone on all instances of the unseen
distribution.
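For readers unfamiliar with the Barlow Twins objective mentioned in the abstract, the following is a minimal NumPy sketch of the general loss: it pushes the cross-correlation matrix between the embeddings of two branches toward the identity. This is not the authors' implementation. The function name, the `lambda_offdiag` default (taken from the original Barlow Twins formulation), and the `eps` term are assumptions, and the abstract does not say how this term is combined with the pose regression loss.

```python
import numpy as np

def barlow_twins_loss(z_a, z_b, lambda_offdiag=5e-3, eps=1e-6):
    """Barlow Twins objective for two (N, D) batches of branch embeddings.

    z_a, z_b: embeddings of the same N scenes seen through two branches
    (for example, the original image and a domain-augmented version).
    lambda_offdiag and eps are illustrative assumptions.
    """
    n = z_a.shape[0]
    # Standardize every embedding dimension across the batch.
    z_a = (z_a - z_a.mean(axis=0)) / (z_a.std(axis=0) + eps)
    z_b = (z_b - z_b.mean(axis=0)) / (z_b.std(axis=0) + eps)
    # Cross-correlation matrix between the two branches, shape (D, D).
    c = z_a.T @ z_b / n
    diag = np.diag(c)
    # Invariance term: diagonal entries should be close to 1.
    on_diag = np.sum((diag - 1.0) ** 2)
    # Redundancy-reduction term: off-diagonal entries should be close to 0.
    off_diag = np.sum(c ** 2) - np.sum(diag ** 2)
    return on_diag + lambda_offdiag * off_diag

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z = rng.normal(size=(512, 8))
    print("aligned branches:  ", barlow_twins_loss(z, z))
    print("unrelated branches:", barlow_twins_loss(z, rng.normal(size=(512, 8))))
```

When both branches produce the same embeddings the loss is close to zero, while unrelated embeddings leave the diagonal near zero and give a loss close to the embedding dimension D.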
Related papers
- LeRF: Learning Resampling Function for Adaptive and Efficient Image Interpolation [64.34935748707673]
Recent deep neural networks (DNNs) have made impressive progress in performance by introducing learned data priors.
We propose a novel method of Learning Resampling (termed LeRF) which takes advantage of both the structural priors learned by DNNs and the locally continuous assumption.
LeRF assigns spatially varying resampling functions to input image pixels and learns to predict the shapes of these resampling functions with a neural network.
arXiv Detail & Related papers (2024-07-13T16:09:45Z)
- Improving Human-Object Interaction Detection via Virtual Image Learning [68.56682347374422]
Human-Object Interaction (HOI) detection aims to understand the interactions between humans and objects.
In this paper, we propose to alleviate the impact of such an unbalanced distribution via Virtual Image Learning (VIL).
A novel label-to-image approach, Multiple Steps Image Creation (MUSIC), is proposed to create a high-quality dataset that has a consistent distribution with real images.
arXiv Detail & Related papers (2023-08-04T10:28:48Z)
- Effective Invertible Arbitrary Image Rescaling [77.46732646918936]
Invertible Neural Networks (INN) are able to increase upscaling accuracy significantly by optimizing the downscaling and upscaling cycle jointly.
A simple and effective invertible arbitrary rescaling network (IARN) is proposed to achieve arbitrary image rescaling by training only one model in this work.
It is shown to achieve a state-of-the-art (SOTA) performance in bidirectional arbitrary rescaling without compromising perceptual quality in LR outputs.
arXiv Detail & Related papers (2022-09-26T22:22:30Z)
- ImPosIng: Implicit Pose Encoding for Efficient Camera Pose Estimation [2.6808541153140077]
Implicit Pose Encoding (ImPosing) embeds images and camera poses into a common latent representation with 2 separate neural networks.
By evaluating candidates through the latent space in a hierarchical manner, the camera position and orientation are not directly regressed but refined.
arXiv Detail & Related papers (2022-05-05T13:33:25Z)
- Towards Scale Consistent Monocular Visual Odometry by Learning from the Virtual World [83.36195426897768]
We propose VRVO, a novel framework for retrieving the absolute scale from virtual data.
We first train a scale-aware disparity network using both monocular real images and stereo virtual data.
The resulting scale-consistent disparities are then integrated with a direct VO system.
arXiv Detail & Related papers (2022-03-11T01:51:54Z)
- OSLO: On-the-Sphere Learning for Omnidirectional images and its application to 360-degree image compression [59.58879331876508]
We study the learning of representation models for omnidirectional images and propose to use the properties of HEALPix uniform sampling of the sphere to redefine the mathematical tools used in deep learning models for omnidirectional images.
Our proposed on-the-sphere solution leads to a better compression gain that can save 13.7% of the bit rate compared to similar learned models applied to equirectangular images.
arXiv Detail & Related papers (2021-07-19T22:14:30Z)
- PixMatch: Unsupervised Domain Adaptation via Pixelwise Consistency Training [4.336877104987131]
Unsupervised domain adaptation is a promising technique for semantic segmentation.
We present a novel framework for unsupervised domain adaptation based on the notion of target-domain consistency training.
Our approach is simpler, easier to implement, and more memory-efficient during training.
arXiv Detail & Related papers (2021-05-17T19:36:28Z)
- Visual Camera Re-Localization Using Graph Neural Networks and Relative Pose Supervision [31.947525258453584]
Visual re-localization means using a single image as input to estimate the camera's location and orientation relative to a pre-recorded environment.
Our proposed method makes few special assumptions, and is fairly lightweight in training and testing.
We validate the effectiveness of our approach on both standard indoor (7-Scenes) and outdoor (Cambridge Landmarks) camera re-localization benchmarks.
arXiv Detail & Related papers (2021-04-06T14:29:03Z)
- iFAN: Image-Instance Full Alignment Networks for Adaptive Object Detection [48.83883375118966]
iFAN aims to precisely align feature distributions on both image and instance levels.
It outperforms state-of-the-art methods with a boost of 10%+ AP over the source-only baseline.
arXiv Detail & Related papers (2020-03-09T13:27:06Z)