Robust Self-Supervised Extrinsic Self-Calibration
- URL: http://arxiv.org/abs/2308.02153v2
- Date: Mon, 7 Aug 2023 02:33:21 GMT
- Title: Robust Self-Supervised Extrinsic Self-Calibration
- Authors: Takayuki Kanai, Igor Vasiljevic, Vitor Guizilini, Adrien Gaidon, and
Rares Ambrus
- Abstract summary: Multi-camera self-supervised monocular depth estimation from videos is a promising way to reason about the environment.
We introduce a novel method for extrinsic calibration that builds upon the principles of self-supervised monocular depth and ego-motion learning.
- Score: 25.727912226753247
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Autonomous vehicles and robots need to operate over a wide variety of
scenarios in order to complete tasks efficiently and safely. Multi-camera
self-supervised monocular depth estimation from videos is a promising way to
reason about the environment, as it generates metrically scaled geometric
predictions from visual data without requiring additional sensors. However,
most works assume well-calibrated extrinsics to fully leverage this
multi-camera setup, even though accurate and efficient calibration is still a
challenging problem. In this work, we introduce a novel method for extrinsic
calibration that builds upon the principles of self-supervised monocular depth
and ego-motion learning. Our proposed curriculum learning strategy uses
monocular depth and pose estimators with velocity supervision to estimate
extrinsics, and then jointly learns extrinsic calibration along with depth and
pose for a set of overlapping cameras rigidly attached to a moving vehicle.
Experiments on a benchmark multi-camera dataset (DDAD) demonstrate that our
method enables robust and efficient self-calibration across a variety of
scenes, compared to a traditional vision-based pose estimation pipeline.
Furthermore,
we demonstrate the benefits of extrinsics self-calibration as a way to improve
depth prediction via joint optimization.
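As a minimal sketch of the idea, assuming a PyTorch-style setup: the camera-to-vehicle extrinsics can be held as learnable SE(3) parameters and optimized through the same photometric reprojection objective as depth and pose, with velocity supervision fixing metric scale. The axis-angle parameterization and all names here (`depth_net`, `pose_net`, `warp`, `photometric_loss`) are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class LearnableExtrinsics(nn.Module):
    """Per-camera camera-to-vehicle SE(3) transform, stored as learnable
    axis-angle rotation + translation (a simplifying parameterization)."""

    def __init__(self, num_cameras: int):
        super().__init__()
        self.rotvec = nn.Parameter(torch.zeros(num_cameras, 3))  # axis-angle
        self.trans = nn.Parameter(torch.zeros(num_cameras, 3))   # meters

    def forward(self, cam: int) -> torch.Tensor:
        w = self.rotvec[cam]
        theta = w.norm().clamp(min=1e-8)   # rotation angle
        k = w / theta                      # unit rotation axis
        # Skew-symmetric matrix of k, for Rodrigues' formula.
        K = torch.zeros(3, 3, dtype=w.dtype, device=w.device)
        K[0, 1], K[0, 2] = -k[2], k[1]
        K[1, 0], K[1, 2] = k[2], -k[0]
        K[2, 0], K[2, 1] = -k[1], k[0]
        R = torch.eye(3, device=w.device) + theta.sin() * K \
            + (1 - theta.cos()) * (K @ K)
        T = torch.eye(4, device=w.device)
        T[:3, :3] = R
        T[:3, 3] = self.trans[cam]
        return T  # 4x4 homogeneous camera-to-vehicle transform

# One hypothetical joint training step (depth_net, pose_net, warp, and
# photometric_loss are assumed components):
#   depth  = depth_net(img_t[cam])                    # monocular depth
#   T_ego  = pose_net(img_t[cam], img_t1[cam])        # ego-motion, t -> t+1
#   X_cam  = extrinsics(cam)                          # camera -> vehicle
#   synth  = warp(img_t1[neighbor], depth, T_ego, X_cam, K_intr)
#   loss   = photometric_loss(img_t[cam], synth) \
#          + lam_v * (T_ego[:3, 3].norm() - speed * dt).abs()  # velocity term
```

Per the curriculum described in the abstract, these extrinsic parameters would first be estimated using velocity-supervised monocular depth and pose networks, and only then refined jointly with depth and pose.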
Related papers
- VICAN: Very Efficient Calibration Algorithm for Large Camera Networks [49.17165360280794]
We introduce a novel methodology that extends Pose Graph Optimization techniques.
We consider the bipartite graph encompassing cameras, object poses evolving dynamically, and camera-object relative transformations at each time step.
Our framework retains compatibility with traditional PGO solvers, but its efficacy benefits from a custom-tailored optimization scheme (a sketch of such a camera-object edge residual appears after this list).
arXiv Detail & Related papers (2024-03-25T17:47:03Z)
- Learning Robust Multi-Scale Representation for Neural Radiance Fields from Unposed Images [65.41966114373373]
We present an improved solution to the neural image-based rendering problem in computer vision.
The proposed approach can synthesize realistic images of the scene from novel viewpoints at test time.
arXiv Detail & Related papers (2023-11-08T08:18:23Z)
- EGA-Depth: Efficient Guided Attention for Self-Supervised Multi-Camera Depth Estimation [45.59727643007449]
We propose a novel guided attention architecture, EGA-Depth, which can improve the efficiency and accuracy of self-supervised multi-camera depth estimation.
For each camera, we use its perspective view as the query and cross-reference its neighboring views to derive informative features for that camera view (see the cross-view attention sketch after this list).
arXiv Detail & Related papers (2023-04-06T20:50:28Z)
- Policy Pre-training for End-to-end Autonomous Driving via Self-supervised Geometric Modeling [96.31941517446859]
We propose PPGeo (Policy Pre-training via Geometric modeling), an intuitive and straightforward fully self-supervised framework designed for policy pretraining in visuomotor driving.
We aim at learning policy representations as a powerful abstraction by modeling 3D geometric scenes on large-scale unlabeled and uncalibrated YouTube driving videos.
In the first stage, the geometric modeling framework generates pose and depth predictions simultaneously, with two consecutive frames as input.
In the second stage, the visual encoder learns driving policy representation by predicting the future ego-motion and optimizing with the photometric error based on current visual observation only.
arXiv Detail & Related papers (2023-01-03T08:52:49Z)
- Towards Scale-Aware, Robust, and Generalizable Unsupervised Monocular Depth Estimation by Integrating IMU Motion Dynamics [74.1720528573331]
Unsupervised monocular depth and ego-motion estimation has drawn extensive research attention in recent years.
We propose DynaDepth, a novel scale-aware framework that integrates information from vision and IMU motion dynamics.
We validate the effectiveness of DynaDepth by conducting extensive experiments and simulations on the KITTI and Make3D datasets.
arXiv Detail & Related papers (2022-07-11T07:50:22Z)
- SurroundDepth: Entangling Surrounding Views for Self-Supervised Multi-Camera Depth Estimation [101.55622133406446]
We propose a SurroundDepth method to incorporate the information from multiple surrounding views to predict depth maps across cameras.
Specifically, we employ a joint network to process all surrounding views and propose a cross-view transformer to effectively fuse information across views (the attention sketch after this list illustrates the same cross-view pattern).
In experiments, our method achieves state-of-the-art performance on challenging multi-camera depth estimation datasets.
arXiv Detail & Related papers (2022-04-07T17:58:47Z)
- Self-Supervised Camera Self-Calibration from Video [34.35533943247917]
We propose a learning algorithm to regress per-sequence calibration parameters using an efficient family of general camera models.
Our procedure achieves self-calibration results with sub-pixel reprojection error, outperforming other learning-based methods.
arXiv Detail & Related papers (2021-12-06T19:42:05Z)
- Full Surround Monodepth from Multiple Cameras [31.145598985137468]
We extend self-supervised monocular depth and ego-motion estimation to large-baseline multi-camera rigs.
We learn a single network that generates dense, consistent, and scale-aware point clouds covering the same full 360-degree surround field of view as a typical LiDAR scanner.
arXiv Detail & Related papers (2021-03-31T22:52:04Z)
- Infrastructure-based Multi-Camera Calibration using Radial Projections [117.22654577367246]
Pattern-based calibration techniques can be used to calibrate the intrinsics of the cameras individually.
Infrastructure-based calibration techniques are able to estimate the extrinsics using 3D maps pre-built via SLAM or Structure-from-Motion.
We propose to fully calibrate a multi-camera system from scratch using an infrastructure-based approach.
arXiv Detail & Related papers (2020-07-30T09:21:04Z)
- DFVS: Deep Flow Guided Scene Agnostic Image Based Visual Servoing [11.000164408890635]
Existing deep learning based visual servoing approaches regress the relative camera pose between a pair of images.
We consider optical flow as our visual features, which are predicted using a deep neural network.
We show convergence from initial offsets of over 3 m and 40 degrees while maintaining positioning precision under 2 cm and 1 degree (the classical servoing update underlying such pipelines is sketched after this list).
arXiv Detail & Related papers (2020-03-08T11:42:36Z)
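For the VICAN entry above, here is a hedged illustration of the kind of camera-object edge residual a pose-graph solver would minimize over the bipartite graph of cameras and dynamic object poses. The 4x4-matrix conventions, the simplified log map, and all names are expository assumptions, not the paper's exact formulation.

```python
import numpy as np
from scipy.spatial.transform import Rotation

def se3_log(T: np.ndarray) -> np.ndarray:
    """Map a 4x4 rigid transform to a 6-vector (rotation vector, translation).
    This decoupled log is a common simplification of the true SE(3) log map."""
    rvec = Rotation.from_matrix(T[:3, :3]).as_rotvec()
    return np.concatenate([rvec, T[:3, 3]])

def edge_residual(T_cam: np.ndarray, T_obj: np.ndarray, Z: np.ndarray) -> np.ndarray:
    """Residual of one camera-object edge at one time step.

    T_cam: camera pose in the world frame (4x4), a graph variable
    T_obj: object pose in the world frame at this step (4x4), a graph variable
    Z:     measured camera-to-object relative transform (4x4), the edge
    """
    pred = np.linalg.inv(T_cam) @ T_obj      # predicted relative transform
    return se3_log(np.linalg.inv(Z) @ pred)  # zero when prediction matches Z
```

A PGO solver stacks these residuals over all edges and time steps and minimizes their (robustified) squared norm over the camera and object poses.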
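For the EGA-Depth entry (and the cross-view transformer in SurroundDepth, which follows the same pattern), a minimal PyTorch sketch of guided cross-view attention: the reference camera's features form the queries, its neighbors' features the keys and values. Dimensions and module layout are illustrative assumptions.

```python
import torch
import torch.nn as nn

class GuidedCrossViewAttention(nn.Module):
    """Attend from one camera's feature map to its neighboring views."""

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, feat_q: torch.Tensor, feats_kv: torch.Tensor) -> torch.Tensor:
        # feat_q:   (B, C, H, W)    reference-camera features (the query view)
        # feats_kv: (B, N, C, H, W) features of N neighboring cameras
        B, C, H, W = feat_q.shape
        q = feat_q.flatten(2).transpose(1, 2)  # (B, H*W, C)
        kv = feats_kv.flatten(3).permute(0, 1, 3, 2).reshape(B, -1, C)  # (B, N*H*W, C)
        out, _ = self.attn(q, kv, kv)          # cross-view attention
        return out.transpose(1, 2).reshape(B, C, H, W) + feat_q  # residual add

# Usage sketch: fuse = GuidedCrossViewAttention(dim=256)
#               fused = fuse(f_front, torch.stack([f_left, f_right], dim=1))
```

Querying only the neighboring views (rather than full all-pairs self-attention across cameras) is what keeps this pattern efficient for surround-view rigs.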
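For the DFVS entry, the classical image-based visual servoing update that flow-guided approaches build on: a 6-DoF velocity command from the pseudo-inverse of a stacked interaction matrix applied to the feature error, with network-predicted optical flow standing in as that error. The point-feature interaction matrix below is textbook material (Chaumette's formulation); its use as a stand-in for the paper's pipeline is an assumption.

```python
import numpy as np

def interaction_matrix(x: float, y: float, Z: float) -> np.ndarray:
    """2x6 interaction matrix of a normalized image point (x, y) at depth Z."""
    return np.array([
        [-1.0 / Z, 0.0, x / Z, x * y, -(1.0 + x * x), y],
        [0.0, -1.0 / Z, y / Z, 1.0 + y * y, -x * y, -x],
    ])

def servo_velocity(points, flow_error, depths, lam: float = 0.5) -> np.ndarray:
    """Camera velocity command v = -lambda * pinv(L) @ e, where e stacks the
    per-point flow errors (here assumed to come from a flow network)."""
    L = np.vstack([interaction_matrix(x, y, Z)
                   for (x, y), Z in zip(points, depths)])  # (2N, 6)
    e = np.asarray(flow_error, dtype=float).reshape(-1)    # (2N,)
    return -lam * np.linalg.pinv(L) @ e                    # (6,) twist command
```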
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.