DFVS: Deep Flow Guided Scene Agnostic Image Based Visual Servoing
- URL: http://arxiv.org/abs/2003.03766v1
- Date: Sun, 8 Mar 2020 11:42:36 GMT
- Title: DFVS: Deep Flow Guided Scene Agnostic Image Based Visual Servoing
- Authors: Y V S Harish, Harit Pandya, Ayush Gaud, Shreya Terupally, Sai Shankar
and K. Madhava Krishna
- Abstract summary: Existing deep learning based visual servoing approaches regress the relative camera pose between a pair of images.
We consider optical flow as our visual features, which are predicted using a deep neural network.
We show convergence for over 3m and 40 degrees while maintaining precise positioning of under 2cm and 1 degree.
- Score: 11.000164408890635
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing deep learning based visual servoing approaches regress the relative
camera pose between a pair of images. Therefore, they require a huge amount of
training data and sometimes fine-tuning for adaptation to a novel scene.
Furthermore, current approaches do not consider the underlying geometry of the
scene and rely on direct estimation of camera pose. Thus, inaccuracies in
prediction of the camera pose, especially for distant goals, lead to a
degradation in the servoing performance. In this paper, we propose a two-fold
solution: (i) We consider optical flow as our visual features, which are
predicted using a deep neural network. (ii) These flow features are then
systematically integrated with depth estimates provided by another neural
network using an interaction matrix. We further present an extensive benchmark in
a photo-realistic 3D simulation across diverse scenes to study the convergence
and generalisation of visual servoing approaches. We show convergence for over
3m and 40 degrees while maintaining precise positioning of under 2cm and 1
degree on our challenging benchmark, where existing approaches fail to converge
for the majority of scenarios beyond 1.5m and 20 degrees.
Furthermore, we also evaluate our approach for a real scenario on an aerial
robot. Our approach generalizes to novel scenarios producing precise and robust
servoing performance for 6 degrees of freedom positioning tasks with even large
camera transformations without any retraining or fine-tuning.
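The abstract's core idea (using predicted optical flow as the feature error and per-pixel depth estimates inside an interaction matrix to compute a 6-DoF velocity command) follows the classical image-based visual servoing control law v = -λ L⁺ e. The sketch below is a minimal, illustrative implementation of that law, not the paper's actual pipeline: function names are my own, and the flow and depth values would in practice come from the deep networks the paper describes.

```python
import numpy as np

def interaction_matrix(x, y, Z):
    """Standard IBVS interaction (image Jacobian) matrix for a normalized
    image point (x, y) at depth Z, mapping the 6-DoF camera twist
    [vx, vy, vz, wx, wy, wz] to the point's image-plane velocity."""
    return np.array([
        [-1.0 / Z, 0.0,      x / Z, x * y,        -(1.0 + x * x),  y],
        [0.0,      -1.0 / Z, y / Z, 1.0 + y * y,  -x * y,         -x],
    ])

def ibvs_velocity(points, flow, depths, gain=0.5):
    """Velocity command v = -gain * pinv(L) @ e, where e is the stacked
    per-point optical flow (standing in for the feature error s - s*)
    and L stacks the per-point interaction matrices built from depth."""
    L = np.vstack([interaction_matrix(x, y, Z)
                   for (x, y), Z in zip(points, depths)])
    e = np.asarray(flow, dtype=float).reshape(-1)  # stacked (du, dv) per point
    return -gain * np.linalg.pinv(L) @ e           # 6-DoF camera twist
```

When the predicted flow is zero the commanded velocity is zero, i.e. the camera has converged to the goal view; accurate depths in `L` are what keep the commanded direction geometrically consistent even for distant goals.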
Related papers
- FisheyeDepth: A Real Scale Self-Supervised Depth Estimation Model for Fisheye Camera [8.502741852406904]
We present FisheyeDepth, a self-supervised depth estimation model tailored for fisheye cameras.
We incorporate a fisheye camera model into the projection and reprojection stages during training to handle image distortions.
We also incorporate real-scale pose information into the geometric projection between consecutive frames, replacing the poses estimated by the conventional pose network.
arXiv Detail & Related papers (2024-09-23T14:31:42Z)
- SCIPaD: Incorporating Spatial Clues into Unsupervised Pose-Depth Joint Learning [17.99904937160487]
We introduce SCIPaD, a novel approach that incorporates spatial clues for unsupervised depth-pose joint learning.
SCIPaD achieves a reduction of 22.2% in average translation error and 34.8% in average angular error for camera pose estimation task on the KITTI Odometry dataset.
arXiv Detail & Related papers (2024-07-07T06:52:51Z)
- Learning Robust Multi-Scale Representation for Neural Radiance Fields from Unposed Images [65.41966114373373]
We present an improved solution to the neural image-based rendering problem in computer vision.
The proposed approach could synthesize a realistic image of the scene from a novel viewpoint at test time.
arXiv Detail & Related papers (2023-11-08T08:18:23Z)
- Calibrating Panoramic Depth Estimation for Practical Localization and Mapping [20.621442016969976]
The absolute depth values of surrounding environments provide crucial cues for various assistive technologies, such as localization, navigation, and 3D structure estimation.
We propose that accurate depth estimated from panoramic images can serve as a powerful and light-weight input for a wide range of downstream tasks requiring 3D information.
arXiv Detail & Related papers (2023-08-27T04:50:05Z)
- FrozenRecon: Pose-free 3D Scene Reconstruction with Frozen Depth Models [67.96827539201071]
We propose a novel test-time optimization approach for 3D scene reconstruction.
Our method achieves state-of-the-art cross-dataset reconstruction on five zero-shot testing datasets.
arXiv Detail & Related papers (2023-08-10T17:55:02Z)
- Shakes on a Plane: Unsupervised Depth Estimation from Unstabilized Photography [54.36608424943729]
We show that in a "long-burst", forty-two 12-megapixel RAW frames captured in a two-second sequence, there is enough parallax information from natural hand tremor alone to recover high-quality scene depth.
We devise a test-time optimization approach that fits a neural RGB-D representation to long-burst data and simultaneously estimates scene depth and camera motion.
arXiv Detail & Related papers (2022-12-22T18:54:34Z)
- Multi-View Multi-Person 3D Pose Estimation with Plane Sweep Stereo [71.59494156155309]
Existing approaches for multi-view 3D pose estimation explicitly establish cross-view correspondences to group 2D pose detections from multiple camera views.
We present our multi-view 3D pose estimation approach based on plane sweep stereo to jointly address the cross-view fusion and 3D pose reconstruction in a single shot.
arXiv Detail & Related papers (2021-04-06T03:49:35Z)
- Robust Consistent Video Depth Estimation [65.53308117778361]
We present an algorithm for estimating consistent dense depth maps and camera poses from a monocular video.
Our algorithm combines two complementary techniques: (1) flexible deformation-splines for low-frequency large-scale alignment and (2) geometry-aware depth filtering for high-frequency alignment of fine depth details.
In contrast to prior approaches, our method does not require camera poses as input and achieves robust reconstruction for challenging hand-held cell phone captures containing a significant amount of noise, shake, motion blur, and rolling shutter deformations.
arXiv Detail & Related papers (2020-12-10T18:59:48Z)
- Lightweight Multi-View 3D Pose Estimation through Camera-Disentangled Representation [57.11299763566534]
We present a solution to recover 3D pose from multi-view images captured with spatially calibrated cameras.
We exploit 3D geometry to fuse input images into a unified latent representation of pose, which is disentangled from camera view-points.
Our architecture then conditions the learned representation on camera projection operators to produce accurate per-view 2d detections.
arXiv Detail & Related papers (2020-04-05T12:52:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.