Endo-Depth-and-Motion: Localization and Reconstruction in Endoscopic
Videos using Depth Networks and Photometric Constraints
- URL: http://arxiv.org/abs/2103.16525v1
- Date: Tue, 30 Mar 2021 17:29:31 GMT
- Title: Endo-Depth-and-Motion: Localization and Reconstruction in Endoscopic
Videos using Depth Networks and Photometric Constraints
- Authors: David Recasens, José Lamarca, José M. Fácil, J. M. M. Montiel,
  Javier Civera
- Abstract summary: Estimating a scene reconstruction and the camera motion from in-body videos is challenging due to several factors.
We present Endo-Depth-and-Motion, a pipeline that estimates the 6-degrees-of-freedom camera pose and dense 3D scene models from monocular endoscopic videos.
- Score: 12.065803181395667
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Estimating a scene reconstruction and the camera motion from in-body videos
is challenging due to several factors, e.g. the deformation of in-body cavities
or the lack of texture. In this paper we present Endo-Depth-and-Motion, a
pipeline that estimates the 6-degrees-of-freedom camera pose and dense 3D scene
models from monocular endoscopic videos. Our approach leverages recent advances
in self-supervised depth networks to generate pseudo-RGBD frames, then tracks
the camera pose using photometric residuals and fuses the registered depth maps
in a volumetric representation. We present an extensive experimental evaluation
in the public dataset Hamlyn, showing high-quality results and comparisons
against relevant baselines. We also release all models and code for future
comparisons.
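The tracking step described in the abstract can be illustrated with a minimal sketch: back-project each reference pixel using the network depth to form a pseudo-RGBD frame, warp it into the target view under a candidate pose, and compare intensities. The function name, pinhole intrinsics `K`, and nearest-neighbour sampling below are illustrative assumptions, not the paper's implementation, which uses robust multi-scale photometric optimization.

```python
import numpy as np

def photometric_residual(ref_img, ref_depth, tgt_img, K, T):
    """Photometric residual between a reference pseudo-RGBD frame and a
    target frame, given a candidate 4x4 camera pose T (reference -> target).
    A dense-warp sketch; real trackers add robust weighting,
    coarse-to-fine pyramids, and Lie-algebra pose optimization."""
    h, w = ref_img.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    # Back-project reference pixels to 3D using the predicted depth.
    x = (u - K[0, 2]) / K[0, 0] * ref_depth
    y = (v - K[1, 2]) / K[1, 1] * ref_depth
    pts = np.stack([x, y, ref_depth, np.ones_like(ref_depth)], axis=-1)
    # Transform into the target frame and project with the pinhole model.
    pts_t = pts @ T.T
    z = np.clip(pts_t[..., 2], 1e-6, None)
    u_t = K[0, 0] * pts_t[..., 0] / z + K[0, 2]
    v_t = K[1, 1] * pts_t[..., 1] / z + K[1, 2]
    # Nearest-neighbour sampling; bilinear interpolation is usual in practice.
    ui = np.clip(np.round(u_t).astype(int), 0, w - 1)
    vi = np.clip(np.round(v_t).astype(int), 0, h - 1)
    return ref_img - tgt_img[vi, ui]
```

Pose tracking then amounts to searching for the transform `T` that minimizes a robust norm of this residual.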
Related papers
- Align3R: Aligned Monocular Depth Estimation for Dynamic Videos [50.28715151619659]
We propose a novel video-depth estimation method, Align3R, that estimates temporally consistent depth maps for a dynamic video.
Our key idea is to utilize the recent DUSt3R model to align estimated monocular depth maps of different timesteps.
Experiments demonstrate that Align3R estimates consistent video depth and camera poses for a monocular video, outperforming baseline methods.
arXiv Detail & Related papers (2024-12-04T07:09:59Z)
- Video Depth without Video Models [34.11454612504574]
Video depth estimation lifts monocular video clips to 3D by inferring dense depth at every frame.
We show how to turn a single-image latent diffusion model (LDM) into a state-of-the-art video depth estimator.
Our model, which we call RollingDepth, has two main ingredients: (i) a multi-frame depth estimator that is derived from a single-image LDM and maps very short video snippets to depth snippets.
arXiv Detail & Related papers (2024-11-28T14:50:14Z)
- FrozenRecon: Pose-free 3D Scene Reconstruction with Frozen Depth Models [67.96827539201071]
We propose a novel test-time optimization approach for 3D scene reconstruction.
Our method achieves state-of-the-art cross-dataset reconstruction on five zero-shot testing datasets.
arXiv Detail & Related papers (2023-08-10T17:55:02Z)
- Towards 3D Scene Reconstruction from Locally Scale-Aligned Monocular Video Depth [90.33296913575818]
In some video-based scenarios such as video depth estimation and 3D scene reconstruction from a video, the unknown scale and shift residing in per-frame prediction may cause the depth inconsistency.
We propose a locally weighted linear regression method to recover the scale and shift with very sparse anchor points.
Our method boosts the performance of existing state-of-the-art approaches by up to 50% on several zero-shot benchmarks.
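The per-frame scale-and-shift recovery described above can be sketched as a least-squares fit against sparse anchor depths. This is a simplified global version under assumed names; the paper uses a *locally weighted* linear regression, i.e. a separate weighted fit around each anchor point.

```python
import numpy as np

def recover_scale_shift(pred_depth, anchor_uv, anchor_depth):
    """Fit a per-frame scale s and shift t so that s * pred + t matches
    sparse metric anchor depths, via ordinary least squares. A global
    simplification of the paper's locally weighted linear regression."""
    # Gather predicted depth at the sparse anchor pixels (u = col, v = row).
    d = np.array([pred_depth[v, u] for u, v in anchor_uv])
    # Solve [d, 1] @ [s, t]^T = anchor_depth in the least-squares sense.
    A = np.stack([d, np.ones_like(d)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, np.asarray(anchor_depth), rcond=None)
    return s, t
```

The aligned depth `s * pred_depth + t` is then consistent with the anchors up to the regression residual.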
arXiv Detail & Related papers (2022-02-03T08:52:54Z)
- Wide-angle Image Rectification: A Survey [86.36118799330802]
Wide-angle images contain distortions that violate the assumptions underlying pinhole camera models.
Image rectification, which aims to correct these distortions, can solve these problems.
We present a detailed description and discussion of the camera models used in different approaches.
Next, we review both traditional geometry-based image rectification methods and deep learning-based methods.
arXiv Detail & Related papers (2020-10-30T17:28:40Z)
- Self-Attention Dense Depth Estimation Network for Unrectified Video Sequences [6.821598757786515]
LiDAR and radar sensors are the standard hardware solutions for real-time depth estimation.
Deep learning based self-supervised depth estimation methods have shown promising results.
We propose a self-attention based depth and ego-motion network for unrectified images.
arXiv Detail & Related papers (2020-05-28T21:53:53Z)
- Consistent Video Depth Estimation [57.712779457632024]
We present an algorithm for reconstructing dense, geometrically consistent depth for all pixels in a monocular video.
We leverage a conventional structure-from-motion reconstruction to establish geometric constraints on pixels in the video.
Our algorithm is able to handle challenging hand-held captured input videos with a moderate degree of dynamic motion.
arXiv Detail & Related papers (2020-04-30T17:59:26Z)
- Video Depth Estimation by Fusing Flow-to-Depth Proposals [65.24533384679657]
We present an approach with a differentiable flow-to-depth layer for video depth estimation.
The model consists of a flow-to-depth layer, a camera pose refinement module, and a depth fusion network.
Our approach outperforms state-of-the-art depth estimation methods, and has reasonable cross dataset generalization capability.
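The flow-to-depth idea behind this entry can be sketched per pixel: with a known relative pose, a pixel's optical flow fixes a correspondence that can be triangulated into depth. The function name and per-pixel formulation below are hypothetical; the paper implements this as a differentiable layer over dense flow fields.

```python
import numpy as np

def depth_from_flow(u, v, flow, K, R, t):
    """Triangulate the depth of pixel (u, v) in frame 1 from its optical
    flow to frame 2 and the known relative pose (R, t), frame 1 -> frame 2.
    Solves d1 * (R @ r1) - d2 * r2 = -t in least squares, where r1, r2 are
    the normalized viewing rays of the corresponding pixels."""
    Kinv = np.linalg.inv(K)
    r1 = Kinv @ np.array([u, v, 1.0])
    r2 = Kinv @ np.array([u + flow[0], v + flow[1], 1.0])
    A = np.stack([R @ r1, -r2], axis=1)  # 3x2 system in (d1, d2)
    (d1, d2), *_ = np.linalg.lstsq(A, -t, rcond=None)
    return d1
```

With noiseless flow and pose the 3x2 system is consistent and the least-squares solution is the exact depth; the fusion network in the paper then reconciles noisy proposals from multiple frames.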
arXiv Detail & Related papers (2019-12-30T10:45:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.