Self-Supervised Monocular Depth and Ego-Motion Estimation in Endoscopy:
Appearance Flow to the Rescue
- URL: http://arxiv.org/abs/2112.08122v1
- Date: Wed, 15 Dec 2021 13:51:10 GMT
- Title: Self-Supervised Monocular Depth and Ego-Motion Estimation in Endoscopy:
Appearance Flow to the Rescue
- Authors: Shuwei Shao, Zhongcai Pei, Weihai Chen, Wentao Zhu, Xingming Wu,
Dianmin Sun, Baochang Zhang
- Abstract summary: Self-supervised learning technology has been applied to calculate depth and ego-motion from monocular videos.
In this work, we introduce a novel concept referred to as appearance flow to address the brightness inconsistency problem.
We build a unified self-supervised framework to estimate monocular depth and ego-motion simultaneously in endoscopic scenes.
- Score: 38.168759071532676
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, self-supervised learning technology has been applied to calculate
depth and ego-motion from monocular videos, achieving remarkable performance in
autonomous driving scenarios. One widely adopted assumption of depth and
ego-motion self-supervised learning is that the image brightness remains
constant within nearby frames. Unfortunately, the endoscopic scene does not
meet this assumption because there are severe brightness fluctuations induced
by illumination variations, non-Lambertian reflections and interreflections
during data collection, and these brightness fluctuations inevitably
deteriorate the depth and ego-motion estimation accuracy. In this work, we
introduce a novel concept referred to as appearance flow to address the
brightness inconsistency problem. The appearance flow takes into consideration
any variations in the brightness pattern and enables us to develop a
generalized dynamic image constraint. Furthermore, we build a unified
self-supervised framework to estimate monocular depth and ego-motion
simultaneously in endoscopic scenes, which comprises a structure module, a
motion module, an appearance module and a correspondence module, to accurately
reconstruct the appearance and calibrate the image brightness. Extensive
experiments are conducted on the SCARED dataset and EndoSLAM dataset, and the
proposed unified framework exceeds other self-supervised approaches by a large
margin. To validate our framework's generalization ability on different
patients and cameras, we train our model on SCARED but test it on the SERV-CT
and Hamlyn datasets without any fine-tuning, and the superior results reveal
its strong generalization ability. Code will be available at:
https://github.com/ShuweiShao/AF-SfMLearner.
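To make the brightness-constancy issue concrete, the sketch below contrasts a standard SSIM + L1 photometric loss, which presumes constant brightness between nearby frames, with a variant in which a predicted per-pixel appearance-flow map calibrates the warped source frame before comparison. This is a minimal illustration under assumed forms (additive brightness calibration, made-up function names and hyper-parameters), not the paper's implementation; see the AF-SfMLearner repository for the real code.

```python
# Minimal sketch (PyTorch assumed) of a brightness-constancy photometric loss
# and an appearance-flow-calibrated variant. All names and the additive
# calibration form are illustrative assumptions, not the authors' code.
import torch
import torch.nn.functional as F


def ssim_distance(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    # Simplified single-scale SSIM distance over 3x3 neighbourhoods,
    # mapped to [0, 1] so that 0 means identical patches.
    mu_x = F.avg_pool2d(x, 3, 1, padding=1)
    mu_y = F.avg_pool2d(y, 3, 1, padding=1)
    sigma_x = F.avg_pool2d(x * x, 3, 1, padding=1) - mu_x ** 2
    sigma_y = F.avg_pool2d(y * y, 3, 1, padding=1) - mu_y ** 2
    sigma_xy = F.avg_pool2d(x * y, 3, 1, padding=1) - mu_x * mu_y
    num = (2 * mu_x * mu_y + c1) * (2 * sigma_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (sigma_x + sigma_y + c2)
    return torch.clamp((1 - num / den) / 2, 0, 1)


def photometric_loss(pred, target, alpha=0.85):
    # Standard SSIM + L1 reconstruction loss. It is only a valid training
    # signal if brightness stays constant between nearby frames -- the
    # assumption that endoscopic scenes violate.
    return (alpha * ssim_distance(pred, target).mean()
            + (1 - alpha) * (pred - target).abs().mean())


def calibrated_photometric_loss(warped_src, target, appearance_flow, alpha=0.85):
    # Generalized constraint (illustrative form): a predicted per-pixel
    # brightness-change map first calibrates the warped source frame, so
    # the loss no longer penalizes pure illumination differences.
    calibrated = (warped_src + appearance_flow).clamp(0, 1)
    return photometric_loss(calibrated, target, alpha)
```

In this sketch, `warped_src` stands for the source frame warped into the target view using the depth and pose that the structure and motion modules would predict, and `appearance_flow` would come from the appearance module; the correspondence module's role in calibrating image brightness is omitted for brevity.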
Related papers
- Dynamic Brightness Adaptation for Robust Multi-modal Image Fusion [53.72174230024836]
Visible imaging in real-world scenarios is susceptible to dynamic environmental brightness fluctuations, leading to texture degradation.
We propose the Brightness Adaptive multimodal dynamic fusion framework (BA-Fusion), which achieves robust image fusion despite dynamic brightness fluctuations.
Our method surpasses state-of-the-art methods in preserving multi-modal image information and visual fidelity, while exhibiting remarkable robustness across varying brightness levels.
arXiv Detail & Related papers (2024-11-07T13:31:07Z)
- SelfOdom: Self-supervised Egomotion and Depth Learning via Bi-directional Coarse-to-Fine Scale Recovery [12.791122117651273]
SelfOdom is a self-supervised dual-network framework for learning pose and depth estimates from monocular images.
We introduce a novel coarse-to-fine training strategy that enables the metric scale to be recovered in a two-stage process.
Our model excels in both normal and challenging lighting conditions, including difficult night scenes.
arXiv Detail & Related papers (2022-11-16T13:36:19Z)
- Towards Scale-Aware, Robust, and Generalizable Unsupervised Monocular Depth Estimation by Integrating IMU Motion Dynamics [74.1720528573331]
Unsupervised monocular depth and ego-motion estimation has drawn extensive research attention in recent years.
We propose DynaDepth, a novel scale-aware framework that integrates information from vision and IMU motion dynamics.
We validate the effectiveness of DynaDepth by conducting extensive experiments and simulations on the KITTI and Make3D datasets.
arXiv Detail & Related papers (2022-07-11T07:50:22Z)
- SMUDLP: Self-Teaching Multi-Frame Unsupervised Endoscopic Depth Estimation with Learnable Patchmatch [25.35009126980672]
Unsupervised monocularly trained depth estimation models use adjacent frames as a supervisory signal during the training phase.
Temporally correlated frames are also available at inference time for many clinical applications, e.g., surgical navigation.
We present SMUDLP, a novel and unsupervised paradigm for multi-frame monocular endoscopic depth estimation.
arXiv Detail & Related papers (2022-05-30T12:11:03Z)
- Toward Fast, Flexible, and Robust Low-Light Image Enhancement [87.27326390675155]
We develop a new Self-Calibrated Illumination (SCI) learning framework for fast, flexible, and robust brightening of images in real-world low-light scenarios.
To reduce the computational burden of the cascaded pattern, we construct a self-calibrated module that realizes convergence between the results of each stage.
We comprehensively explore SCI's inherent properties, including operation-insensitive adaptability and model-irrelevant generality.
arXiv Detail & Related papers (2022-04-21T14:40:32Z)
- Self-supervised Visual-LiDAR Odometry with Flip Consistency [7.883162238852467]
A self-supervised visual-lidar odometry (Self-VLO) framework is proposed.
It takes both monocular images and sparse depth maps projected from 3D lidar points as input.
It produces pose and depth estimations in an end-to-end learning manner.
arXiv Detail & Related papers (2021-01-05T02:42:59Z)
- SIR: Self-supervised Image Rectification via Seeing the Same Scene from Multiple Different Lenses [82.56853587380168]
We propose a novel self-supervised image rectification (SIR) method based on an important insight: the rectified results of distorted images of the same scene taken with different lenses should be the same.
We leverage a differentiable warping module to generate the rectified images and re-distorted images from the distortion parameters.
Our method achieves comparable or even better performance than the supervised baseline method and representative state-of-the-art methods.
arXiv Detail & Related papers (2020-11-30T08:23:25Z)
- Calibrating Self-supervised Monocular Depth Estimation [77.77696851397539]
In recent years, many methods have demonstrated the ability of neural networks to learn depth and pose changes in a sequence of images, using only self-supervision as the training signal.
We show that by incorporating prior information about the camera configuration and the environment, we can remove the scale ambiguity and predict depth directly, while still using the self-supervised formulation and not relying on any additional sensors.
arXiv Detail & Related papers (2020-09-16T14:35:45Z)