MetaFE-DE: Learning Meta Feature Embedding for Depth Estimation from Monocular Endoscopic Images
- URL: http://arxiv.org/abs/2502.03493v1
- Date: Wed, 05 Feb 2025 02:52:30 GMT
- Title: MetaFE-DE: Learning Meta Feature Embedding for Depth Estimation from Monocular Endoscopic Images
- Authors: Dawei Lu, Deqiang Xiao, Danni Ai, Jingfan Fan, Tianyu Fu, Yucong Lin, Hong Song, Xujiong Ye, Lei Zhang, Jian Yang
- Abstract summary: Existing methods primarily estimate depth information from RGB images directly.
We introduce a novel concept referred to as ``meta feature embedding (MetaFE)".
In this paper, we propose a two-stage self-supervised learning paradigm for monocular endoscopic depth estimation.
- Score: 18.023231290573268
- Abstract: Depth estimation from monocular endoscopic images presents significant challenges due to the complexity of endoscopic surgery, such as the irregular shapes of human soft tissues and variations in lighting conditions. Existing methods primarily estimate depth information from RGB images directly and often suffer from limited interpretability and accuracy. Given that RGB and depth images are two views of the same endoscopic surgery scene, in this paper we introduce a novel concept referred to as ``meta feature embedding (MetaFE)", in which the physical entities (e.g., tissues and surgical instruments) of endoscopic surgery are represented using shared features that can be alternatively decoded into an RGB or depth image. With this concept, we propose a two-stage self-supervised learning paradigm for monocular endoscopic depth estimation. In the first stage, we propose a temporal representation learner using diffusion models, which is aligned with spatial information through cross normalization to construct the MetaFE. In the second stage, self-supervised monocular depth estimation with brightness calibration is applied to decode the meta features into the depth image. Extensive evaluation on diverse endoscopic datasets demonstrates that our approach outperforms state-of-the-art methods in depth estimation, achieving superior accuracy and generalization. The source code will be publicly available.
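The abstract names its two mechanisms without detailing them. Below is a minimal, hypothetical PyTorch sketch of one plausible reading: cross normalization as AdaIN-style statistic alignment between the diffusion-derived temporal features and the spatial features, followed by a shared meta feature that can be decoded alternatively into RGB or depth. All module names, shapes, and the statistic-swap interpretation are assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

def cross_normalize(temporal_feat, spatial_feat, eps=1e-5):
    """Hypothetical cross normalization (AdaIN-style reading of the abstract):
    renormalize diffusion-derived temporal features with the channel-wise
    statistics of the spatial features so both share one embedding space."""
    mu_t = temporal_feat.mean(dim=(2, 3), keepdim=True)
    std_t = temporal_feat.std(dim=(2, 3), keepdim=True)
    mu_s = spatial_feat.mean(dim=(2, 3), keepdim=True)
    std_s = spatial_feat.std(dim=(2, 3), keepdim=True)
    return (temporal_feat - mu_t) / (std_t + eps) * std_s + mu_s

class MetaFEHeads(nn.Module):
    """Shared meta features decoded alternatively into an RGB or depth image."""
    def __init__(self, feat_ch=256):
        super().__init__()
        self.rgb_head = nn.Conv2d(feat_ch, 3, kernel_size=3, padding=1)
        self.depth_head = nn.Conv2d(feat_ch, 1, kernel_size=3, padding=1)

    def forward(self, meta_feat):
        rgb = torch.sigmoid(self.rgb_head(meta_feat))      # reconstructed frame
        depth = torch.sigmoid(self.depth_head(meta_feat))  # relative depth in (0, 1)
        return rgb, depth
```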
Related papers
- Marigold-DC: Zero-Shot Monocular Depth Completion with Guided Diffusion [51.69876947593144]
Existing methods for depth completion operate in tightly constrained settings.
Inspired by advances in monocular depth estimation, we reframe depth completion as image-conditional depth map generation.
Marigold-DC builds on a pretrained latent diffusion model for monocular depth estimation and injects the depth observations as test-time guidance.
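A minimal sketch of what injecting sparse depth observations as test-time guidance can look like inside a diffusion sampler; `denoiser` and `scheduler_step` are hypothetical stand-ins rather than Marigold-DC's actual API.

```python
import torch

def guided_step(x_t, t, denoiser, scheduler_step, sparse_depth, mask, scale=1.0):
    """One reverse-diffusion step with test-time guidance: nudge the latent
    toward agreement with the sparse depth observations where mask is True."""
    x_t = x_t.detach().requires_grad_(True)
    x0_pred = denoiser(x_t, t)                     # predicted clean depth map
    residual = ((x0_pred - sparse_depth) * mask).pow(2).sum()
    grad = torch.autograd.grad(residual, x_t)[0]   # direction reducing the residual
    x_prev = scheduler_step(x0_pred.detach(), x_t.detach(), t)  # ordinary DDIM/DDPM update
    return x_prev - scale * grad                   # guided update
```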
arXiv Detail & Related papers (2024-12-18T00:06:41Z)
- EndoDepth: A Benchmark for Assessing Robustness in Endoscopic Depth Prediction [1.7243216387069678]
We present the EndoDepth benchmark, an evaluation framework designed to assess the robustness of monocular depth prediction models in endoscopic scenarios.
The evaluation protocol is consistent across models and specifically designed to measure their robustness in endoscopic scenarios.
arXiv Detail & Related papers (2024-09-30T04:18:14Z)
- Advancing Depth Anything Model for Unsupervised Monocular Depth Estimation in Endoscopy [3.1186464715409983]
We introduce a novel fine-tuning strategy for the Depth Anything Model.
We integrate it with an intrinsic-based unsupervised monocular depth estimation framework.
Our results on the SCARED dataset show that our method achieves state-of-the-art performance.
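"Intrinsic-based unsupervised" in this setting typically refers to a photometric reprojection objective built from the camera intrinsics; the sketch below shows that generic objective (an assumed formulation, not the paper's code), with `pose` as the predicted relative camera motion.

```python
import torch
import torch.nn.functional as F

def photometric_reprojection_loss(tgt, src, depth, pose, K, K_inv):
    """Generic self-supervised depth loss. tgt/src: (B, 3, H, W) frames;
    depth: (B, 1, H, W); pose: (B, 4, 4) tgt->src motion; K, K_inv: (B, 3, 3)."""
    B, _, H, W = tgt.shape
    ys, xs = torch.meshgrid(torch.arange(H, device=tgt.device),
                            torch.arange(W, device=tgt.device), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)]).float()   # (3, H, W) homogeneous
    pix = pix.view(1, 3, -1).expand(B, -1, -1)                 # (B, 3, H*W)
    cam = (K_inv @ pix) * depth.view(B, 1, -1)                 # back-projected 3D points
    cam = pose[:, :3, :3] @ cam + pose[:, :3, 3:]              # move into source frame
    proj = K @ cam                                             # reproject to pixels
    uv = proj[:, :2] / proj[:, 2:].clamp(min=1e-6)
    u = uv[:, 0] / (W - 1) * 2 - 1                             # normalize for grid_sample
    v = uv[:, 1] / (H - 1) * 2 - 1
    grid = torch.stack([u, v], dim=-1).view(B, H, W, 2)
    warped = F.grid_sample(src, grid, align_corners=True)      # source seen from target
    return (warped - tgt).abs().mean()                         # L1 photometric error
```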
arXiv Detail & Related papers (2024-09-12T03:04:43Z)
- ScaleDepth: Decomposing Metric Depth Estimation into Scale Prediction and Relative Depth Estimation [62.600382533322325]
We propose a novel monocular depth estimation method called ScaleDepth.
Our method decomposes metric depth into scene scale and relative depth, and predicts them through a semantic-aware scale prediction module.
Our method achieves metric depth estimation for both indoor and outdoor scenes in a unified framework.
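The decomposition itself is simple to state. A hypothetical sketch (all names and shapes assumed) of predicting a per-image scene scale and a per-pixel relative depth, then recombining them into metric depth:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ScaleDecomposedDepth(nn.Module):
    """Sketch: metric depth = scene scale * relative depth.
    `backbone` is any feature extractor returning (B, C, H, W) maps."""
    def __init__(self, backbone, feat_ch=256):
        super().__init__()
        self.backbone = backbone
        self.scale_head = nn.Linear(feat_ch, 1)    # per-image scene scale
        self.rel_head = nn.Conv2d(feat_ch, 1, 1)   # per-pixel relative depth

    def forward(self, image):
        feat = self.backbone(image)                                  # (B, C, H, W)
        scale = F.softplus(self.scale_head(feat.mean(dim=(2, 3))))   # positive, (B, 1)
        rel = torch.sigmoid(self.rel_head(feat))                     # (B, 1, H, W)
        return scale[:, :, None, None] * rel                         # metric depth map
```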
arXiv Detail & Related papers (2024-07-11T05:11:56Z)
- Leveraging Near-Field Lighting for Monocular Depth Estimation from Endoscopy Videos [12.497782583094281]
Monocular depth estimation in endoscopy videos can enable assistive and robotic surgery to obtain better coverage of the organ and detection of various health issues.
Despite promising progress on mainstream natural-image depth estimation, these techniques perform poorly on endoscopy images.
In this paper, we utilize the photometric cues, i.e., the light emitted from an endoscope and reflected by the surface, to improve monocular depth estimation.
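The underlying cue is the inverse-square falloff of a point light co-located with the camera. A simplified sketch of that model follows (Lambertian surface assumed; in practice albedo and surface angle are unknown, which is why the cue improves learned estimation rather than inverting depth directly):

```python
import numpy as np

def depth_from_near_field_intensity(intensity, albedo=1.0, cos_theta=1.0, eps=1e-6):
    """For a point light at the camera and a Lambertian surface,
    I ~ albedo * cos_theta / d**2, so observed brightness constrains
    depth up to the unknown albedo and surface orientation."""
    return np.sqrt(np.maximum(albedo * cos_theta, 0.0) / (intensity + eps))
```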
arXiv Detail & Related papers (2024-03-26T17:52:23Z)
- K-Space-Aware Cross-Modality Score for Synthesized Neuroimage Quality Assessment [71.27193056354741]
The problem of how to assess cross-modality medical image synthesis has been largely unexplored.
We propose a new metric K-CROSS to spur progress on this challenging problem.
K-CROSS uses a pre-trained multi-modality segmentation network to predict the lesion location.
arXiv Detail & Related papers (2023-07-10T01:26:48Z)
- A Novel Hybrid Endoscopic Dataset for Evaluating Machine Learning-based Photometric Image Enhancement Models [0.9236074230806579]
This work introduces a new synthetic dataset generated by generative adversarial techniques.
It also explores both shallow and deep learning-based image-enhancement methods under overexposed and underexposed lighting conditions.
arXiv Detail & Related papers (2022-07-06T01:47:17Z)
- S2R-DepthNet: Learning a Generalizable Depth-specific Structural Representation [63.58891781246175]
Humans can infer the 3D geometry of a scene from a sketch rather than a realistic image, which indicates that spatial structure plays a fundamental role in understanding the depth of scenes.
We are the first to explore the learning of a depth-specific structural representation, which captures the essential features for depth estimation and ignores irrelevant style information.
Our S2R-DepthNet generalizes well to unseen real-world data even though it is trained only on synthetic data.
arXiv Detail & Related papers (2021-04-02T03:55:41Z)
- Calibrating Self-supervised Monocular Depth Estimation [77.77696851397539]
In recent years, many methods have demonstrated the ability of neural networks to learn depth and pose changes in a sequence of images using only self-supervision as the training signal.
We show that by incorporating prior information about the camera configuration and the environment, we can remove the scale ambiguity and predict depth directly, still using the self-supervised formulation and without relying on any additional sensors.
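One standard way to exploit such priors, not necessarily this paper's exact formulation: fit the ground plane in the scale-ambiguous reconstruction and match its distance from the camera to the known mounting height.

```python
import numpy as np

def metric_scale_from_camera_height(points_cam, ground_mask, known_height_m):
    """points_cam: (N, 3) back-projected points in the scale-ambiguous camera
    frame; ground_mask: (N,) boolean selecting ground-plane points. Returns
    the factor by which to multiply predicted depths to obtain metric depth."""
    ground = points_cam[ground_mask]
    centroid = ground.mean(axis=0)
    # plane normal = singular vector with the smallest singular value
    _, _, vt = np.linalg.svd(ground - centroid, full_matrices=False)
    normal = vt[-1]
    est_height = abs(centroid @ normal)  # camera-to-plane distance, arbitrary units
    return known_height_m / est_height
```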
arXiv Detail & Related papers (2020-09-16T14:35:45Z)
- Single-shot Hyperspectral-Depth Imaging with Learned Diffractive Optics [72.9038524082252]
We propose a compact single-shot monocular hyperspectral-depth (HS-D) imaging method.
Our method uses a diffractive optical element (DOE), the point spread function of which changes with respect to both depth and spectrum.
To facilitate learning the DOE, we present the first HS-D dataset, acquired with a benchtop HS-D imager that we built.
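A toy sketch of the forward model that makes such a DOE learnable: render the sensor image as a sum over depth layers, each blurred by that depth's PSF (occlusion and the spectral dimension omitted; odd kernel sizes assumed).

```python
import torch
import torch.nn.functional as F

def simulate_doe_capture(layers, psfs):
    """layers: (D, C, H, W) scene content binned by depth;
    psfs: (D, k, k) depth-dependent PSFs of the DOE (k odd).
    Returns the simulated (C, H, W) sensor image."""
    out = torch.zeros_like(layers[0])
    for layer, psf in zip(layers, psfs):
        k = psf.shape[-1]
        kernel = psf.expand(layer.shape[0], 1, k, k)  # same PSF for each channel
        out += F.conv2d(layer.unsqueeze(0), kernel, padding=k // 2,
                        groups=layer.shape[0])[0]
    return out
```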
arXiv Detail & Related papers (2020-09-01T14:19:35Z)
- Monocular Retinal Depth Estimation and Joint Optic Disc and Cup Segmentation using Adversarial Networks [18.188041599999547]
We propose a novel method using an adversarial network to predict the depth map from a single image.
We obtain a high average correlation coefficient of 0.92 under five-fold cross-validation.
We then use the depth estimation process as a proxy task for joint optic disc and cup segmentation.
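A generic conditional-GAN formulation of this setup (pix2pix-style; all names hypothetical): the generator maps an image to depth, while the discriminator judges (image, depth) pairs.

```python
import torch
import torch.nn.functional as F

def adversarial_depth_losses(gen, disc, image, real_depth):
    """Returns (generator_loss, discriminator_loss) for one batch."""
    fake_depth = gen(image)
    # Discriminator: real (image, depth) pairs vs. generated pairs.
    d_real = disc(torch.cat([image, real_depth], dim=1))
    d_fake = disc(torch.cat([image, fake_depth.detach()], dim=1))
    d_loss = (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
              + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))
    # Generator: fool the discriminator and stay close to the ground truth.
    g_adv = disc(torch.cat([image, fake_depth], dim=1))
    g_loss = (F.binary_cross_entropy_with_logits(g_adv, torch.ones_like(g_adv))
              + F.l1_loss(fake_depth, real_depth))
    return g_loss, d_loss
```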
arXiv Detail & Related papers (2020-07-15T06:21:46Z)