Leveraging Synthetic Priors for Monocular Depth Estimation in Specular Surgical Environments
- URL: http://arxiv.org/abs/2512.23786v1
- Date: Mon, 29 Dec 2025 17:29:42 GMT
- Title: Leveraging Synthetic Priors for Monocular Depth Estimation in Specular Surgical Environments
- Authors: Ankan Aich, Yangming Lee,
- Abstract summary: Existing self-supervised methods often suffer from boundary collapse on thin surgical tools and transparent surfaces.<n>In this work, we address this by leveraging the high-fidelity synthetic priors of the Depth Anything V2 architecture.<n>Our approach establishes a new state-of-the-art, achieving an accuracy ( 1.25) of 98.1% and reducing Squared Relative Error by over 17% compared to established baselines.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Accurate Monocular Depth Estimation (MDE) is critical for robotic surgery but remains fragile in specular, fluid-filled endoscopic environments. Existing self-supervised methods, typically relying on foundation models trained with noisy real-world pseudo-labels, often suffer from boundary collapse on thin surgical tools and transparent surfaces. In this work, we address this by leveraging the high-fidelity synthetic priors of the Depth Anything V2 architecture, which inherently captures precise geometric details of thin structures. We efficiently adapt these priors to the medical domain using Dynamic Vector Low-Rank Adaptation (DV-LORA), minimizing the parameter budget while bridging the synthetic-to-real gap. Additionally, we introduce a physically-stratified evaluation protocol on the SCARED dataset to rigorously quantify performance in high-specularity regimes often masked by aggregate metrics. Our approach establishes a new state-of-the-art, achieving an accuracy (< 1.25) of 98.1% and reducing Squared Relative Error by over 17% compared to established baselines, demonstrating superior robustness in adverse surgical lighting.
Related papers
- Suppressing Prior-Comparison Hallucinations in Radiology Report Generation via Semantically Decoupled Latent Steering [94.37535002230504]
We develop a training-free, inference-time control framework termed Semantically Decoupled Latent Steering.<n>Our approach constructs a semantic-free intervention vector via large language model (LLM)-driven semantic decomposition.<n>We show that our approach significantly reduces the probability of historical hallucinations.
arXiv Detail & Related papers (2026-02-27T04:49:01Z) - Real-time topology-aware M-mode OCT segmentation for robotic deep anterior lamellar keratoplasty (DALK) guidance [4.803245501695445]
We present a lightweight, topology aware M-mode segmentation pipeline based on UNeXt.<n>The proposed system achieves end to end throughput exceeding 80 Hz measured over the complete preprocessing inference overlay pipeline.
arXiv Detail & Related papers (2026-02-02T20:58:04Z) - SkinFlow: Efficient Information Transmission for Open Dermatological Diagnosis via Dynamic Visual Encoding and Staged RL [26.10211846938172]
General-purpose Large Vision-Language Models (LVLMs) often falter in dermatology due to "diffuse attention"<n>We introduce SkinFlow, a framework that treats diagnosis as an optimization of visual information transmission efficiency.
arXiv Detail & Related papers (2026-01-14T04:21:07Z) - EndoUFM: Utilizing Foundation Models for Monocular depth estimation of endoscopic images [7.350425834778092]
EndoUFM is an unsupervised monocular depth estimation framework.<n>It enhances the depth estimation performance by leveraging the powerful pre-learned priors.<n>This work contributes to augmenting surgeons' spatial perception during minimally invasive procedures.
arXiv Detail & Related papers (2025-08-25T11:33:05Z) - Generate Aligned Anomaly: Region-Guided Few-Shot Anomaly Image-Mask Pair Synthesis for Industrial Inspection [53.137651284042434]
Anomaly inspection plays a vital role in industrial manufacturing, but the scarcity of anomaly samples limits the effectiveness of existing methods.<n>We propose Generate grained Anomaly (GAA), a region-guided, few-shot anomaly image-mask pair generation framework.<n>GAA generates realistic, diverse, and semantically aligned anomalies using only a small number of samples.
arXiv Detail & Related papers (2025-07-13T12:56:59Z) - Parameterized Diffusion Optimization enabled Autoregressive Ordinal Regression for Diabetic Retinopathy Grading [53.11883409422728]
This work proposes a novel autoregressive ordinal regression method called AOR-DR.<n>We decompose the diabetic retinopathy grading task into a series of ordered steps by fusing the prediction of the previous steps with extracted image features.<n>We exploit the diffusion process to facilitate conditional probability modeling, enabling the direct use of continuous global image features for autoregression.
arXiv Detail & Related papers (2025-07-07T13:22:35Z) - Orthogonal Subspace Decomposition for Generalizable AI-Generated Image Detection [58.87142367781417]
A naively trained detector tends to favor overfitting to the limited and monotonous fake patterns, causing the feature space to become highly constrained and low-ranked.<n>One potential remedy is incorporating the pre-trained knowledge within the vision foundation models to expand the feature space.<n>By freezing the principal components and adapting only the remained components, we preserve the pre-trained knowledge while learning fake patterns.
arXiv Detail & Related papers (2024-11-23T19:10:32Z) - Advancing Depth Anything Model for Unsupervised Monocular Depth Estimation in Endoscopy [2.906891207990726]
We introduce a novel fine-tuning strategy for the Depth Anything Model.<n>We integrate it with an intrinsic-based unsupervised monocular depth estimation framework.<n>Our results show that our method achieves state-of-the-art performance while minimizing the number of trainable parameters.
arXiv Detail & Related papers (2024-09-12T03:04:43Z) - DARES: Depth Anything in Robotic Endoscopic Surgery with Self-supervised Vector-LoRA of the Foundation Model [17.41557655783514]
We introduce Depth Anything in Robotic Endoscopic Surgery (DARES)
New adaptation technique, Low-Rank Adaptation (LoRA) on the DAM V2 to perform self-supervised monocular depth estimation.
New method is validated superior over recent state-of-the-art self-supervised monocular depth estimation techniques.
arXiv Detail & Related papers (2024-08-30T17:35:06Z) - Machine Learning for ALSFRS-R Score Prediction: Making Sense of the Sensor Data [44.99833362998488]
Amyotrophic Lateral Sclerosis (ALS) is a rapidly progressive neurodegenerative disease that presents individuals with limited treatment options.
The present investigation, spearheaded by the iDPP@CLEF 2024 challenge, focuses on utilizing sensor-derived data obtained through an app.
arXiv Detail & Related papers (2024-07-10T19:17:23Z) - CathFlow: Self-Supervised Segmentation of Catheters in Interventional Ultrasound Using Optical Flow and Transformers [66.15847237150909]
We introduce a self-supervised deep learning architecture to segment catheters in longitudinal ultrasound images.
The network architecture builds upon AiAReSeg, a segmentation transformer built with the Attention in Attention mechanism.
We validated our model on a test dataset, consisting of unseen synthetic data and images collected from silicon aorta phantoms.
arXiv Detail & Related papers (2024-03-21T15:13:36Z) - Safe Deep RL for Intraoperative Planning of Pedicle Screw Placement [61.28459114068828]
We propose an intraoperative planning approach for robotic spine surgery that leverages real-time observation for drill path planning based on Safe Deep Reinforcement Learning (DRL)
Our approach was capable of achieving 90% bone penetration with respect to the gold standard (GS) drill planning.
arXiv Detail & Related papers (2023-05-09T11:42:53Z) - Negligible effect of brain MRI data preprocessing for tumor segmentation [36.89606202543839]
We conduct experiments on three publicly available datasets and evaluate the effect of different preprocessing steps in deep neural networks.
Our results demonstrate that most popular standardization steps add no value to the network performance.
We suggest that image intensity normalization approaches do not contribute to model accuracy because of the reduction of signal variance with image standardization.
arXiv Detail & Related papers (2022-04-11T17:29:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.