DARES: Depth Anything in Robotic Endoscopic Surgery with Self-supervised Vector-LoRA of the Foundation Model
- URL: http://arxiv.org/abs/2408.17433v2
- Date: Mon, 21 Oct 2024 11:01:46 GMT
- Title: DARES: Depth Anything in Robotic Endoscopic Surgery with Self-supervised Vector-LoRA of the Foundation Model
- Authors: Mona Sheikh Zeinoddin, Chiara Lena, Jiongqi Qu, Luca Carlini, Mattia Magro, Seunghoi Kim, Elena De Momi, Sophia Bano, Matthew Grech-Sollars, Evangelos Mazomenos, Daniel C. Alexander, Danail Stoyanov, Matthew J. Clarkson, Mobarakol Islam
- Abstract summary: We introduce Depth Anything in Robotic Endoscopic Surgery (DARES), which applies a new adaptation technique, Vector Low-Rank Adaptation (Vector-LoRA), to DAM V2 to perform self-supervised monocular depth estimation.
The method is validated as superior to recent state-of-the-art self-supervised monocular depth estimation techniques.
- Score: 17.41557655783514
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Robotic-assisted surgery (RAS) relies on accurate depth estimation for 3D reconstruction and visualization. While foundation models like Depth Anything Models (DAM) show promise, directly applying them to surgery often yields suboptimal results. Fully fine-tuning on limited surgical data can cause overfitting and catastrophic forgetting, compromising model robustness and generalization. Although Low-Rank Adaptation (LoRA) addresses some adaptation issues, its uniform parameter distribution neglects the inherent feature hierarchy, where earlier layers, learning more general features, require more parameters than later ones. To tackle this issue, we introduce Depth Anything in Robotic Endoscopic Surgery (DARES), a novel approach that employs a new adaptation technique, Vector Low-Rank Adaptation (Vector-LoRA) on the DAM V2 to perform self-supervised monocular depth estimation in RAS scenes. To enhance learning efficiency, we introduce Vector-LoRA by integrating more parameters in earlier layers and gradually decreasing parameters in later layers. We also design a reprojection loss based on the multi-scale SSIM error to enhance depth perception by better tailoring the foundation model to the specific requirements of the surgical environment. The proposed method is validated on the SCARED dataset and demonstrates superior performance over recent state-of-the-art self-supervised monocular depth estimation techniques, achieving an improvement of 13.3% in the absolute relative error metric. The code and pre-trained weights are available at https://github.com/mobarakol/DARES.
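The abstract's core technical ingredient is the layer-wise rank allocation of Vector-LoRA: earlier encoder layers, which learn more general features, receive higher LoRA ranks than later ones. Below is a minimal PyTorch sketch of that idea only; the names (`vector_lora_ranks`, `VectorLoRALinear`), the linear 16-to-4 rank schedule, and the initialisation are illustrative assumptions rather than the authors' released code (see the linked repository for that).

```python
# Minimal sketch (not the authors' implementation) of layer-wise LoRA rank allocation.
import torch
import torch.nn as nn


def vector_lora_ranks(num_layers: int, r_max: int = 16, r_min: int = 4) -> list[int]:
    """Assign each encoder block a LoRA rank, decreasing linearly from r_max to r_min."""
    if num_layers == 1:
        return [r_max]
    step = (r_max - r_min) / (num_layers - 1)
    return [round(r_max - i * step) for i in range(num_layers)]


class VectorLoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update: y = Wx + (alpha / r) * B(Ax)."""

    def __init__(self, base: nn.Linear, rank: int, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # the pre-trained weights stay frozen
        self.lora_a = nn.Parameter(0.01 * torch.randn(rank, base.in_features))
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))  # zero-init so the adapter starts as a no-op
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.lora_a.T @ self.lora_b.T)


# Example: a 12-block ViT-style encoder gets ranks that shrink monotonically from 16
# down to 4, so earlier (more general) layers receive more adapter capacity.
ranks = vector_lora_ranks(num_layers=12)
print(ranks)
```

Attached to the DAM V2 encoder, adapters of this kind leave the foundation-model weights untouched and train only a small number of low-rank parameters on the surgical data, which is the mechanism the abstract credits for avoiding overfitting and catastrophic forgetting.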
Related papers
- Surgical Depth Anything: Depth Estimation for Surgical Scenes using Foundation Models [4.740415113160021]
The current state-of-the-art foundation model for depth estimation, Depth Anything, struggles with issues such as blurring, bleeding, and reflections.
This paper presents a fine-tuning of the Depth Anything model specifically for the surgical domain, aiming to deliver more accurate pixel-wise depth maps.
arXiv Detail & Related papers (2024-10-09T21:06:14Z)
- Advancing Depth Anything Model for Unsupervised Monocular Depth Estimation in Endoscopy [3.1186464715409983]
We introduce a novel fine-tuning strategy for the Depth Anything Model.
We integrate it with an intrinsic-based unsupervised monocular depth estimation framework.
Our results on the SCARED dataset show that our method achieves state-of-the-art performance.
arXiv Detail & Related papers (2024-09-12T03:04:43Z)
- SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models [85.67096251281191]
We present an innovative approach to model fusion called zero-shot Sparse MIxture of Low-rank Experts (SMILE) construction.
SMILE allows for the upscaling of source models into an MoE model without extra data or further training.
We conduct extensive experiments across diverse scenarios, such as image classification and text generation tasks, using full fine-tuning and LoRA fine-tuning.
arXiv Detail & Related papers (2024-08-19T17:32:15Z)
- Free-SurGS: SfM-Free 3D Gaussian Splatting for Surgical Scene Reconstruction [36.46068581419659]
Real-time 3D reconstruction of surgical scenes plays a vital role in computer-assisted surgery.
Recent advancements in 3D Gaussian Splatting have shown great potential for real-time novel view synthesis.
We propose the first SfM-free 3DGS-based method for surgical scene reconstruction.
arXiv Detail & Related papers (2024-07-03T08:49:35Z)
- Endo-4DGS: Endoscopic Monocular Scene Reconstruction with 4D Gaussian Splatting [12.333523732756163]
Dynamic scene reconstruction can significantly enhance downstream tasks and improve surgical outcomes.
NeRF-based methods have recently risen to prominence for their exceptional ability to reconstruct scenes.
We present Endo-4DGS, a real-time endoscopic dynamic reconstruction approach.
arXiv Detail & Related papers (2024-01-29T18:55:29Z)
- Surgical-DINO: Adapter Learning of Foundation Models for Depth Estimation in Endoscopic Surgery [12.92291406687467]
We design a foundation model-based depth estimation method, referred to as Surgical-DINO, a low-rank adaptation of the DINOv2 for depth estimation in endoscopic surgery.
We build LoRA layers and integrate them into DINOv2 to adapt to surgery-specific domain knowledge instead of using conventional fine-tuning.
Our model is extensively validated on the MICCAI SCARED challenge dataset, which was collected during da Vinci Xi endoscopic surgery.
arXiv Detail & Related papers (2024-01-11T16:22:42Z)
- Uncertainty-Aware Adaptation for Self-Supervised 3D Human Pose Estimation [70.32536356351706]
We introduce MRP-Net that constitutes a common deep network backbone with two output heads subscribing to two diverse configurations.
We derive suitable measures to quantify prediction uncertainty at both pose and joint level.
We present a comprehensive evaluation of the proposed approach and demonstrate state-of-the-art performance on benchmark datasets.
arXiv Detail & Related papers (2022-03-29T07:14:58Z)
- Geometry Uncertainty Projection Network for Monocular 3D Object Detection [138.24798140338095]
We propose a Geometry Uncertainty Projection Network (GUP Net) to tackle the error amplification problem at both inference and training stages.
Specifically, a GUP module is proposed to obtain the geometry-guided uncertainty of the inferred depth.
At the training stage, we propose a Hierarchical Task Learning strategy to reduce the instability caused by error amplification.
arXiv Detail & Related papers (2021-07-29T06:59:07Z)
- Secrets of 3D Implicit Object Shape Reconstruction in the Wild [92.5554695397653]
Reconstructing high-fidelity 3D objects from sparse, partial observation is crucial for various applications in computer vision, robotics, and graphics.
Recent neural implicit modeling methods show promising results on synthetic or dense datasets.
However, they perform poorly on real-world data that is sparse and noisy.
This paper analyzes the root cause of such deficient performance of a popular neural implicit model.
arXiv Detail & Related papers (2021-01-18T03:24:48Z)
- 3D Dense Geometry-Guided Facial Expression Synthesis by Adversarial Learning [54.24887282693925]
We propose a novel framework to exploit 3D dense (depth and surface normals) information for expression manipulation.
We use an off-the-shelf state-of-the-art 3D reconstruction model to estimate the depth and create a large-scale RGB-Depth dataset.
Our experiments demonstrate that the proposed method outperforms the competitive baseline and existing arts by a large margin.
arXiv Detail & Related papers (2020-09-30T17:12:35Z)
- Simple and Effective Prevention of Mode Collapse in Deep One-Class Classification [93.2334223970488]
We propose two regularizers to prevent hypersphere collapse in deep SVDD.
The first regularizer is based on injecting random noise via the standard cross-entropy loss.
The second regularizer penalizes the minibatch variance when it becomes too small.
arXiv Detail & Related papers (2020-01-24T03:44:47Z)