Surgical-DINO: Adapter Learning of Foundation Models for Depth
Estimation in Endoscopic Surgery
- URL: http://arxiv.org/abs/2401.06013v2
- Date: Fri, 12 Jan 2024 11:46:25 GMT
- Title: Surgical-DINO: Adapter Learning of Foundation Models for Depth
Estimation in Endoscopic Surgery
- Authors: Beilei Cui, Mobarakol Islam, Long Bai, Hongliang Ren
- Abstract summary: We design a foundation model-based depth estimation method, referred to as Surgical-DINO, a low-rank adaptation of DINOv2 for depth estimation in endoscopic surgery.
We build LoRA layers and integrate them into DINO to adapt to surgery-specific domain knowledge instead of performing conventional fine-tuning.
Our model is extensively validated on the SCARED MICCAI challenge dataset, which was collected from da Vinci Xi endoscopic surgery.
- Score: 12.92291406687467
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Purpose: Depth estimation in robotic surgery is vital for 3D reconstruction,
surgical navigation, and augmented reality visualization. Although foundation
models exhibit outstanding performance in many vision tasks, including depth
estimation (e.g., DINOv2), recent works have observed their limitations in
medical and surgical domain-specific applications. This work presents a
low-rank adaptation (LoRA) of the foundation model for surgical depth
estimation. Methods: We design a foundation model-based depth estimation
method, referred to as Surgical-DINO, a low-rank adaptation of DINOv2 for
depth estimation in endoscopic surgery. We build LoRA layers and integrate
them into DINO to adapt to surgery-specific domain knowledge instead of
performing conventional fine-tuning. During training, we freeze the DINO image
encoder, which shows excellent visual representation capacity, and optimize
only the LoRA layers and the depth decoder to integrate features from the
surgical scene. Results: Our model is extensively validated on the SCARED
MICCAI challenge dataset, which was collected from da Vinci Xi endoscopic
surgery. We empirically show that Surgical-DINO significantly outperforms
state-of-the-art models on endoscopic depth estimation tasks. Ablation studies
provide further evidence of the effectiveness of our LoRA layers and
adaptation. Conclusion: Surgical-DINO sheds light on the successful adaptation
of foundation models to the surgical domain for depth estimation. The results
make clear that neither zero-shot prediction with weights pre-trained on
general computer vision datasets nor naive fine-tuning is sufficient to use a
foundation model in the surgical domain directly. Code is available at
https://github.com/BeileiCui/SurgicalDINO.
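
The adaptation recipe in the abstract (freeze the DINOv2 encoder, train only the LoRA layers plus a depth decoder) can be made concrete with a short sketch. The following is a minimal, illustrative PyTorch version, not the authors' implementation (see the repository above for that); it assumes the public DINOv2 ViT-B/14 from torch.hub, whose transformer layers are exposed as .blocks with an attn.qkv projection, and the decoder head here is a deliberately simple stand-in.

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    # Wraps a frozen nn.Linear with a trainable low-rank update:
    # y = W x + (alpha / r) * B (A x), with B initialized to zero so
    # training starts from the pretrained behavior.
    def __init__(self, base: nn.Linear, r: int = 4, alpha: float = 4.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # keep pretrained weights frozen
        self.lora_a = nn.Linear(base.in_features, r, bias=False)
        self.lora_b = nn.Linear(r, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))

def inject_lora(vit: nn.Module, r: int = 4) -> nn.Module:
    # Wrap the qkv projection of every attention block; only these
    # low-rank factors (and the decoder below) receive gradients.
    for blk in vit.blocks:
        blk.attn.qkv = LoRALinear(blk.attn.qkv, r=r)
    return vit

encoder = torch.hub.load("facebookresearch/dinov2", "dinov2_vitb14")
for p in encoder.parameters():
    p.requires_grad = False                  # freeze the whole encoder
encoder = inject_lora(encoder, r=4)

depth_decoder = nn.Sequential(               # illustrative stand-in for the
    nn.Linear(768, 256),                     # paper's depth decoder
    nn.GELU(),
    nn.Linear(256, 1),
)

params = [p for m in (encoder, depth_decoder)
          for p in m.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(params, lr=1e-4)   # LoRA factors + decoder only

Zero-initializing B makes the LoRA branch a no-op at the start of training, the standard LoRA trick for preserving the pretrained representation while the surgery-specific update is learned.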
Related papers
- Surgical Depth Anything: Depth Estimation for Surgical Scenes using Foundation Models [4.740415113160021]
Depth Anything, the current state-of-the-art foundation model for depth estimation, struggles in surgical scenes with issues such as blurring, bleeding, and reflections.
This paper presents a fine-tuning of the Depth Anything model specifically for the surgical domain, aiming to deliver more accurate pixel-wise depth maps.
arXiv Detail & Related papers (2024-10-09T21:06:14Z)
- Advancing Depth Anything Model for Unsupervised Monocular Depth Estimation in Endoscopy [3.1186464715409983]
We introduce a novel fine-tuning strategy for the Depth Anything Model.
We integrate it with an intrinsic-based unsupervised monocular depth estimation framework.
Our results on the SCARED dataset show that our method achieves state-of-the-art performance.
arXiv Detail & Related papers (2024-09-12T03:04:43Z)
- DARES: Depth Anything in Robotic Endoscopic Surgery with Self-supervised Vector-LoRA of the Foundation Model [17.41557655783514]
We introduce Depth Anything in Robotic Endoscopic Surgery (DARES).
It applies a new adaptation technique, Low-Rank Adaptation (LoRA), to DAM V2 to perform self-supervised monocular depth estimation.
The new method is validated as superior to recent state-of-the-art self-supervised monocular depth estimation techniques.
arXiv Detail & Related papers (2024-08-30T17:35:06Z)
- EndoDAC: Efficient Adapting Foundation Model for Self-Supervised Depth Estimation from Any Endoscopic Camera [12.152362025172915]
We propose Endoscopic Depth Any Camera (EndoDAC) to adapt foundation models to endoscopic scenes.
Specifically, we develop the Dynamic Vector-Based Low-Rank Adaptation (DV-LoRA) and employ Convolutional Neck blocks.
Our framework can be trained solely on monocular surgical videos from any camera, ensuring minimal training cost.
arXiv Detail & Related papers (2024-05-14T14:55:15Z)
- Creating a Digital Twin of Spinal Surgery: A Proof of Concept [68.37190859183663]
Surgery digitalization is the process of creating a virtual replica of real-world surgery.
We present a proof of concept (PoC) for surgery digitalization that is applied to an ex-vivo spinal surgery.
We employ five RGB-D cameras for dynamic 3D reconstruction of the surgeon, a high-end camera for 3D reconstruction of the anatomy, an infrared stereo camera for surgical instrument tracking, and a laser scanner for 3D reconstruction of the operating room and data fusion.
arXiv Detail & Related papers (2024-03-25T13:09:40Z)
- EndoGSLAM: Real-Time Dense Reconstruction and Tracking in Endoscopic Surgeries using Gaussian Splatting [53.38166294158047]
EndoGSLAM is an efficient approach for endoscopic surgeries, which integrates streamlined representation and differentiable Gaussianization.
Experiments show that EndoGSLAM achieves a better trade-off between intraoperative availability and reconstruction quality than traditional or neural SLAM approaches.
arXiv Detail & Related papers (2024-03-22T11:27:43Z)
- An Endoscopic Chisel: Intraoperative Imaging Carves 3D Anatomical Models [8.516340459721484]
We propose a first vision-based approach to update the preoperative 3D anatomical model.
Results show that error decreases as surgery progresses, whereas it increases when no update is employed.
arXiv Detail & Related papers (2024-02-19T05:06:52Z)
- Neural LerPlane Representations for Fast 4D Reconstruction of Deformable Tissues [52.886545681833596]
LerPlane is a novel method for fast and accurate reconstruction of surgical scenes under a single-viewpoint setting.
LerPlane treats surgical procedures as 4D volumes and factorizes them into explicit 2D planes of static and dynamic fields.
LerPlane shares static fields, significantly reducing the workload of dynamic tissue modeling.
arXiv Detail & Related papers (2023-05-31T14:38:35Z)
- Safe Deep RL for Intraoperative Planning of Pedicle Screw Placement [61.28459114068828]
We propose an intraoperative planning approach for robotic spine surgery that leverages real-time observation for drill path planning based on Safe Deep Reinforcement Learning (DRL).
Our approach was capable of achieving 90% bone penetration with respect to the gold standard (GS) drill planning.
arXiv Detail & Related papers (2023-05-09T11:42:53Z)
- Next-generation Surgical Navigation: Marker-less Multi-view 6DoF Pose Estimation of Surgical Instruments [66.74633676595889]
First, we present a multi-camera capture setup consisting of static and head-mounted cameras.
Second, we publish a multi-view RGB-D video dataset of ex-vivo spine surgeries, captured in a surgical wet lab and a real operating theatre.
Third, we evaluate three state-of-the-art single-view and multi-view methods for the task of 6DoF pose estimation of surgical instruments.
arXiv Detail & Related papers (2023-05-05T13:42:19Z)
- Live image-based neurosurgical guidance and roadmap generation using unsupervised embedding [53.992124594124896]
We present a method for live image-only guidance leveraging a large data set of annotated neurosurgical videos.
A generated roadmap encodes the common anatomical paths taken in surgeries in the training set.
We trained and evaluated the proposed method with a data set of 166 transsphenoidal adenomectomy procedures.
arXiv Detail & Related papers (2023-03-31T12:52:24Z)