Related papers: SAGE: SLAM with Appearance and Geometry Prior for Endoscopy

URL: http://arxiv.org/abs/2202.09487v2
Date: Tue, 22 Feb 2022 18:24:03 GMT
Title: SAGE: SLAM with Appearance and Geometry Prior for Endoscopy
Authors: Xingtong Liu, Zhaoshuo Li, Masaru Ishii, Gregory D. Hager, Russell H. Taylor, Mathias Unberath
Abstract summary: In endoscopy, many applications would benefit from a real-time method that can simultaneously track the endoscope and reconstruct the dense 3D geometry of the observed anatomy from a monocular endoscopic video. We develop a Simultaneous Localization and Mapping system by combining the learning-based appearance and optimizable geometry priors and factor graph optimization. The proposed SLAM system is shown to robustly handle the challenges of texture scarceness and illumination variation that are commonly seen in endoscopy.
Score: 24.94746710994156
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: In endoscopy, many applications (e.g., surgical navigation) would benefit from a real-time method that can simultaneously track the endoscope and reconstruct the dense 3D geometry of the observed anatomy from a monocular endoscopic video. To this end, we develop a Simultaneous Localization and Mapping system by combining the learning-based appearance and optimizable geometry priors and factor graph optimization. The appearance and geometry priors are explicitly learned in an end-to-end differentiable training pipeline to master the task of pair-wise image alignment, one of the core components of the SLAM system. In our experiments, the proposed SLAM system is shown to robustly handle the challenges of texture scarceness and illumination variation that are commonly seen in endoscopy. The system generalizes well to unseen endoscopes and subjects and performs favorably compared with a state-of-the-art feature-based SLAM system. The code repository is available at https://github.com/lppllppl920/SAGE-SLAM.git.

Related papers

Modality-Aware Feature Matching: A Comprehensive Review of Single- and Cross-Modality Techniques [91.26187560114381]
Feature matching is a cornerstone task in computer vision, essential for applications such as image retrieval, stereo matching, 3D reconstruction, and SLAM. This survey comprehensively reviews modality-based feature matching, exploring traditional handcrafted methods and contemporary deep learning approaches.
arXiv Detail & Related papers (2025-07-30T15:56:36Z)
Geo-RepNet: Geometry-Aware Representation Learning for Surgical Phase Recognition in Endoscopic Submucosal Dissection [10.386536115270294]
Geo-RepNet is a geometry-aware convolutional framework that integrates RGB image and depth information to enhance recognition performance in complex surgical scenes. To evaluate the effectiveness of our approach, we construct a nine-phase ESD dataset with dense frame-level annotations from real-world ESD videos.
arXiv Detail & Related papers (2025-07-12T14:07:44Z)
Generalizable and Relightable Gaussian Splatting for Human Novel View Synthesis [49.67420486373202]
GRGS is a generalizable and relightable 3D Gaussian framework for high-fidelity human novel view synthesis under diverse lighting conditions. We introduce a Lighting-aware Geometry Refinement (LGR) module trained on synthetically relit data to predict accurate depth and surface normals.
arXiv Detail & Related papers (2025-05-27T17:59:47Z)
Advancing Dense Endoscopic Reconstruction with Gaussian Splatting-driven Surface Normal-aware Tracking and Mapping [12.027762278121052]
Endo-2DTAM is a real-time endoscopic SLAM system with 2D Gaussian Splatting (2DGS). Our robust tracking module combines point-to-point and point-to-plane distance metrics. Our mapping module utilizes normal consistency and depth distortion to enhance surface reconstruction quality.
arXiv Detail & Related papers (2025-01-31T17:15:34Z)
GUS-IR: Gaussian Splatting with Unified Shading for Inverse Rendering [83.69136534797686]
We present GUS-IR, a novel framework designed to address the inverse rendering problem for complicated scenes featuring rough and glossy surfaces. This paper starts by analyzing and comparing two prominent shading techniques popularly used for inverse rendering, forward shading and deferred shading. We propose a unified shading solution that combines the advantages of both techniques for better decomposition.
arXiv Detail & Related papers (2024-11-12T01:51:05Z)
BodySLAM: A Generalized Monocular Visual SLAM Framework for Surgical Applications [0.0]
This study presents BodySLAM, a robust deep learning-based MVSLAM approach that addresses these challenges through three key components: CycleVO, a novel unsupervised monocular pose estimation module; the integration of the state-of-the-art Zoe architecture for monocular depth estimation; and a 3D reconstruction module creating a coherent surgical map. Results demonstrate that CycleVO exhibited competitive performance with the lowest inference time among pose estimation methods, while maintaining robust generalization capabilities.
arXiv Detail & Related papers (2024-08-06T10:13:57Z)
MGS-SLAM: Monocular Sparse Tracking and Gaussian Mapping with Depth Smooth Regularization [29.713650915551632]
This letter introduces a novel framework for dense Visual Simultaneous Localization and Mapping based on Gaussian Splatting. We jointly optimize sparse visual odometry tracking and 3D Gaussian Splatting scene representation for the first time. The accuracy of our pose estimation surpasses that of existing state-of-the-art methods.
arXiv Detail & Related papers (2024-05-10T04:42:21Z)
EndoGSLAM: Real-Time Dense Reconstruction and Tracking in Endoscopic Surgeries using Gaussian Splatting [53.38166294158047]
EndoGSLAM is an efficient approach for endoscopic surgeries, which integrates streamlined representation and differentiable Gaussianization. Experiments show that EndoGSLAM achieves a better trade-off between intraoperative availability and reconstruction quality than traditional or neural SLAM approaches.
arXiv Detail & Related papers (2024-03-22T11:27:43Z)
DNS SLAM: Dense Neural Semantic-Informed SLAM [92.39687553022605]
DNS SLAM is a novel neural RGB-D semantic SLAM approach featuring a hybrid representation. Our method integrates multi-view geometry constraints with image-based feature extraction to improve appearance details. Our experimental results achieve state-of-the-art performance on both synthetic data and real-world data tracking.
arXiv Detail & Related papers (2023-11-30T21:34:44Z)
A geometry-aware deep network for depth estimation in monocular endoscopy [17.425158094539462]
The proposed method is extensively validated across different datasets and clinical images. In generalizability tests, it achieves mean RMSE values of 12.604 (T1-L1), 9.930 (T2-L2), and 13.893 (colon) on the ColonDepth dataset.
arXiv Detail & Related papers (2023-04-20T11:59:32Z)
Multimodal Semantic Scene Graphs for Holistic Modeling of Surgical Procedures [70.69948035469467]
We take advantage of the latest computer vision methodologies for generating 3D graphs from camera views. We then introduce the Multimodal Semantic Scene Graph (MSSG), which aims at providing a unified symbolic and semantic representation of surgical procedures.
arXiv Detail & Related papers (2021-06-09T14:35:44Z)
Single-shot Hyperspectral-Depth Imaging with Learned Diffractive Optics [72.9038524082252]
We propose a compact single-shot monocular hyperspectral-depth (HS-D) imaging method. Our method uses a diffractive optical element (DOE), the point spread function of which changes with respect to both depth and spectrum. To facilitate learning the DOE, we present a first HS-D dataset by building a benchtop HS-D imager.
arXiv Detail & Related papers (2020-09-01T14:19:35Z)
Pathological Retinal Region Segmentation From OCT Images Using Geometric Relation Based Augmentation [84.7571086566595]
We propose improvements over previous GAN-based medical image synthesis methods by jointly encoding the intrinsic relationship of geometry and shape. The proposed method outperforms state-of-the-art segmentation methods on the public RETOUCH dataset having images captured from different acquisition procedures.
arXiv Detail & Related papers (2020-03-31T11:50:43Z)
DeepFactors: Real-Time Probabilistic Dense Monocular SLAM [29.033778410908877]
We present a SLAM system that unifies methods in a probabilistic framework while still maintaining real-time performance. This is achieved through the use of a learned compact depth map representation and reformulating three different types of errors. We evaluate our system on trajectory estimation and depth reconstruction on real-world sequences and present various examples of estimated dense geometry.
arXiv Detail & Related papers (2020-01-14T21:08:51Z)

This list is automatically generated from the titles and abstracts of the papers on this site.