EndoSLAM Dataset and An Unsupervised Monocular Visual Odometry and Depth
Estimation Approach for Endoscopic Videos: Endo-SfMLearner
- URL: http://arxiv.org/abs/2006.16670v3
- Date: Thu, 1 Oct 2020 13:44:32 GMT
- Title: EndoSLAM Dataset and An Unsupervised Monocular Visual Odometry and Depth
Estimation Approach for Endoscopic Videos: Endo-SfMLearner
- Authors: Kutsev Bengisu Ozyoruk, Guliz Irem Gokceler, Gulfize Coskun, Kagan
Incetan, Yasin Almalioglu, Faisal Mahmood, Eva Curto, Luis Perdigoto, Marina
Oliveira, Hasan Sahin, Helder Araujo, Henrique Alexandrino, Nicholas J. Durr,
Hunter B. Gilbert, and Mehmet Turan
- Abstract summary: We introduce a comprehensive endoscopic SLAM dataset consisting of 3D point cloud data for six porcine organs.
Synthetic capsule endoscopy frames with both depth and pose annotations are included to facilitate the study of simulation-to-real transfer learning algorithms.
We propound Endo-SfMLearner, an unsupervised monocular depth and pose estimation method.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning techniques hold promise to develop dense topography
reconstruction and pose estimation methods for endoscopic videos. However,
currently available datasets do not support effective quantitative
benchmarking. In this paper, we introduce a comprehensive endoscopic SLAM
dataset consisting of 3D point cloud data for six porcine organs, capsule and
standard endoscopy recordings as well as synthetically generated data. A Panda
robotic arm, two commercially available capsule endoscopes, two conventional
endoscopes with different camera properties, and two high precision 3D scanners
were employed to collect data from 8 ex-vivo porcine gastrointestinal
(GI)-tract organs. In total, 35 sub-datasets are provided with 6D pose ground
truth for the ex-vivo part: 18 sub-datasets for colon, 12 sub-datasets for
stomach and 5 sub-datasets for small intestine, while four of these contain
polyp-mimicking elevations carried out by an expert gastroenterologist.
Synthetic capsule endoscopy frames from GI-tract with both depth and pose
annotations are included to facilitate the study of simulation-to-real transfer
learning algorithms. Additionally, we propound Endo-SfMLearner, an unsupervised
monocular depth and pose estimation method that combines residual networks with
a spatial attention module in order to direct the network to focus on
distinguishable and highly textured tissue regions. The proposed approach makes
use of a brightness-aware photometric loss to improve the robustness under fast
frame-to-frame illumination changes. To exemplify the use-case of the EndoSLAM
dataset, the performance of Endo-SfMLearner is extensively compared with the
state-of-the-art. The codes and the link for the dataset are publicly available
at https://github.com/CapsuleEndoscope/EndoSLAM. A video demonstrating the
experimental setup and procedure is accessible through
https://www.youtube.com/watch?v=G_LCe0aWWdQ.
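
The abstract does not spell out how the brightness-aware photometric loss is computed. As a rough illustration of the general idea only, the sketch below assumes one simple variant: before comparing a warped frame with its target, the warped frame's intensities are affinely rescaled so that their mean and standard deviation match the target's. The function names (`align_brightness`, `photometric_l1`, `brightness_aware_l1`) and the flat-list image representation are hypothetical and are not taken from the EndoSLAM code base.

```python
from statistics import fmean, pstdev


def align_brightness(source, target, eps=1e-6):
    """Affinely rescale `source` so its mean and std match `target`.

    Both inputs are flat lists of pixel intensities (an assumption made
    to keep the sketch dependency-free).
    """
    mu_s, mu_t = fmean(source), fmean(target)
    sd_s, sd_t = pstdev(source), pstdev(target)
    scale = sd_t / (sd_s + eps)  # eps guards against a constant image
    return [(p - mu_s) * scale + mu_t for p in source]


def photometric_l1(warped, target):
    """Plain mean absolute intensity difference between two frames."""
    return fmean([abs(w - t) for w, t in zip(warped, target)])


def brightness_aware_l1(warped, target):
    """L1 photometric error computed after brightness alignment."""
    return photometric_l1(align_brightness(warped, target), target)


if __name__ == "__main__":
    target = [0.1, 0.2, 0.4, 0.8]
    # Same scene content, but with a global gain and offset in illumination.
    warped = [1.5 * t + 0.05 for t in target]
    print("plain L1:           ", photometric_l1(warped, target))
    print("brightness-aware L1:", brightness_aware_l1(warped, target))
```

Under this assumption, a purely global gain and offset between two frames contributes almost nothing to the aligned loss, whereas the plain L1 error would penalise it as if the geometry were wrong.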
Related papers
- SMILE-UHURA Challenge -- Small Vessel Segmentation at Mesoscopic Scale from Ultra-High Resolution 7T Magnetic Resonance Angiograms [60.35639972035727]
The lack of publicly available annotated datasets has impeded the development of robust, machine learning-driven segmentation algorithms.
The SMILE-UHURA challenge addresses the gap in publicly available annotated datasets by providing an annotated dataset of Time-of-Flight angiography acquired with 7T MRI.
Dice scores reached up to 0.838 $\pm$ 0.066 and 0.716 $\pm$ 0.125 on the respective datasets, with an average performance of up to 0.804 $\pm$ 0.15.
arXiv Detail & Related papers (2024-11-14T17:06:00Z)
- Leveraging Near-Field Lighting for Monocular Depth Estimation from Endoscopy Videos [12.497782583094281]
Monocular depth estimation in endoscopy videos can enable assistive and robotic surgery to obtain better coverage of the organ and detection of various health issues.
Despite promising progress on mainstream, natural image depth estimation, techniques perform poorly on endoscopy images.
In this paper, we utilize the photometric cues, i.e., the light emitted from an endoscope and reflected by the surface, to improve monocular depth estimation.
arXiv Detail & Related papers (2024-03-26T17:52:23Z)
- CathFlow: Self-Supervised Segmentation of Catheters in Interventional Ultrasound Using Optical Flow and Transformers [66.15847237150909]
We introduce a self-supervised deep learning architecture to segment catheters in longitudinal ultrasound images.
The network architecture builds upon AiAReSeg, a segmentation transformer built with the Attention in Attention mechanism.
We validated our model on a test dataset, consisting of unseen synthetic data and images collected from silicon aorta phantoms.
arXiv Detail & Related papers (2024-03-21T15:13:36Z)
- Generative Enhancement for 3D Medical Images [74.17066529847546]
We propose GEM-3D, a novel generative approach to the synthesis of 3D medical images.
Our method begins with a 2D slice, noted as the informed slice to serve the patient prior, and propagates the generation process using a 3D segmentation mask.
By decomposing the 3D medical images into masks and patient prior information, GEM-3D offers a flexible yet effective solution for generating versatile 3D images.
arXiv Detail & Related papers (2024-03-19T15:57:04Z)
- AiAReSeg: Catheter Detection and Segmentation in Interventional Ultrasound using Transformers [75.20925220246689]
Endovascular surgeries are performed using the gold standard of fluoroscopy, which uses ionising radiation to visualise catheters and vasculature.
This work proposes a solution using an adaptation of a state-of-the-art machine learning transformer architecture to detect and segment catheters in axial interventional Ultrasound image sequences.
arXiv Detail & Related papers (2023-09-25T19:34:12Z)
- RVD: A Handheld Device-Based Fundus Video Dataset for Retinal Vessel Segmentation [42.145795119000056]
We introduce the first video-based retinal dataset by employing handheld devices for data acquisition.
The dataset comprises 635 smartphone-based fundus videos collected from four different clinics, involving 415 patients from 50 to 75 years old.
arXiv Detail & Related papers (2023-07-13T06:30:09Z)
- A geometry-aware deep network for depth estimation in monocular endoscopy [17.425158094539462]
The proposed method is extensively validated across different datasets and clinical images.
The generalizability of the proposed method achieves mean RMSE values of 12.604 (T1-L1), 9.930 (T2-L2), and 13.893 (colon) on the ColonDepth dataset.
arXiv Detail & Related papers (2023-04-20T11:59:32Z)
- OADAT: Experimental and Synthetic Clinical Optoacoustic Data for Standardized Image Processing [62.993663757843464]
Optoacoustic (OA) imaging is based on excitation of biological tissues with nanosecond-duration laser pulses followed by detection of ultrasound waves generated via light-absorption-mediated thermoelastic expansion.
OA imaging features a powerful combination between rich optical contrast and high resolution in deep tissues.
No standardized datasets generated with different types of experimental set-up and associated processing methods are available to facilitate advances in broader applications of OA in clinical settings.
arXiv Detail & Related papers (2022-06-17T08:11:26Z)
- EndoMapper dataset of complete calibrated endoscopy procedures [8.577980383972005]
This paper introduces the Endomapper dataset, the first collection of complete endoscopy sequences acquired during regular medical practice.
The data will be used to build 3D mapping and localization systems that can perform special tasks such as detecting blind zones during exploration.
arXiv Detail & Related papers (2022-04-29T17:10:01Z)
- SERV-CT: A disparity dataset from CT for validation of endoscopic 3D reconstruction [8.448866668577946]
We present a stereo-endoscopic reconstruction validation dataset based on CT (SERV-CT).
The SERV-CT dataset provides easy-to-use stereoscopic validation for surgical applications, with smooth reference disparities and depths that cover the majority of the endoscopic images.
arXiv Detail & Related papers (2020-12-22T01:28:30Z)
- Fed-Sim: Federated Simulation for Medical Imaging [131.56325440976207]
We introduce a physics-driven generative approach that consists of two learnable neural modules.
We show that our data synthesis framework improves the downstream segmentation performance on several datasets.
arXiv Detail & Related papers (2020-09-01T19:17:46Z)