Surgical Tattoos in Infrared: A Dataset for Quantifying Tissue Tracking and Mapping
- URL: http://arxiv.org/abs/2309.16782v2
- Date: Thu, 29 Feb 2024 18:57:17 GMT
- Title: Surgical Tattoos in Infrared: A Dataset for Quantifying Tissue Tracking and Mapping
- Authors: Adam Schmidt, Omid Mohareri, Simon DiMaio, Septimiu E. Salcudean
- Abstract summary: The Surgical Tattoos in Infrared (STIR) dataset comprises hundreds of stereo video clips in both in-vivo and ex-vivo scenes.
With over 3,000 labelled points, STIR will help to quantify and enable better analysis of tracking and mapping methods.
- Score: 7.282909831316735
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Quantifying performance of methods for tracking and mapping tissue in
endoscopic environments is essential for enabling image guidance and automation
of medical interventions and surgery. Datasets developed so far either use
rigid environments, visible markers, or require annotators to label salient
points in videos after collection. These are respectively: not general, visible
to algorithms, or costly and error-prone. We introduce a novel labeling
methodology along with a dataset that uses said methodology, Surgical Tattoos
in Infrared (STIR). STIR has labels that are persistent but invisible to
visible spectrum algorithms. This is done by labelling tissue points with
IR-fluorescent dye, indocyanine green (ICG), and then collecting visible light
video clips. STIR comprises hundreds of stereo video clips in both in-vivo and
ex-vivo scenes with start and end points labelled in the IR spectrum. With over
3,000 labelled points, STIR will help to quantify and enable better analysis of
tracking and mapping methods. After introducing STIR, we analyze multiple
different frame-based tracking methods on STIR using both 3D and 2D endpoint
error and accuracy metrics. STIR is available at
https://dx.doi.org/10.21227/w8g4-g548
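The abstract's 2D and 3D endpoint error and accuracy metrics can be computed directly from tracked versus ground-truth point positions. Below is a minimal sketch, not the official STIR benchmark code; the array shapes, stand-in data, and the 4-unit threshold are illustrative assumptions:

```python
import numpy as np

def endpoint_error(pred, gt):
    """Mean Euclidean distance between predicted and ground-truth endpoints.
    pred, gt: (N, 2) pixel coordinates for 2D, or (N, 3) for 3D."""
    return float(np.linalg.norm(pred - gt, axis=1).mean())

def accuracy_within(pred, gt, threshold):
    """Fraction of points whose endpoint error falls under `threshold`
    (pixels for 2D, metric units for 3D)."""
    return float((np.linalg.norm(pred - gt, axis=1) < threshold).mean())

# Illustrative usage with random stand-in data:
gt = np.random.rand(100, 2) * 512          # hypothetical ground-truth endpoints
pred = gt + np.random.randn(100, 2) * 2.0  # hypothetical tracker output
print(endpoint_error(pred, gt), accuracy_within(pred, gt, threshold=4.0))
```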
Related papers
- SurgPose: a Dataset for Articulated Robotic Surgical Tool Pose Estimation and Tracking [42.05426874677755]
SurgPose is a dataset of instance-aware semantic keypoints and skeletons for visual surgical tool pose estimation and tracking.
The SurgPose dataset consists of approximately 120k surgical instrument instances (80k for training and 40k for validation) of 6 categories.
Since the videos are collected in stereo pairs, the 2D pose can be lifted to 3D based on stereo-matching depth (this lifting is sketched below).
arXiv Detail & Related papers (2025-02-17T08:04:53Z)
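As a hedged illustration of that lifting step using standard pinhole stereo geometry (the dataset's actual calibration is not shown here; the focal lengths, principal point, baseline, and measurements below are made-up values):

```python
import numpy as np

def lift_to_3d(uv, disparity, fx, fy, cx, cy, baseline):
    """Back-project 2D keypoints (N, 2) to 3D camera coordinates using
    stereo disparities (N,). Standard stereo relation: Z = fx * b / d."""
    z = fx * baseline / disparity
    x = (uv[:, 0] - cx) * z / fx
    y = (uv[:, 1] - cy) * z / fy
    return np.stack([x, y, z], axis=1)

# Hypothetical intrinsics and measurements, for illustration only:
uv = np.array([[320.0, 240.0], [400.0, 260.0]])  # keypoints in the left image
disp = np.array([12.5, 9.8])                     # stereo-matching disparities (px)
pts3d = lift_to_3d(uv, disp, fx=800, fy=800, cx=320, cy=240, baseline=0.005)
```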
- EchoTracker: Advancing Myocardial Point Tracking in Echocardiography [0.6263680699548959]
EchoTracker is a two-fold coarse-to-fine model that facilitates the tracking of queried points on a tissue surface across ultrasound image sequences.
Experiments demonstrate that the model outperforms SOTA methods, with an average position accuracy of 67% and a median trajectory error of 2.86 pixels.
This implies that learning-based point tracking can potentially improve performance and yield higher diagnostic and prognostic value for clinical measurements (the reported metrics are sketched below).
arXiv Detail & Related papers (2024-05-14T13:24:51Z)
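A minimal sketch of the two reported quantities, median trajectory error and position accuracy, under the assumption that accuracy means the fraction of tracked positions within a pixel threshold (the paper's exact definition may differ, and the threshold here is invented):

```python
import numpy as np

def median_trajectory_error(pred, gt):
    """pred, gt: (T, N, 2) tracked vs. ground-truth point positions over
    T frames. Returns the median over all per-frame, per-point errors."""
    return float(np.median(np.linalg.norm(pred - gt, axis=-1)))

def position_accuracy(pred, gt, threshold_px=3.0):
    """Fraction of tracked positions within `threshold_px` of ground truth."""
    return float((np.linalg.norm(pred - gt, axis=-1) < threshold_px).mean())

# Stand-in data: 50 frames, 20 queried points
gt = np.random.rand(50, 20, 2) * 256
pred = gt + np.random.randn(50, 20, 2)
print(median_trajectory_error(pred, gt), position_accuracy(pred, gt))
```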
- LABELMAKER: Automatic Semantic Label Generation from RGB-D Trajectories [59.14011485494713]
This work introduces a fully automated 2D/3D labeling framework that can generate labels for RGB-D scans at an equal (or better) level of accuracy.
We demonstrate the effectiveness of our LabelMaker pipeline by generating significantly better labels for the ScanNet datasets and automatically labelling the previously unlabeled ARKitScenes dataset.
arXiv Detail & Related papers (2023-11-20T20:40:24Z)
- Data-Efficient Vision Transformers for Multi-Label Disease Classification on Chest Radiographs [55.78588835407174]
Vision Transformers (ViTs) have not been applied to this task despite their high classification performance on generic images.
ViTs do not rely on convolutions but on patch-based self-attention, and in contrast to CNNs, they encode no prior knowledge of local connectivity (patch-based self-attention is sketched below).
Our results show that while ViTs and CNNs perform on par, with a small benefit for ViTs, DeiTs outperform the former if a reasonably large dataset is available for training.
arXiv Detail & Related papers (2022-08-17T09:07:45Z)
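To make "patch-based self-attention without convolutions" concrete, here is a minimal PyTorch sketch of a ViT-style patch embedding followed by one self-attention layer; the image size, patch size, and widths are illustrative, not the paper's configuration:

```python
import torch
import torch.nn as nn

class TinyViTBlock(nn.Module):
    """Patch embedding + one multi-head self-attention layer.
    No convolutions over the image; all patches attend to each other."""
    def __init__(self, patch=16, dim=192, heads=3):
        super().__init__()
        self.patch = patch
        self.embed = nn.Linear(patch * patch * 3, dim)   # flatten each patch
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):                                # x: (B, 3, H, W)
        B, C, H, W = x.shape
        p = self.patch
        # split image into non-overlapping p x p patches -> (B, N, C*p*p)
        x = x.unfold(2, p, p).unfold(3, p, p)            # (B, C, H/p, W/p, p, p)
        x = x.permute(0, 2, 3, 1, 4, 5).reshape(B, -1, C * p * p)
        tokens = self.embed(x)                           # (B, N, dim)
        attended, _ = self.attn(tokens, tokens, tokens)  # global attention
        return self.norm(tokens + attended)

out = TinyViTBlock()(torch.randn(1, 3, 224, 224))        # (1, 196, 192)
```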
- Robust Landmark-based Stent Tracking in X-ray Fluoroscopy [10.917460255497227]
We propose an end-to-end deep learning framework for single stent tracking.
It consists of three hierarchical modules, including U-Net based landmark detection and ResNet based stent proposal and feature extraction.
Experiments show that our method performs significantly better in detection compared with the state-of-the-art point-based tracking models.
arXiv Detail & Related papers (2022-07-20T14:20:03Z)
- Pseudo-label Guided Cross-video Pixel Contrast for Robotic Surgical Scene Segmentation with Limited Annotations [72.15956198507281]
We propose PGV-CL, a novel pseudo-label guided cross-video contrast learning method to boost scene segmentation.
We extensively evaluate our method on the public robotic surgery dataset EndoVis18 and the public cataract dataset CaDIS (a generic pixel-contrast loss is sketched below).
arXiv Detail & Related papers (2022-07-20T05:42:19Z)
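For readers unfamiliar with pixel contrast, a minimal sketch of a generic InfoNCE-style pixel contrastive loss follows; this is a simplification, not PGV-CL's pseudo-label guided, cross-video formulation, and the embedding sizes are arbitrary:

```python
import torch
import torch.nn.functional as F

def pixel_infonce(anchor, positive, negatives, tau=0.1):
    """Generic InfoNCE over pixel embeddings.
    anchor, positive: (N, D); negatives: (N, K, D).
    Pulls each anchor pixel toward its positive and away from negatives."""
    a = F.normalize(anchor, dim=-1)
    p = F.normalize(positive, dim=-1)
    n = F.normalize(negatives, dim=-1)
    pos = (a * p).sum(-1, keepdim=True) / tau        # (N, 1) positive logits
    neg = torch.einsum('nd,nkd->nk', a, n) / tau     # (N, K) negative logits
    logits = torch.cat([pos, neg], dim=1)            # positive sits at index 0
    return F.cross_entropy(logits, torch.zeros(len(a), dtype=torch.long))

loss = pixel_infonce(torch.randn(64, 128), torch.randn(64, 128),
                     torch.randn(64, 16, 128))
```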
- LapSeg3D: Weakly Supervised Semantic Segmentation of Point Clouds Representing Laparoscopic Scenes [1.7941882788670036]
We propose LapSeg3D, a novel approach for the voxel-wise annotation of point clouds representing surgical scenes.
As the manual annotation of training data is highly time-consuming, we introduce a semi-autonomous clustering-based pipeline for the annotation of the gallbladder.
We show that LapSeg3D generalizes accurately across different gallbladders and datasets recorded with different RGB-D camera systems (a toy clustering step is sketched below).
arXiv Detail & Related papers (2022-07-15T11:57:14Z)
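As a hedged illustration of clustering-based point-cloud annotation (the paper's semi-autonomous pipeline is more involved; the use of DBSCAN, its parameters, and the stand-in data are assumptions):

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Stand-in point cloud: (N, 3) coordinates in meters
points = np.random.rand(5000, 3) * 0.1

# Density-based clustering groups spatially coherent structures; an
# annotator can then assign a semantic label to whole clusters at once
# instead of labeling individual points.
cluster_ids = DBSCAN(eps=0.005, min_samples=20).fit_predict(points)
labels = np.where(cluster_ids == 0, 1, 0)  # e.g. cluster 0 -> "gallbladder"
```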
- Comparison of Representation Learning Techniques for Tracking in time resolved 3D Ultrasound [0.7734726150561088]
3D ultrasound (3DUS) is becoming increasingly attractive for target tracking in radiation therapy because it can provide volumetric images in real time without ionizing radiation.
For this, a method for learning meaningful representations would be useful for recognizing anatomical structures across time frames in representation space (r-space).
In this study, 3DUS patches are reduced to a 128-dimensional r-space using a conventional autoencoder, a variational autoencoder, and a sliced-Wasserstein autoencoder (a minimal autoencoder variant is sketched below).
arXiv Detail & Related papers (2022-01-10T12:38:22Z)
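A minimal sketch of the conventional-autoencoder variant, compressing a 3D patch to a 128-dimensional latent; the patch size and layer widths are illustrative assumptions, not the study's architecture:

```python
import torch
import torch.nn as nn

class PatchAutoencoder(nn.Module):
    """Compress a 32x32x32 ultrasound patch into a 128-d r-space vector."""
    def __init__(self, latent=128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv3d(1, 16, 4, stride=2, padding=1), nn.ReLU(),   # 32 -> 16
            nn.Conv3d(16, 32, 4, stride=2, padding=1), nn.ReLU(),  # 16 -> 8
            nn.Flatten(),
            nn.Linear(32 * 8 * 8 * 8, latent),
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent, 32 * 8 * 8 * 8), nn.ReLU(),
            nn.Unflatten(1, (32, 8, 8, 8)),
            nn.ConvTranspose3d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose3d(16, 1, 4, stride=2, padding=1),
        )

    def forward(self, x):                # x: (B, 1, 32, 32, 32)
        z = self.encoder(x)              # (B, 128): the r-space representation
        return self.decoder(z), z

recon, z = PatchAutoencoder()(torch.randn(2, 1, 32, 32, 32))
```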
- Voice-assisted Image Labelling for Endoscopic Ultrasound Classification using Neural Networks [48.732863591145964]
We propose a multi-modal convolutional neural network architecture that labels endoscopic ultrasound (EUS) images from raw verbal comments provided by a clinician during the procedure.
Our results show a prediction accuracy of 76% at image level on a dataset with 5 different labels.
arXiv Detail & Related papers (2021-10-12T21:22:24Z)
- SOMA: Solving Optical Marker-Based MoCap Automatically [56.59083192247637]
We train a novel neural network called SOMA, which takes raw mocap point clouds with varying numbers of points and labels them at scale.
SOMA exploits an architecture with stacked self-attention elements to learn the spatial structure of the 3D body (attention over a variable-size point cloud is sketched below).
We automatically label over 8 hours of archival mocap data across 4 different datasets.
arXiv Detail & Related papers (2021-10-09T02:27:27Z)
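As a hedged illustration of stacked self-attention over mocap point clouds with varying point counts (this is not SOMA's actual architecture; the dimensions, label count, and padding scheme are assumptions):

```python
import torch
import torch.nn as nn

class PointLabeler(nn.Module):
    """Stacked self-attention over an unordered, variable-size marker set,
    with a per-point head mapping each marker to a label."""
    def __init__(self, dim=64, heads=4, n_labels=53):
        super().__init__()
        self.embed = nn.Linear(3, dim)                   # xyz -> token
        layer = nn.TransformerEncoderLayer(dim, heads, dim_feedforward=128,
                                           batch_first=True)
        self.attn = nn.TransformerEncoder(layer, num_layers=2)  # "stacked"
        self.head = nn.Linear(dim, n_labels)

    def forward(self, points, pad_mask=None):            # points: (B, N, 3)
        tokens = self.attn(self.embed(points),
                           src_key_padding_mask=pad_mask)  # attends over all points
        return self.head(tokens)                         # (B, N, n_labels)

# Clouds with different point counts are padded to a common length and
# the padded positions are masked out of the attention:
logits = PointLabeler()(torch.randn(2, 40, 3),
                        pad_mask=torch.zeros(2, 40, dtype=torch.bool))
```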
- Supervision by Registration and Triangulation for Landmark Detection [70.13440728689231]
We present Supervision by Registration and Triangulation (SRT), an unsupervised approach that utilizes unlabeled multi-view video to improve the accuracy and precision of landmark detectors.
Being able to utilize unlabeled data enables our detectors to learn from the massive amounts of such data freely available.
arXiv Detail & Related papers (2021-01-25T02:48:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.