Automatic Detection of Out-of-body Frames in Surgical Videos for Privacy
Protection Using Self-supervised Learning and Minimal Labels
- URL: http://arxiv.org/abs/2303.18106v1
- Date: Fri, 31 Mar 2023 14:53:56 GMT
- Title: Automatic Detection of Out-of-body Frames in Surgical Videos for Privacy
Protection Using Self-supervised Learning and Minimal Labels
- Authors: Ziheng Wang, Conor Perreault, Xi Liu, Anthony Jarc
- Abstract summary: We propose a framework that accurately detects out-of-body frames in surgical videos.
We use a large number of unlabeled endoscopic images to learn meaningful representations in a self-supervised manner.
- Score: 4.356941104145803
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Endoscopic video recordings are widely used in minimally invasive
robot-assisted surgery, but when the endoscope is outside the patient's body,
it can capture irrelevant segments that may contain sensitive information. To
address this, we propose a framework that accurately detects out-of-body frames
in surgical videos by leveraging self-supervision with minimal data labels. We
use a large number of unlabeled endoscopic images to learn meaningful
representations in a self-supervised manner. Our approach, which involves
pre-training on an auxiliary task and fine-tuning with limited supervision,
outperforms previous methods for detecting out-of-body frames in surgical
videos captured from da Vinci X and Xi surgical systems. The average F1 scores
range from 96.00 to 98.02. Remarkably, even with only 5% of the training
labels, our approach maintains an average F1 score above 97, outperforming
fully supervised methods while using 95% fewer labels. These results
demonstrate the potential of our framework to facilitate the safe handling of
surgical video recordings and enhance data privacy protection in minimally
invasive surgery.
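The listing does not include an implementation, but the recipe described in the abstract, self-supervised pre-training on unlabeled endoscopic frames followed by fine-tuning with few labels, can be sketched. Below is a minimal, hypothetical sketch of the pre-training stage. The SimCLR-style NT-Xent contrastive objective, the ResNet-18 backbone, and the augmentations are all assumptions for illustration; the paper's actual auxiliary task is not specified here.
```python
# Hypothetical sketch of stage 1: self-supervised pre-training on unlabeled
# endoscopic frames. SimCLR-style contrastive learning is an assumption; the
# paper's actual auxiliary task is not described in this listing.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models, transforms

# Two independently augmented "views" of the same frame form a positive pair.
augment = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.5, 1.0), antialias=True),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(0.4, 0.4, 0.4, 0.1),
])

encoder = models.resnet18(weights=None)
encoder.fc = nn.Identity()             # expose the 512-d feature vector
projector = nn.Sequential(             # projection head, used only in pre-training
    nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 128),
)

def nt_xent(z1, z2, tau=0.5):
    """NT-Xent loss: matched views are positives, all other frames negatives."""
    z = F.normalize(torch.cat([z1, z2]), dim=1)        # (2N, d)
    sim = z @ z.t() / tau                              # pairwise cosine similarity
    sim.fill_diagonal_(float('-inf'))                  # exclude self-pairs
    n = z1.size(0)
    targets = torch.cat([torch.arange(n) + n, torch.arange(n)])
    return F.cross_entropy(sim, targets)

def pretrain_step(frames, optimizer):
    """One step on a batch of unlabeled frames, shape (N, 3, H, W) in [0, 1]."""
    v1 = torch.stack([augment(f) for f in frames])     # per-frame random views
    v2 = torch.stack([augment(f) for f in frames])
    loss = nt_xent(projector(encoder(v1)), projector(encoder(v2)))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

optimizer = torch.optim.Adam(
    list(encoder.parameters()) + list(projector.parameters()), lr=3e-4)
```
After pre-training, the projection head is discarded and only the encoder is kept for the supervised stage; a matching fine-tuning sketch appears after the related-papers list below.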
Related papers
- AMNCutter: Affinity-Attention-Guided Multi-View Normalized Cutter for Unsupervised Surgical Instrument Segmentation [7.594796294925481]
We propose a label-free unsupervised model featuring a novel module named Multi-View Normalized Cutter (m-NCutter).
Our model is trained using a graph-cutting loss function that leverages patch affinities for supervision, eliminating the need for pseudo-labels.
We conduct comprehensive experiments across multiple SIS datasets to validate our approach's state-of-the-art (SOTA) performance, robustness, and exceptional potential as a pre-trained model.
arXiv Detail & Related papers (2024-11-06T06:33:55Z)
- CathFlow: Self-Supervised Segmentation of Catheters in Interventional Ultrasound Using Optical Flow and Transformers [66.15847237150909]
We introduce a self-supervised deep learning architecture to segment catheters in longitudinal ultrasound images.
The network architecture builds upon AiAReSeg, a segmentation transformer built with the Attention in Attention mechanism.
We validated our model on a test dataset consisting of unseen synthetic data and images collected from silicone aorta phantoms.
arXiv Detail & Related papers (2024-03-21T15:13:36Z)
- Weakly-Supervised Surgical Phase Recognition [19.27227976291303]
In this work we join concepts of graph segmentation with self-supervised learning to derive a random-walk solution for per-frame phase prediction.
We validate our method by running experiments with the public Cholec80 dataset of laparoscopic cholecystectomy videos.
arXiv Detail & Related papers (2023-10-26T07:54:47Z)
- Video object detection for privacy-preserving patient monitoring in intensive care [0.0]
We propose a new method for exploiting information in the temporal succession of video frames.
Our method outperforms a standard YOLOv5 baseline model by +1.7% mAP@.5 while also training over ten times faster on our proprietary dataset.
arXiv Detail & Related papers (2023-06-26T11:52:22Z)
- Surgical tool classification and localization: results and methods from the MICCAI 2022 SurgToolLoc challenge [69.91670788430162]
We present the results of the SurgToolLoc 2022 challenge.
The goal was to leverage tool presence data as weak labels for machine learning models trained to detect tools.
We conclude by discussing these results in the broader context of machine learning and surgical data science.
arXiv Detail & Related papers (2023-05-11T21:44:39Z)
- Self-supervised contrastive learning of echocardiogram videos enables label-efficient cardiac disease diagnosis [48.64462717254158]
We developed EchoCLR, a self-supervised contrastive learning approach tailored to echocardiogram videos.
When fine-tuned on small portions of labeled data, EchoCLR pretraining significantly improved classification performance for left ventricular hypertrophy (LVH) and aortic stenosis (AS); a minimal sketch of this label-efficient fine-tuning pattern appears after this list.
EchoCLR is unique in its ability to learn representations of medical videos and demonstrates that SSL can enable label-efficient disease classification from small labeled datasets.
arXiv Detail & Related papers (2022-07-23T19:17:26Z)
- Pseudo-label Guided Cross-video Pixel Contrast for Robotic Surgical Scene Segmentation with Limited Annotations [72.15956198507281]
We propose PGV-CL, a novel pseudo-label guided cross-video contrastive learning method to boost scene segmentation.
We extensively evaluate our method on a public robotic surgery dataset EndoVis18 and a public cataract dataset CaDIS.
arXiv Detail & Related papers (2022-07-20T05:42:19Z)
- Self-Supervised Learning from Unlabeled Fundus Photographs Improves Segmentation of the Retina [4.815051667870375]
Fundus photography is the primary method for retinal imaging and essential for diabetic retinopathy prevention.
Current segmentation methods are not robust to the diversity of imaging conditions and pathologies typical of real-world clinical applications.
We utilize contrastive self-supervised learning to exploit the large variety of unlabeled fundus images in the publicly available EyePACS dataset.
arXiv Detail & Related papers (2021-08-05T18:02:56Z)
- Relational Graph Learning on Visual and Kinematics Embeddings for Accurate Gesture Recognition in Robotic Surgery [84.73764603474413]
We propose MRG-Net, a novel online multi-modal graph network that dynamically integrates visual and kinematics information.
The effectiveness of our method is demonstrated with state-of-the-art results on the public JIGSAWS dataset.
arXiv Detail & Related papers (2020-11-03T11:00:10Z)
- LRTD: Long-Range Temporal Dependency based Active Learning for Surgical Workflow Recognition [67.86810761677403]
We propose a novel active learning method for cost-effective surgical video analysis.
Specifically, we propose a non-local recurrent convolutional network (NL-RCNet), which introduces a non-local block to capture long-range temporal dependencies.
We validate our approach on a large surgical video dataset (Cholec80) on the surgical workflow recognition task.
arXiv Detail & Related papers (2020-04-21T09:21:22Z)
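Several entries above (EchoCLR, the fundus SSL paper) and the main abstract share the same evaluation pattern: fine-tune a pretrained encoder on a small fraction of the labels, then report F1. The sketch below illustrates that pattern under stated assumptions: it reuses the 512-d `encoder` from the pre-training sketch earlier, and the dataset, 5% label fraction, and hyperparameters are placeholders, not the setup of any paper listed here.
```python
# Hedged sketch of label-efficient fine-tuning and F1 evaluation.
# Assumes a pretrained `encoder` producing 512-d features and a generic
# labeled frame dataset yielding (frame, label) pairs; both are placeholders.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, Subset

def finetune_small_fraction(encoder, dataset, fraction=0.05, epochs=5):
    """Fine-tune `encoder` plus a binary head on `fraction` of the labels."""
    n = int(len(dataset) * fraction)
    idx = torch.randperm(len(dataset))[:n]         # random small labeled subset
    loader = DataLoader(Subset(dataset, idx.tolist()), batch_size=32, shuffle=True)

    head = nn.Linear(512, 2)                       # in-body vs. out-of-body
    model = nn.Sequential(encoder, head)
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    loss_fn = nn.CrossEntropyLoss()

    model.train()
    for _ in range(epochs):
        for frames, labels in loader:
            loss = loss_fn(model(frames), labels)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model

@torch.no_grad()
def f1_score(model, loader):
    """Binary F1 for the positive class (out-of-body = 1)."""
    model.eval()
    tp = fp = fn = 0
    for frames, labels in loader:
        preds = model(frames).argmax(dim=1)
        tp += ((preds == 1) & (labels == 1)).sum().item()
        fp += ((preds == 1) & (labels == 0)).sum().item()
        fn += ((preds == 0) & (labels == 1)).sum().item()
    return 2 * tp / max(2 * tp + fp + fn, 1)       # guard against empty sets
```
The point of the pattern is that the encoder already carries most of the representation learning, so the supervised stage only has to fit a small linear head, which is why a few percent of the labels can suffice.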
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.