Jumpstarting Surgical Computer Vision
- URL: http://arxiv.org/abs/2312.05968v1
- Date: Sun, 10 Dec 2023 18:54:16 GMT
- Title: Jumpstarting Surgical Computer Vision
- Authors: Deepak Alapatt, Aditya Murali, Vinkle Srivastav, Pietro Mascagni,
AI4SafeChole Consortium, Nicolas Padoy
- Abstract summary: We employ self-supervised learning to flexibly leverage diverse surgical datasets.
We study phase recognition and critical view of safety in laparoscopic cholecystectomy and laparoscopic hysterectomy.
The composition of pre-training datasets can severely affect the effectiveness of SSL methods for various downstream tasks.
- Score: 2.7396997668655163
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Purpose: General consensus amongst researchers and industry points to a lack
of large, representative annotated datasets as the biggest obstacle to progress
in the field of surgical data science. Self-supervised learning represents a
solution to part of this problem, removing the reliance on annotations.
However, the robustness of current self-supervised learning methods to domain
shifts remains unclear, limiting our understanding of its utility for
leveraging diverse sources of surgical data. Methods: In this work, we employ
self-supervised learning to flexibly leverage diverse surgical datasets,
thereby learning task-agnostic representations that can be used for various
surgical downstream tasks. Based on this approach, to elucidate the impact of
pre-training on downstream task performance, we explore 22 different
pre-training dataset combinations by modulating three variables: source
hospital, type of surgical procedure, and pre-training scale (number of
videos). We then finetune the resulting model initializations on three diverse
downstream tasks: namely, phase recognition and critical view of safety in
laparoscopic cholecystectomy and phase recognition in laparoscopic
hysterectomy. Results: Controlled experimentation highlights sizable boosts in
performance across various tasks, datasets, and labeling budgets. However, this
performance is intricately linked to the composition of the pre-training
dataset, a dependency demonstrated robustly across several stages of the study. Conclusion: The
composition of pre-training datasets can severely affect the effectiveness of
SSL methods for various downstream tasks and should critically inform future
data collection efforts to scale the application of SSL methodologies.
Keywords: Self-Supervised Learning, Transfer Learning, Surgical Computer
Vision, Endoscopic Videos, Critical View of Safety, Phase Recognition
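The summary above does not specify which self-supervised objective the authors use for pre-training. Purely as illustration of the family of methods involved, here is a minimal NumPy sketch of the NT-Xent contrastive loss used by SimCLR-style SSL, where two augmented views of the same video frame form a positive pair; the function name and toy dimensions are hypothetical, not taken from the paper.

```python
import numpy as np

def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent (normalized temperature-scaled cross-entropy) loss for
    SimCLR-style contrastive pre-training. z1, z2: (N, D) embeddings of
    two augmented views of the same N samples."""
    # L2-normalize so similarities are cosine similarities
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    z = np.concatenate([z1, z2], axis=0)            # (2N, D)
    sim = z @ z.T / temperature                     # pairwise similarities
    np.fill_diagonal(sim, -np.inf)                  # exclude self-pairs
    n = z1.shape[0]
    # the positive for row i is the other view of the same sample
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(2 * n), pos].mean()
```

The loss is lower when the two views of each sample embed close together than when they are unrelated, which is the signal that lets pre-training proceed without annotations.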
Related papers
- SemiVT-Surge: Semi-Supervised Video Transformer for Surgical Phase Recognition [2.764986157003598]
We propose a video transformer-based model with a robust pseudo-labeling framework.
By incorporating unlabeled data, we achieve state-of-the-art performance on RAMIE with a 4.9% accuracy increase.
Our findings establish a strong benchmark for semi-supervised surgical phase recognition.
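The summary does not detail this pseudo-labeling framework. As a generic illustration of the idea (not the paper's method), a confidence-thresholded pseudo-labeling step keeps only unlabeled samples the current model is sure about; the function name and threshold are hypothetical.

```python
import numpy as np

def pseudo_label(probs, threshold=0.9):
    """Confidence-thresholded pseudo-labeling: retain only unlabeled
    samples whose top class probability exceeds `threshold`.
    probs: (N, C) softmax outputs of the current model.
    Returns (kept indices, pseudo-labels for those samples)."""
    conf = probs.max(axis=1)
    keep = np.flatnonzero(conf >= threshold)
    return keep, probs[keep].argmax(axis=1)
```

The retained (sample, pseudo-label) pairs are then added to the training set for the next round, which is how unlabeled videos contribute to supervision.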
arXiv Detail & Related papers (2025-06-02T09:32:12Z)
- Federated EndoViT: Pretraining Vision Transformers via Federated Learning on Endoscopic Image Collections [35.585690280385826]
We adapt the Masked Autoencoder for federated learning, enhancing Sharpness-Aware Minimization (FedSAM) and Weight Averaging.
Our findings demonstrate that integrating FedSAM into the federated MAE approach improves pretraining, leading to a reduction in reconstruction loss per patch.
These findings highlight the potential of federated learning for privacy-preserving training of surgical foundation models.
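The weight averaging mentioned above is not spelled out in this summary. For orientation only, the core aggregation step of federated averaging (FedAvg) combines per-client parameters weighted by local dataset size; this sketch is a generic illustration, not the paper's implementation.

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """Federated averaging: merge per-client parameter dicts into a
    global model, weighting each client by its local dataset size."""
    total = sum(client_sizes)
    keys = client_weights[0].keys()
    return {k: sum(w[k] * (n / total)
                   for w, n in zip(client_weights, client_sizes))
            for k in keys}
```

Because only parameters (never images) leave each hospital, this is the step that makes the training privacy-preserving.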
arXiv Detail & Related papers (2025-04-23T10:54:32Z)
- Boosting Few-Shot Learning with Disentangled Self-Supervised Learning and Meta-Learning for Medical Image Classification [8.975676404678374]
We present a strategy for improving the performance and generalization capabilities of models trained in low-data regimes.
The proposed method starts with a pre-training phase, where features learned in a self-supervised learning setting are disentangled to improve the robustness of the representations for downstream tasks.
We then introduce a meta-fine-tuning step that leverages related classes between the meta-training and meta-testing phases while varying the level.
arXiv Detail & Related papers (2024-03-26T09:36:20Z)
- LLM-Assisted Multi-Teacher Continual Learning for Visual Question Answering in Robotic Surgery [57.358568111574314]
Patient data privacy often restricts the availability of old data when updating the model.
Prior CL studies overlooked two vital problems in the surgical domain.
This paper proposes addressing these problems with a multimodal large language model (LLM) and an adaptive weight assignment methodology.
arXiv Detail & Related papers (2024-02-26T15:35:24Z)
- ProtoKD: Learning from Extremely Scarce Data for Parasite Ova Recognition [5.224806515926022]
We introduce ProtoKD, one of the first approaches to tackle the problem of multi-class parasitic ova recognition using extremely scarce data.
We establish a new benchmark to drive research in this critical direction and validate that the proposed ProtoKD framework achieves state-of-the-art performance.
arXiv Detail & Related papers (2023-09-18T23:49:04Z)
- Robust Surgical Tools Detection in Endoscopic Videos with Noisy Data [2.566694420723775]
We propose a systematic methodology for developing robust models for surgical tool detection using noisy data.
Our methodology introduces two key innovations: (1) an intelligent active learning strategy for minimal dataset identification and label correction by human experts; and (2) an assembling strategy for a student-teacher model-based self-training framework.
The proposed methodology achieves an average F1-score of 85.88% for the ensemble model-based self-training with class weights, and 80.88% without class weights for noisy labels.
arXiv Detail & Related papers (2023-07-03T08:12:56Z)
- Learnable Weight Initialization for Volumetric Medical Image Segmentation [66.3030435676252]
We propose a learnable weight-based hybrid medical image segmentation approach.
Our approach is easy to integrate into any hybrid model and requires no external training data.
Experiments on multi-organ and lung cancer segmentation tasks demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2023-06-15T17:55:05Z)
- Cluster-level pseudo-labelling for source-free cross-domain facial expression recognition [94.56304526014875]
We propose the first Source-Free Unsupervised Domain Adaptation (SFUDA) method for Facial Expression Recognition (FER).
Our method exploits self-supervised pretraining to learn good feature representations from the target data.
We validate the effectiveness of our method in four adaptation setups, proving that it consistently outperforms existing SFUDA methods when applied to FER.
arXiv Detail & Related papers (2022-10-11T08:24:50Z)
- Unsupervised pre-training of graph transformers on patient population graphs [48.02011627390706]
We propose a graph-transformer-based network to handle heterogeneous clinical data.
We show the benefit of our pre-training method in a self-supervised and a transfer learning setting.
arXiv Detail & Related papers (2022-07-21T16:59:09Z)
- Dissecting Self-Supervised Learning Methods for Surgical Computer Vision [51.370873913181605]
Self-Supervised Learning (SSL) methods have begun to gain traction in the general computer vision community.
The effectiveness of SSL methods in more complex and impactful domains, such as medicine and surgery, remains limited and unexplored.
We present an extensive analysis of the performance of these methods on the Cholec80 dataset for two fundamental and popular tasks in surgical context understanding, phase recognition and tool presence detection.
arXiv Detail & Related papers (2022-07-01T14:17:11Z)
- Rethinking Surgical Instrument Segmentation: A Background Image Can Be All You Need [18.830738606514736]
Data scarcity and imbalance have heavily affected the model accuracy and limited the design and deployment of deep learning-based surgical applications.
We propose a one-to-many data generation solution that gets rid of the complicated and expensive process of data collection and annotation from robotic surgery.
Our empirical analysis suggests that without the high cost of data collection and annotation, we can achieve decent surgical instrument segmentation performance.
arXiv Detail & Related papers (2022-06-23T16:22:56Z)
- LifeLonger: A Benchmark for Continual Disease Classification [59.13735398630546]
We introduce LifeLonger, a benchmark for continual disease classification on the MedMNIST collection.
Task and class incremental learning of diseases address the issue of classifying new samples without re-training the models from scratch.
Cross-domain incremental learning addresses the issue of dealing with datasets originating from different institutions while retaining the previously obtained knowledge.
arXiv Detail & Related papers (2022-04-12T12:25:05Z)
- CholecTriplet2021: A benchmark challenge for surgical action triplet recognition [66.51610049869393]
This paper presents CholecTriplet 2021: an endoscopic vision challenge organized at MICCAI 2021 for the recognition of surgical action triplets in laparoscopic videos.
We present the challenge setup and assessment of the state-of-the-art deep learning methods proposed by the participants during the challenge.
A total of 4 baseline methods and 19 new deep learning algorithms are presented to recognize surgical action triplets directly from surgical videos, achieving mean average precision (mAP) ranging from 4.2% to 38.1%.
arXiv Detail & Related papers (2022-04-10T18:51:55Z)
- Federated Cycling (FedCy): Semi-supervised Federated Learning of Surgical Phases [57.90226879210227]
FedCy is a federated semi-supervised learning (FSSL) method that combines FL and self-supervised learning to exploit a decentralized dataset of both labeled and unlabeled videos.
We demonstrate significant performance gains over state-of-the-art FSSL methods on the task of automatic recognition of surgical phases.
arXiv Detail & Related papers (2022-03-14T17:44:53Z)
- Simulation-to-Real domain adaptation with teacher-student learning for endoscopic instrument segmentation [1.1047993346634768]
We introduce a teacher-student learning approach that learns jointly from annotated simulation data and unlabeled real data.
Empirical results on three datasets highlight the effectiveness of the proposed framework.
arXiv Detail & Related papers (2021-03-02T09:30:28Z)
- Omni-supervised Facial Expression Recognition via Distilled Data [120.11782405714234]
We propose omni-supervised learning to exploit reliable samples in a large amount of unlabeled data for network training.
To keep the resulting dataset tractable, we apply a dataset distillation strategy that compresses it into several informative class-wise images.
We experimentally verify that the new dataset can significantly improve the ability of the learned FER model.
arXiv Detail & Related papers (2020-05-18T09:36:51Z)
- Robust Medical Instrument Segmentation Challenge 2019 [56.148440125599905]
Intraoperative tracking of laparoscopic instruments is often a prerequisite for computer and robotic-assisted interventions.
Our challenge was based on a surgical data set comprising 10,040 annotated images acquired from a total of 30 surgical procedures.
The results confirm the initial hypothesis, namely that algorithm performance degrades with an increasing domain gap.
arXiv Detail & Related papers (2020-03-23T14:35:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences arising from its use.