Jumpstarting Surgical Computer Vision
- URL: http://arxiv.org/abs/2312.05968v1
- Date: Sun, 10 Dec 2023 18:54:16 GMT
- Title: Jumpstarting Surgical Computer Vision
- Authors: Deepak Alapatt, Aditya Murali, Vinkle Srivastav, Pietro Mascagni,
AI4SafeChole Consortium, Nicolas Padoy
- Abstract summary: We employ self-supervised learning to flexibly leverage diverse surgical datasets.
We study phase recognition and critical view of safety in laparoscopic cholecystectomy and laparoscopic hysterectomy.
The composition of pre-training datasets can severely affect the effectiveness of SSL methods for various downstream tasks.
- Score: 2.7396997668655163
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Purpose: General consensus amongst researchers and industry points to a lack
of large, representative annotated datasets as the biggest obstacle to progress
in the field of surgical data science. Self-supervised learning represents a
solution to part of this problem, removing the reliance on annotations.
However, the robustness of current self-supervised learning methods to domain
shifts remains unclear, limiting our understanding of its utility for
leveraging diverse sources of surgical data. Methods: In this work, we employ
self-supervised learning to flexibly leverage diverse surgical datasets,
thereby learning task-agnostic representations that can be used for various
surgical downstream tasks. Based on this approach, to elucidate the impact of
pre-training on downstream task performance, we explore 22 different
pre-training dataset combinations by modulating three variables: source
hospital, type of surgical procedure, and pre-training scale (number of
videos). We then finetune the resulting model initializations on three diverse
downstream tasks: namely, phase recognition and critical view of safety in
laparoscopic cholecystectomy and phase recognition in laparoscopic
hysterectomy. Results: Controlled experimentation highlights sizable boosts in
performance across various tasks, datasets, and labeling budgets. However, this
performance is intricately linked to the composition of the pre-training
dataset, a dependency demonstrated robustly across several stages of the study. Conclusion: The
composition of pre-training datasets can severely affect the effectiveness of
SSL methods for various downstream tasks and should critically inform future
data collection efforts to scale the application of SSL methodologies.
Keywords: Self-Supervised Learning, Transfer Learning, Surgical Computer
Vision, Endoscopic Videos, Critical View of Safety, Phase Recognition
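The summary above does not specify which self-supervised objective the authors use for pre-training. Purely as illustration of the family of methods involved, here is a minimal NumPy sketch of the NT-Xent contrastive loss used by SimCLR-style SSL, where two augmented views of the same video frame form a positive pair; the function name and toy dimensions are hypothetical, not taken from the paper.

```python
import numpy as np

def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent (normalized temperature-scaled cross-entropy) loss for
    SimCLR-style contrastive pre-training. z1, z2: (N, D) embeddings of
    two augmented views of the same N samples."""
    # L2-normalize so similarities are cosine similarities
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    z = np.concatenate([z1, z2], axis=0)            # (2N, D)
    sim = z @ z.T / temperature                     # pairwise similarities
    np.fill_diagonal(sim, -np.inf)                  # exclude self-pairs
    n = z1.shape[0]
    # the positive for row i is the other view of the same sample
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(2 * n), pos].mean()
```

The loss is lower when the two views of each sample embed close together than when they are unrelated, which is the signal that lets pre-training proceed without annotations.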
Related papers
- SemiVT-Surge: Semi-Supervised Video Transformer for Surgical Phase Recognition [2.764986157003598]
We propose a video transformer-based model with a robust pseudo-labeling framework.
By incorporating unlabeled data, we achieve state-of-the-art performance on RAMIE with a 4.9% accuracy increase.
Our findings establish a strong benchmark for semi-supervised surgical phase recognition.
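The summary does not detail this pseudo-labeling framework. As a generic illustration of the idea (not the paper's method), a confidence-thresholded pseudo-labeling step keeps only unlabeled samples the current model is sure about; the function name and threshold are hypothetical.

```python
import numpy as np

def pseudo_label(probs, threshold=0.9):
    """Confidence-thresholded pseudo-labeling: retain only unlabeled
    samples whose top class probability exceeds `threshold`.
    probs: (N, C) softmax outputs of the current model.
    Returns (kept indices, pseudo-labels for those samples)."""
    conf = probs.max(axis=1)
    keep = np.flatnonzero(conf >= threshold)
    return keep, probs[keep].argmax(axis=1)
```

The retained (sample, pseudo-label) pairs are then added to the training set for the next round, which is how unlabeled videos contribute to supervision.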
arXiv Detail & Related papers (2025-06-02T09:32:12Z)
- Federated EndoViT: Pretraining Vision Transformers via Federated Learning on Endoscopic Image Collections [35.585690280385826]
We adapt the Masked Autoencoder for federated learning, enhancing Sharpness-Aware Minimization (FedSAM) and Weight Averaging.
Our findings demonstrate that integrating FedSAM into the federated MAE approach improves pretraining, leading to a reduction in reconstruction loss per patch.
These findings highlight the potential of federated learning for privacy-preserving training of surgical foundation models.
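The weight averaging mentioned above is not spelled out in this summary. For orientation only, the core aggregation step of federated averaging (FedAvg) combines per-client parameters weighted by local dataset size; this sketch is a generic illustration, not the paper's implementation.

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """Federated averaging: merge per-client parameter dicts into a
    global model, weighting each client by its local dataset size."""
    total = sum(client_sizes)
    keys = client_weights[0].keys()
    return {k: sum(w[k] * (n / total)
                   for w, n in zip(client_weights, client_sizes))
            for k in keys}
```

Because only parameters (never images) leave each hospital, this is the step that makes the training privacy-preserving.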
arXiv Detail & Related papers (2025-04-23T10:54:32Z)
- Boosting Few-Shot Learning with Disentangled Self-Supervised Learning and Meta-Learning for Medical Image Classification [8.975676404678374]
We present a strategy for improving the performance and generalization capabilities of models trained in low-data regimes.
The proposed method starts with a pre-training phase, where features learned in a self-supervised learning setting are disentangled to improve the robustness of the representations for downstream tasks.
We then introduce a meta-fine-tuning step that leverages related classes between the meta-training and meta-testing phases while varying the level.
arXiv Detail & Related papers (2024-03-26T09:36:20Z)
- LLM-Assisted Multi-Teacher Continual Learning for Visual Question Answering in Robotic Surgery [57.358568111574314]
Patient data privacy often restricts the availability of old data when updating the model.
Prior CL studies overlooked two vital problems in the surgical domain.
This paper proposes addressing these problems with a multimodal large language model (LLM) and an adaptive weight assignment methodology.
arXiv Detail & Related papers (2024-02-26T15:35:24Z)
- ProtoKD: Learning from Extremely Scarce Data for Parasite Ova Recognition [5.224806515926022]
We introduce ProtoKD, one of the first approaches to tackle the problem of multi-class parasitic ova recognition using extremely scarce data.
We establish a new benchmark to drive research in this critical direction and validate that the proposed ProtoKD framework achieves state-of-the-art performance.
arXiv Detail & Related papers (2023-09-18T23:49:04Z)
- Robust Surgical Tools Detection in Endoscopic Videos with Noisy Data [2.566694420723775]
We propose a systematic methodology for developing robust models for surgical tool detection using noisy data.
Our methodology introduces two key innovations: (1) an intelligent active learning strategy for minimal dataset identification and label correction by human experts; and (2) an assembling strategy for a student-teacher model-based self-training framework.
The proposed methodology achieves an average F1-score of 85.88% for the ensemble model-based self-training with class weights, and 80.88% without class weights for noisy labels.
arXiv Detail & Related papers (2023-07-03T08:12:56Z)
- Learnable Weight Initialization for Volumetric Medical Image Segmentation [66.3030435676252]
We propose a learnable weight-based hybrid medical image segmentation approach.
Our approach is easy to integrate into any hybrid model and requires no external training data.
Experiments on multi-organ and lung cancer segmentation tasks demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2023-06-15T17:55:05Z)
- Cluster-level pseudo-labelling for source-free cross-domain facial expression recognition [94.56304526014875]
We propose the first Source-Free Unsupervised Domain Adaptation (SFUDA) method for Facial Expression Recognition (FER).
Our method exploits self-supervised pretraining to learn good feature representations from the target data.
We validate the effectiveness of our method in four adaptation setups, proving that it consistently outperforms existing SFUDA methods when applied to FER.
arXiv Detail & Related papers (2022-10-11T08:24:50Z)
- Unsupervised pre-training of graph transformers on patient population graphs [48.02011627390706]
We propose a graph-transformer-based network to handle heterogeneous clinical data.
We show the benefit of our pre-training method in a self-supervised and a transfer learning setting.
arXiv Detail & Related papers (2022-07-21T16:59:09Z)
- Dissecting Self-Supervised Learning Methods for Surgical Computer Vision [51.370873913181605]
Self-Supervised Learning (SSL) methods have begun to gain traction in the general computer vision community.
The effectiveness of SSL methods in more complex and impactful domains, such as medicine and surgery, remains limited and unexplored.
We present an extensive analysis of the performance of these methods on the Cholec80 dataset for two fundamental and popular tasks in surgical context understanding, phase recognition and tool presence detection.
arXiv Detail & Related papers (2022-07-01T14:17:11Z)
- Rethinking Surgical Instrument Segmentation: A Background Image Can Be All You Need [18.830738606514736]
Data scarcity and imbalance have heavily affected the model accuracy and limited the design and deployment of deep learning-based surgical applications.
We propose a one-to-many data generation solution that gets rid of the complicated and expensive process of data collection and annotation from robotic surgery.
Our empirical analysis suggests that without the high cost of data collection and annotation, we can achieve decent surgical instrument segmentation performance.
arXiv Detail & Related papers (2022-06-23T16:22:56Z)
- LifeLonger: A Benchmark for Continual Disease Classification [59.13735398630546]
We introduce LifeLonger, a benchmark for continual disease classification on the MedMNIST collection.
Task and class incremental learning of diseases address the issue of classifying new samples without re-training the models from scratch.
Cross-domain incremental learning addresses the issue of dealing with datasets originating from different institutions while retaining the previously obtained knowledge.
arXiv Detail & Related papers (2022-04-12T12:25:05Z)
- CholecTriplet2021: A benchmark challenge for surgical action triplet recognition [66.51610049869393]
This paper presents CholecTriplet 2021: an endoscopic vision challenge organized at MICCAI 2021 for the recognition of surgical action triplets in laparoscopic videos.
We present the challenge setup and assessment of the state-of-the-art deep learning methods proposed by the participants during the challenge.
A total of 4 baseline methods and 19 new deep learning algorithms are presented to recognize surgical action triplets directly from surgical videos, achieving mean average precision (mAP) ranging from 4.2% to 38.1%.
arXiv Detail & Related papers (2022-04-10T18:51:55Z)
- Federated Cycling (FedCy): Semi-supervised Federated Learning of Surgical Phases [57.90226879210227]
FedCy is a federated semi-supervised learning (FSSL) method that combines FL and self-supervised learning to exploit a decentralized dataset of both labeled and unlabeled videos.
We demonstrate significant performance gains over state-of-the-art FSSL methods on the task of automatic recognition of surgical phases.
arXiv Detail & Related papers (2022-03-14T17:44:53Z)
- Simulation-to-Real domain adaptation with teacher-student learning for endoscopic instrument segmentation [1.1047993346634768]
We introduce a teacher-student learning approach that learns jointly from annotated simulation data and unlabeled real data.
Empirical results on three datasets highlight the effectiveness of the proposed framework.
arXiv Detail & Related papers (2021-03-02T09:30:28Z)
- Omni-supervised Facial Expression Recognition via Distilled Data [120.11782405714234]
We propose omni-supervised learning to exploit reliable samples in a large amount of unlabeled data for network training.
To keep the resulting dataset tractable, we apply a dataset distillation strategy that compresses it into several informative class-wise images.
We experimentally verify that the new dataset can significantly improve the ability of the learned FER model.
arXiv Detail & Related papers (2020-05-18T09:36:51Z)
- Robust Medical Instrument Segmentation Challenge 2019 [56.148440125599905]
Intraoperative tracking of laparoscopic instruments is often a prerequisite for computer and robotic-assisted interventions.
Our challenge was based on a surgical data set comprising 10,040 annotated images acquired from a total of 30 surgical procedures.
The results confirm the initial hypothesis, namely that algorithm performance degrades with an increasing domain gap.
arXiv Detail & Related papers (2020-03-23T14:35:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences arising from its use.