Exploring Pre-training Across Domains for Few-Shot Surgical Skill Assessment
- URL: http://arxiv.org/abs/2509.09327v1
- Date: Thu, 11 Sep 2025 10:23:19 GMT
- Title: Exploring Pre-training Across Domains for Few-Shot Surgical Skill Assessment
- Authors: Dimitrios Anastasiou, Razvan Caramalau, Nazir Sirajudeen, Matthew Boal, Philip Edwards, Justin Collins, John Kelly, Ashwin Sridhar, Maxine Tran, Faiz Mumtaz, Nevil Pavithran, Nader Francis, Danail Stoyanov, Evangelos B. Mazomenos,
- Abstract summary: Automated surgical skill assessment (SSA) is a central task in surgical computer vision.<n>Few-shot learning offers a scalable alternative enabling model development with minimal supervision.<n>While widely studied for several surgical downstream tasks, pre-training has remained largely unexplored in SSA.
- Score: 6.796181867667279
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automated surgical skill assessment (SSA) is a central task in surgical computer vision. Developing robust SSA models is challenging due to the scarcity of skill annotations, which are time-consuming to produce and require expert consensus. Few-shot learning (FSL) offers a scalable alternative enabling model development with minimal supervision, though its success critically depends on effective pre-training. While widely studied for several surgical downstream tasks, pre-training has remained largely unexplored in SSA. In this work, we formulate SSA as a few-shot task and investigate how self-supervised pre-training strategies affect downstream few-shot SSA performance. We annotate a publicly available robotic surgery dataset with Objective Structured Assessment of Technical Skill (OSATS) scores, and evaluate various pre-training sources across three few-shot settings. We quantify domain similarity and analyze how domain gap and the inclusion of procedure-specific data into pre-training influence transferability. Our results show that small but domain-relevant datasets can outperform large scale, less aligned ones, achieving accuracies of 60.16%, 66.03%, and 73.65% in the 1-, 2-, and 5-shot settings, respectively. Moreover, incorporating procedure-specific data into pre-training with a domain-relevant external dataset significantly boosts downstream performance, with an average gain of +1.22% in accuracy and +2.28% in F1-score; however, applying the same strategy with less similar but large-scale sources can instead lead to performance degradation. Code and models are available at https://github.com/anastadimi/ssa-fsl.
Related papers
- Beyond Scaling: Measuring and Predicting the Upper Bound of Knowledge Retention in Language Model Pre-Training [51.41246396610475]
This paper aims to predict performance in closed-book question answering (QA) without the help of external tools.<n>We conduct large-scale retrieval and semantic analysis across the pre-training corpora of 21 publicly available and 3 custom-trained large language models.<n>Building on these foundations, we propose Size-dependent Mutual Information (SMI), an information-theoretic metric that linearly correlates pre-training data characteristics.
arXiv Detail & Related papers (2025-02-06T13:23:53Z) - Handling Geometric Domain Shifts in Semantic Segmentation of Surgical RGB and Hyperspectral Images [67.66644395272075]
We present first analysis of state-of-the-art semantic segmentation models when faced with geometric out-of-distribution data.
We propose an augmentation technique called "Organ Transplantation" to enhance generalizability.
Our augmentation technique improves SOA model performance by up to 67 % for RGB data and 90 % for HSI data, achieving performance at the level of in-distribution performance on real OOD test data.
arXiv Detail & Related papers (2024-08-27T19:13:15Z) - Efficient Continual Pre-training by Mitigating the Stability Gap [68.49269649759005]
We study the behavior of Large Language Models (LLMs) during continual pre-training.
We propose three effective strategies to enhance LLM performance within a fixed compute budget.
Our strategies improve the average medical task performance of the OpenLlama-3B model from 36.2% to 40.7% with only 40% of the original training budget.
arXiv Detail & Related papers (2024-06-21T02:28:37Z) - Few-Shot Recognition via Stage-Wise Retrieval-Augmented Finetuning [8.348143234047486]
Few-shot recognition aims to train a classification model with only a few labeled examples of each concept concerned by a downstream task.<n>We develop methods to solve FSR by leveraging a pretrained Vision-Language Model (VLM)
arXiv Detail & Related papers (2024-06-17T02:27:14Z) - Jumpstarting Surgical Computer Vision [2.585559512929966]
We develop recommendations for pretraining dataset composition through over 300 experiments.<n>We outperform state-of-the-art pre-trainings on two public benchmarks for phase recognition.
arXiv Detail & Related papers (2023-12-10T18:54:16Z) - One-shot skill assessment in high-stakes domains with limited data via meta learning [0.0]
A-VBANet is a novel meta-learning model capable of delivering domain-agnostic skill assessment via one-shot learning.
Our model successfully adapted with accuracies up to 99.5% in one-shot and 99.9% in few-shot settings for simulated tasks and 89.7% for laparoscopic cholecystectomy.
arXiv Detail & Related papers (2022-12-16T01:04:52Z) - Consecutive Pretraining: A Knowledge Transfer Learning Strategy with
Relevant Unlabeled Data for Remote Sensing Domain [25.84756140221655]
ConSecutive PreTraining (CSPT) is proposed based on the idea of not stopping pretraining in natural language processing (NLP)
The proposed CSPT also can release the huge potential of unlabeled data for task-aware model training.
The results show that by utilizing the proposed CSPT for task-aware model training, almost all downstream tasks in RSD can outperform the previous method of supervised pretraining-then-fine-tuning.
arXiv Detail & Related papers (2022-07-08T12:32:09Z) - Dissecting Self-Supervised Learning Methods for Surgical Computer Vision [51.370873913181605]
Self-Supervised Learning (SSL) methods have begun to gain traction in the general computer vision community.
The effectiveness of SSL methods in more complex and impactful domains, such as medicine and surgery, remains limited and unexplored.
We present an extensive analysis of the performance of these methods on the Cholec80 dataset for two fundamental and popular tasks in surgical context understanding, phase recognition and tool presence detection.
arXiv Detail & Related papers (2022-07-01T14:17:11Z) - Understanding Cross-Domain Few-Shot Learning: An Experimental Study [17.81177649496765]
Cross-domain few-shot learning has drawn increasing attention for handling large differences between the source and target domains.
Recent works have considered exploiting small-scale unlabeled data from the target domain during the pre-training stage.
This data enables self-supervised pre-training on the target domain, in addition to supervised pre-training on the source domain.
arXiv Detail & Related papers (2022-02-01T12:35:25Z) - Self-Supervised Pre-Training for Transformer-Based Person
Re-Identification [54.55281692768765]
Transformer-based supervised pre-training achieves great performance in person re-identification (ReID)
Due to the domain gap between ImageNet and ReID datasets, it usually needs a larger pre-training dataset to boost the performance.
This work aims to mitigate the gap between the pre-training and ReID datasets from the perspective of data and model structure.
arXiv Detail & Related papers (2021-11-23T18:59:08Z) - Source-Free Open Compound Domain Adaptation in Semantic Segmentation [99.82890571842603]
In SF-OCDA, only the source pre-trained model and the target data are available to learn the target model.
We propose the Cross-Patch Style Swap (CPSS) to diversify samples with various patch styles in the feature-level.
Our method produces state-of-the-art results on the C-Driving dataset.
arXiv Detail & Related papers (2021-06-07T08:38:41Z) - Unsupervised Domain Adaptation for Speech Recognition via Uncertainty
Driven Self-Training [55.824641135682725]
Domain adaptation experiments using WSJ as a source domain and TED-LIUM 3 as well as SWITCHBOARD show that up to 80% of the performance of a system trained on ground-truth data can be recovered.
arXiv Detail & Related papers (2020-11-26T18:51:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.