Adaptive transfer learning for surgical tool presence detection in laparoscopic videos through gradual freezing fine-tuning
- URL: http://arxiv.org/abs/2510.15372v1
- Date: Fri, 17 Oct 2025 07:17:52 GMT
- Title: Adaptive transfer learning for surgical tool presence detection in laparoscopic videos through gradual freezing fine-tuning
- Authors: Ana Davila, Jacinto Colan, Yasuhisa Hasegawa
- Abstract summary: Minimally invasive surgery can benefit from automated surgical tool detection, enabling advanced analysis and assistance. The limited availability of annotated data in surgical settings poses a challenge for training robust deep learning models. This paper introduces a novel staged adaptive fine-tuning approach consisting of two steps: a linear probing stage and a gradual freezing stage.
- Score: 1.1371756033920992
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Minimally invasive surgery can benefit significantly from automated surgical tool detection, enabling advanced analysis and assistance. However, the limited availability of annotated data in surgical settings poses a challenge for training robust deep learning models. This paper introduces a novel staged adaptive fine-tuning approach consisting of two steps: a linear probing stage to condition additional classification layers on a pre-trained CNN-based architecture and a gradual freezing stage to dynamically reduce the fine-tunable layers, aiming to regulate adaptation to the surgical domain. This strategy reduces network complexity and improves efficiency, requiring only a single training loop and eliminating the need for multiple iterations. We validated our method on the Cholec80 dataset, employing CNN architectures (ResNet-50 and DenseNet-121) pre-trained on ImageNet for detecting surgical tools in cholecystectomy endoscopic videos. Our results demonstrate that our method improves detection performance compared to existing approaches and established fine-tuning techniques, achieving a mean average precision (mAP) of 96.4%. To assess its broader applicability, the generalizability of the fine-tuning strategy was further confirmed on the CATARACTS dataset, a distinct domain of minimally invasive ophthalmic surgery. These findings suggest that gradual freezing fine-tuning is a promising technique for improving tool presence detection in diverse surgical procedures and may have broader applications in general image classification tasks.
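The two-stage idea described in the abstract (linear probing first, then gradually shrinking the set of fine-tunable layers within one training loop) can be illustrated with a minimal, framework-free sketch. Everything below is an assumption made for illustration: the `Block` stand-in, the linear schedule in `num_frozen`, and the choice to freeze the earliest blocks first are hypothetical and are not taken from the paper, whose abstract does not specify the freezing schedule.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    """Hypothetical stand-in for a group of layers in a pre-trained CNN."""
    name: str
    trainable: bool = True


def linear_probe(backbone: List[Block], head: Block) -> None:
    """Stage 1: freeze the whole backbone so only the new head is trained."""
    for block in backbone:
        block.trainable = False
    head.trainable = True


def num_frozen(epoch: int, total_epochs: int, n_blocks: int) -> int:
    """Assumed linear schedule: how many backbone blocks are frozen at `epoch`.

    Epoch 0 freezes nothing; the last epoch freezes every backbone block.
    """
    if total_epochs <= 1:
        return n_blocks
    return min(n_blocks, (epoch * n_blocks) // (total_epochs - 1))


def gradual_freeze(backbone: List[Block], epoch: int, total_epochs: int) -> None:
    """Stage 2: freeze blocks from the input side first (an assumption)."""
    k = num_frozen(epoch, total_epochs, len(backbone))
    for i, block in enumerate(backbone):
        block.trainable = i >= k


if __name__ == "__main__":
    backbone = [Block(f"block{i}") for i in range(4)]
    head = Block("classifier_head")

    linear_probe(backbone, head)  # stage 1: head only
    total_epochs = 5
    for epoch in range(total_epochs):  # stage 2: single training loop
        gradual_freeze(backbone, epoch, total_epochs)
        trainable = [b.name for b in backbone if b.trainable]
        print(epoch, trainable)
```

In a real framework, the `trainable` flag would map onto whatever mechanism disables gradient updates for a layer group; the sketch only shows how a single loop can progressively reduce the fine-tunable part of the network.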
Related papers
- Surgical Foundation Model Leveraging Compression and Entropy Maximization for Image-Guided Surgical Assistance [50.486523249499115]
Real-time video understanding is critical to guide procedures in minimally invasive surgery (MIS). We propose Compress-to-Explore (C2E), a novel self-supervised framework to learn compact, informative representations from surgical videos. C2E uses entropy-maximizing decoders to compress images while preserving clinically relevant details, improving encoder performance without labeled data.
arXiv Detail & Related papers (2025-05-16T14:02:24Z) - CSMAE: Cataract Surgical Masked Autoencoder (MAE) based Pre-training [25.71088804562768]
We introduce a Masked Autoencoder (MAE)-based pretraining approach, specifically developed for cataract surgery video analysis. Instead of randomly selecting tokens for masking, tokens are selected based on their importance. This approach surpasses current state-of-the-art self-supervised pretraining and adapter-based learning methods by a significant margin.
arXiv Detail & Related papers (2025-02-12T22:24:49Z) - AMNCutter: Affinity-Attention-Guided Multi-View Normalized Cutter for Unsupervised Surgical Instrument Segmentation [7.594796294925481]
We propose a label-free unsupervised model featuring a novel module named Multi-View Normalized Cutter (m-NCutter).
Our model is trained using a graph-cutting loss function that leverages patch affinities for supervision, eliminating the need for pseudo-labels.
We conduct comprehensive experiments across multiple SIS datasets to validate our approach's state-of-the-art (SOTA) performance, robustness, and exceptional potential as a pre-trained model.
arXiv Detail & Related papers (2024-11-06T06:33:55Z) - Comparison of fine-tuning strategies for transfer learning in medical image classification [2.271776292902496]
Despite the availability of advanced pre-trained models, their direct application to medical imaging often falls short due to the unique characteristics of medical data.
This study provides a comprehensive analysis of the performance of various fine-tuning methods applied to pre-trained models across a spectrum of medical imaging domains.
arXiv Detail & Related papers (2024-06-14T14:00:02Z) - Learned Image resizing with efficient training (LRET) facilitates improved performance of large-scale digital histopathology image classification models [0.0]
Histologic examination plays a crucial role in oncology research and diagnostics.
Current approaches to training deep convolutional neural networks (DCNN) result in suboptimal model performance.
We introduce a novel approach that addresses the main limitations of traditional histopathology classification model training.
arXiv Detail & Related papers (2024-01-19T23:45:47Z) - Efficient Deformable Tissue Reconstruction via Orthogonal Neural Plane [58.871015937204255]
We introduce Fast Orthogonal Plane (Forplane) for the reconstruction of deformable tissues.
We conceptualize surgical procedures as 4D volumes, and break them down into static and dynamic fields comprised of neural planes.
This factorization discretizes four-dimensional space, leading to decreased memory usage and faster optimization.
arXiv Detail & Related papers (2023-12-23T13:27:50Z) - Cross-Dataset Adaptation for Instrument Classification in Cataract Surgery Videos [54.1843419649895]
State-of-the-art models, which perform this task well on a particular dataset, perform poorly when tested on another dataset.
We propose a novel end-to-end Unsupervised Domain Adaptation (UDA) method called the Barlow Adaptor.
In addition, we introduce a novel loss called the Barlow Feature Alignment Loss (BFAL) which aligns features across different domains.
arXiv Detail & Related papers (2023-07-31T18:14:18Z) - Taxonomy Adaptive Cross-Domain Adaptation in Medical Imaging via Optimization Trajectory Distillation [73.83178465971552]
The success of automated medical image analysis depends on large-scale and expert-annotated training sets.
Unsupervised domain adaptation (UDA) has been proposed as a promising approach to alleviate the burden of labeled data collection.
We propose optimization trajectory distillation, a unified approach to address the two technical challenges from a new perspective.
arXiv Detail & Related papers (2023-07-27T08:58:05Z) - Neural LerPlane Representations for Fast 4D Reconstruction of Deformable Tissues [52.886545681833596]
LerPlane is a novel method for fast and accurate reconstruction of surgical scenes under a single-viewpoint setting.
LerPlane treats surgical procedures as 4D volumes and factorizes them into explicit 2D planes of static and dynamic fields.
LerPlane shares static fields, significantly reducing the workload of dynamic tissue modeling.
arXiv Detail & Related papers (2023-05-31T14:38:35Z) - One-shot skill assessment in high-stakes domains with limited data via meta learning [0.0]
A-VBANet is a novel meta-learning model capable of delivering domain-agnostic skill assessment via one-shot learning.
Our model successfully adapted with accuracies up to 99.5% in one-shot and 99.9% in few-shot settings for simulated tasks and 89.7% for laparoscopic cholecystectomy.
arXiv Detail & Related papers (2022-12-16T01:04:52Z) - Dissecting Self-Supervised Learning Methods for Surgical Computer Vision [51.370873913181605]
Self-Supervised Learning (SSL) methods have begun to gain traction in the general computer vision community.
The effectiveness of SSL methods in more complex and impactful domains, such as medicine and surgery, remains limited and unexplored.
We present an extensive analysis of the performance of these methods on the Cholec80 dataset for two fundamental and popular tasks in surgical context understanding, phase recognition and tool presence detection.
arXiv Detail & Related papers (2022-07-01T14:17:11Z) - End-to-End Blind Quality Assessment for Laparoscopic Videos using Neural Networks [9.481148895837812]
In this paper, we propose neural network-based approaches for distortion classification as well as quality prediction.
To train the overall architecture (ResNet and FCNN models), transfer learning and end-to-end learning approaches are investigated.
Experimental results, carried out on a new laparoscopic video quality database, have shown the efficiency of the proposed methods.
arXiv Detail & Related papers (2022-02-09T15:29:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.