Class-Incremental Domain Adaptation with Smoothing and Calibration for
  Surgical Report Generation
        - URL: http://arxiv.org/abs/2107.11091v1
 - Date: Fri, 23 Jul 2021 09:08:26 GMT
 - Title: Class-Incremental Domain Adaptation with Smoothing and Calibration for
  Surgical Report Generation
 - Authors: Mengya Xu, Mobarakol Islam, Chwee Ming Lim, Hongliang Ren
 - Abstract summary: We propose class-incremental domain adaptation (CIDA) to tackle the new classes and domain shift in the target domain to generate surgical reports during robotic surgery.
To generate captions from the extracted features, curriculum by one-dimensional Gaussian smoothing (CBS) is integrated with a multi-layer transformer-based caption prediction model.
We observe that domain-invariant feature learning and a well-calibrated network improve surgical report generation performance in both the source and target domains.
 - Score: 12.757176743817277
 - License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
 - Abstract:   Generating surgical reports aimed at surgical scene understanding in
robot-assisted surgery can contribute to documenting entry tasks and
post-operative analysis. Despite impressive results, deep learning
models degrade in performance when applied to different domains that
exhibit domain shift. In addition, new instruments and variations in
surgical tissue appear in robotic surgery. In this work, we propose
class-incremental domain adaptation (CIDA) with a multi-layer transformer-based
model to tackle the new classes and domain shift in the target domain to
generate surgical reports during robotic surgery. To adapt to incremental classes
and extract domain-invariant features, a class-incremental (CI) learning method
with a supervised contrastive (SupCon) loss is incorporated with a feature
extractor. To generate captions from the extracted features, curriculum by
one-dimensional Gaussian smoothing (CBS) is integrated with a multi-layer
transformer-based caption prediction model. CBS smooths the feature embeddings
via anti-aliasing and helps the model learn domain-invariant features. We
also adopt label smoothing (LS) to calibrate prediction probabilities and obtain
better feature representations with both the feature extractor and the captioning model.
The proposed techniques are empirically evaluated on datasets from two
surgical domains: nephrectomy operations and transoral robotic surgery.
We observe that domain-invariant feature learning and a well-calibrated
network improve surgical report generation performance in both the source and
target domains under domain shift and unseen classes, in both one-shot
and few-shot learning settings. The code is publicly available at
https://github.com/XuMengyaAmy/CIDACaptioning.
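
The feature extractor above is trained with a supervised contrastive (SupCon) loss, which pulls same-class embeddings together and pushes other samples apart. A minimal pure-Python sketch of the SupCon formulation (function name, temperature, and toy inputs are illustrative assumptions, not taken from the paper's released code):

```python
import math

def supcon_loss(features, labels, tau=0.1):
    """Supervised contrastive loss on L2-normalised embeddings.

    For each anchor i, the positives are all other samples with the same
    label; every non-anchor sample appears in the denominator, scaled by
    a temperature tau (Khosla et al., 2020 formulation).
    """
    n = len(features)
    # L2-normalise each feature vector so similarities are dot products.
    zs = []
    for f in features:
        norm = math.sqrt(sum(x * x for x in f))
        zs.append([x / norm for x in f])

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    per_anchor = []
    for i in range(n):
        pos = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not pos:
            continue  # anchors with no positives contribute nothing
        denom = sum(math.exp(dot(zs[i], zs[a]) / tau)
                    for a in range(n) if a != i)
        li = -sum(math.log(math.exp(dot(zs[i], zs[p]) / tau) / denom)
                  for p in pos) / len(pos)
        per_anchor.append(li)
    return sum(per_anchor) / len(per_anchor)
```

With this formulation, a batch whose same-class embeddings already coincide yields a near-zero loss, while mismatched labels on the same embeddings raise it sharply.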
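
CBS applies one-dimensional Gaussian smoothing to the feature embeddings as an anti-aliasing curriculum, typically starting from a large sigma and annealing it as training progresses. A hedged sketch of the smoothing step (kernel radius and replicate padding are assumptions, not necessarily the paper's exact implementation):

```python
import math

def gaussian_kernel(sigma, radius=2):
    """Normalised 1-D Gaussian kernel of width 2*radius + 1."""
    weights = [math.exp(-(x * x) / (2.0 * sigma * sigma))
               for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def smooth_embedding(features, sigma):
    """Convolve a 1-D feature embedding with a Gaussian kernel,
    using replicate padding at the borders."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(features)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = min(max(i + k - radius, 0), n - 1)  # clamp to borders
            acc += w * features[idx]
        out.append(acc)
    return out
```

In a curriculum, sigma would be decayed across epochs (e.g. sigma_t = sigma_0 * gamma**t) so the model first sees heavily smoothed, low-frequency features and only later the full-resolution embedding.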
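
Label smoothing (LS) calibrates the prediction probabilities by taking a small mass epsilon off the true class and redistributing it uniformly over all K classes. A minimal sketch (epsilon = 0.1 is a common default, not necessarily the paper's setting):

```python
def smooth_labels(one_hot, eps=0.1):
    """Replace a one-hot target with (1 - eps) * one_hot + eps / K,
    which keeps the target a valid probability distribution."""
    k = len(one_hot)
    return [(1.0 - eps) * y + eps / k for y in one_hot]
```

For example, a 4-class one-hot target [1, 0, 0, 0] becomes approximately [0.925, 0.025, 0.025, 0.025]; training the cross-entropy against these softened targets discourages over-confident logits in both the feature extractor and the captioning model.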
 
       
      
        Related papers
        - Surgical Foundation Model Leveraging Compression and Entropy   Maximization for Image-Guided Surgical Assistance [50.486523249499115]
Real-time video understanding is critical to guide procedures in minimally invasive surgery (MIS). We propose Compress-to-Explore (C2E), a novel self-supervised framework to learn compact, informative representations from surgical videos. C2E uses entropy-maximizing decoders to compress images while preserving clinically relevant details, improving encoder performance without labeled data.
arXiv  Detail & Related papers  (2025-05-16T14:02:24Z) - UWarp: A Whole Slide Image Registration Pipeline to Characterize   Scanner-Induced Local Domain Shift [0.9137449870737363]
We present a domain shift analysis framework based on UWarp, a novel registration tool to align histological slides scanned under varying conditions.
Experiments demonstrate that UWarp outperforms existing open-source registration methods, achieving a median target registration error (TRE) of less than 4 pixels.
We apply UWarp to characterize scanner-induced local domain shift in the predictions of Breast-NEOprAIdict, a deep learning model for breast cancer pathological response prediction.
arXiv  Detail & Related papers  (2025-03-26T15:48:38Z) - Surgical Scene Segmentation by Transformer With Asymmetric Feature   Enhancement [7.150163844454341]
Vision-specific transformer methods are a promising way to achieve surgical scene understanding.
We propose a novel Transformer-based framework with an Asymmetric Feature Enhancement module (TAFE).
The proposed method outperforms SOTA methods in several different surgical segmentation tasks and additionally proves its ability to recognize fine-grained structures.
arXiv  Detail & Related papers  (2024-10-23T07:58:47Z) - GS-EMA: Integrating Gradient Surgery Exponential Moving Average with
  Boundary-Aware Contrastive Learning for Enhanced Domain Generalization in
  Aneurysm Segmentation [41.97669338211682]
We propose a novel domain generalization strategy that employs gradient surgery exponential moving average (GS-EMA) optimization technique and boundary-aware contrastive learning (BACL)
Our approach is distinct in its ability to adapt to new, unseen domains by learning domain-invariant features, thereby improving the robustness and accuracy of aneurysm segmentation across diverse clinical datasets.
arXiv  Detail & Related papers  (2024-02-23T10:02:15Z) - Cross-Dataset Adaptation for Instrument Classification in Cataract
  Surgery Videos [54.1843419649895]
State-of-the-art models, which perform this task well on a particular dataset, perform poorly when tested on another dataset.
We propose a novel end-to-end Unsupervised Domain Adaptation (UDA) method called the Barlow Adaptor.
In addition, we introduce a novel loss called the Barlow Feature Alignment Loss (BFAL) which aligns features across different domains.
arXiv  Detail & Related papers  (2023-07-31T18:14:18Z) - Semantic segmentation of surgical hyperspectral images under geometric
  domain shifts [69.91792194237212]
We present the first analysis of state-of-the-art semantic segmentation networks in the presence of geometric out-of-distribution (OOD) data.
We also address generalizability with a dedicated augmentation technique termed "Organ Transplantation"
Our scheme improves the SOTA DSC by up to 67% (RGB) and 90% (HSI) and renders performance on real OOD test data on par with in-distribution performance.
arXiv  Detail & Related papers  (2023-03-20T09:50:07Z) - Task-Aware Asynchronous Multi-Task Model with Class Incremental
  Contrastive Learning for Surgical Scene Understanding [17.80234074699157]
A multi-task learning model is proposed for surgical report generation and tool-tissue interaction prediction.
The model consists of a shared feature extractor, a mesh-transformer branch for captioning, and a graph attention branch for tool-tissue interaction prediction.
We incorporate a task-aware asynchronous MTL optimization technique to fine-tune the shared weights and converge both tasks optimally.
arXiv  Detail & Related papers  (2022-11-28T14:08:48Z) - Adapting the Mean Teacher for keypoint-based lung registration under
  geometric domain shifts [75.51482952586773]
Deep neural networks generally require plenty of labeled training data and are vulnerable to domain shifts between training and test data.
We present a novel approach to geometric domain adaptation for image registration, adapting a model from a labeled source to an unlabeled target domain.
Our method consistently improves on the baseline model by 50%/47% while even matching the accuracy of models trained on target data.
arXiv  Detail & Related papers  (2022-07-01T12:16:42Z) - Surgical Gesture Recognition Based on Bidirectional Multi-Layer
  Independently RNN with Explainable Spatial Feature Extraction [10.469989981471254]
We aim to develop an effective surgical gesture recognition approach with an explainable feature extraction process.
A Bidirectional Multi-Layer independently RNN (BML-indRNN) model is proposed in this paper.
To eliminate the black-box effects of DCNN, Gradient-weighted Class Activation Mapping (Grad-CAM) is employed.
Results indicated that the testing accuracy for the suturing task based on our proposed method is 87.13%, which outperforms most of the state-of-the-art algorithms.
arXiv  Detail & Related papers  (2021-05-02T12:47:19Z) - One to Many: Adaptive Instrument Segmentation via Meta Learning and
  Dynamic Online Adaptation in Robotic Surgical Video [71.43912903508765]
MDAL is a dynamic online adaptive learning scheme for instrument segmentation in robot-assisted surgery.
It learns the general knowledge of instruments and the fast adaptation ability through the video-specific meta-learning paradigm.
It outperforms other state-of-the-art methods on two datasets.
arXiv  Detail & Related papers  (2021-03-24T05:02:18Z) - Co-Generation and Segmentation for Generalized Surgical Instrument
  Segmentation on Unlabelled Data [49.419268399590045]
Surgical instrument segmentation for robot-assisted surgery is needed for accurate instrument tracking and augmented reality overlays.
Deep learning-based methods have shown state-of-the-art performance for surgical instrument segmentation, but their results depend on labelled data.
In this paper, we demonstrate the limited generalizability of these methods on different datasets, including human robot-assisted surgeries.
arXiv  Detail & Related papers  (2021-03-16T18:41:18Z) - Domain Adaptive Robotic Gesture Recognition with Unsupervised
  Kinematic-Visual Data Alignment [60.31418655784291]
We propose a novel unsupervised domain adaptation framework which can simultaneously transfer multi-modality knowledge, i.e., both kinematic and visual data, from simulator to real robot.
It remedies the domain gap with enhanced transferable features by using temporal cues in videos and inherent correlations across modalities for gesture recognition.
Results show that our approach recovers performance with large gains, up to 12.91% in ACC and 20.16% in F1 score, without using any annotations from the real robot.
arXiv  Detail & Related papers  (2021-03-06T09:10:03Z) 
        This list is automatically generated from the titles and abstracts of the papers in this site.
       
     
           This site does not guarantee the quality of the information presented (including all listed papers) and is not responsible for any consequences of its use.