Towards Robust Algorithms for Surgical Phase Recognition via Digital Twin-based Scene Representation
- URL: http://arxiv.org/abs/2410.20026v1
- Date: Sat, 26 Oct 2024 00:49:06 GMT
- Title: Towards Robust Algorithms for Surgical Phase Recognition via Digital Twin-based Scene Representation
- Authors: Hao Ding, Yuqian Zhang, Hongchao Shu, Xu Lian, Ji Woong Kim, Axel Krieger, Mathias Unberath
- Abstract summary: End-to-end trained neural networks that predict surgical phase directly from videos have shown excellent performance on benchmarks.
Our goal is to improve model robustness to variations in the surgical videos by leveraging the digital twin (DT) paradigm.
This approach takes advantage of the recent vision foundation models that ensure reliable low-level scene understanding.
- Score: 14.108636146958007
- Abstract: Purpose: Surgical phase recognition (SPR) is an integral component of surgical data science, enabling high-level surgical analysis. End-to-end trained neural networks that predict surgical phase directly from videos have shown excellent performance on benchmarks. However, these models struggle with robustness due to non-causal associations in the training set, resulting in poor generalizability. Our goal is to improve model robustness to variations in the surgical videos by leveraging the digital twin (DT) paradigm -- an intermediary layer to separate high-level analysis (SPR) from low-level processing (geometric understanding). This approach takes advantage of the recent vision foundation models that ensure reliable low-level scene understanding to craft DT-based scene representations that support various high-level tasks. Methods: We present a DT-based framework for SPR from videos. The framework employs vision foundation models to extract representations. We embed the representation in place of raw video inputs in the state-of-the-art Surgformer model. The framework is trained on the Cholec80 dataset and evaluated on out-of-distribution (OOD) and corrupted test samples. Results: Contrary to the vulnerability of the baseline model, our framework demonstrates strong robustness on both OOD and corrupted samples, with a video-level accuracy of 51.1 on the challenging CRCD dataset, 96.0 on an internal robotics training dataset, and 64.4 on a highly corrupted Cholec80 test set. Conclusion: Our findings lend support to the thesis that DT-based scene representations are effective in enhancing model robustness. Future work will seek to improve the feature informativeness, automate feature extraction, and incorporate interpretability for a more comprehensive framework.
Related papers
- Handling Geometric Domain Shifts in Semantic Segmentation of Surgical RGB and Hyperspectral Images [67.66644395272075]
We present the first analysis of state-of-the-art semantic segmentation models when faced with geometric out-of-distribution data.
We propose an augmentation technique called "Organ Transplantation" to enhance generalizability.
Our augmentation technique improves SOA model performance by up to 67 % for RGB data and 90 % for HSI data, achieving performance at the level of in-distribution performance on real OOD test data.
arXiv Detail & Related papers (2024-08-27T19:13:15Z)
- UniForensics: Face Forgery Detection via General Facial Representation [60.5421627990707]
High-level semantic features are less susceptible to perturbations and not limited to forgery-specific artifacts, thus having stronger generalization.
We introduce UniForensics, a novel deepfake detection framework that leverages a transformer-based video network, with a meta-functional face classification for enriched facial representation.
arXiv Detail & Related papers (2024-07-26T20:51:54Z)
- SegSTRONG-C: Segmenting Surgical Tools Robustly On Non-adversarial Generated Corruptions -- An EndoVis'24 Challenge [20.63421118951673]
Current feed-forward neural network-based methods exhibit excellent segmentation performance under ideal conditions.
The SegSTRONG-C challenge aims to promote the development of algorithms robust to unforeseen but plausible image corruptions of surgery.
The new benchmark will allow us to carefully study neural network robustness to non-adversarial corruptions of surgery.
arXiv Detail & Related papers (2024-07-16T16:50:43Z)
- A quality assurance framework for real-time monitoring of deep learning segmentation models in radiotherapy [3.5752677591512487]
This work uses cardiac substructure segmentation as an example task to establish a quality assurance framework.
A benchmark dataset consisting of Computed Tomography (CT) images along with manual cardiac delineations of 241 patients was collected.
An image domain shift detector was developed by utilizing a trained denoising autoencoder (DAE) and two hand-engineered features.
A regression model was trained to predict the per-patient segmentation accuracy, measured by the Dice similarity coefficient (DSC).
arXiv Detail & Related papers (2023-05-19T14:51:05Z)
- On the Robustness of Aspect-based Sentiment Analysis: Rethinking Model, Data, and Training [109.9218185711916]
Aspect-based sentiment analysis (ABSA) aims at automatically inferring the specific sentiment polarities toward certain aspects of products or services behind social media texts or reviews.
We propose to enhance the ABSA robustness by systematically rethinking the bottlenecks from all possible angles, including model, data, and training.
arXiv Detail & Related papers (2023-04-19T11:07:43Z)
- Vision-Language Modelling For Radiological Imaging and Reports In The Low Data Regime [70.04389979779195]
This paper explores training medical vision-language models (VLMs) where the visual and language inputs are embedded into a common space.
We explore several candidate methods to improve low-data performance, including adapting generic pre-trained models to novel image and text domains.
Using text-to-image retrieval as a benchmark, we evaluate the performance of these methods with variable sized training datasets of paired chest X-rays and radiological reports.
arXiv Detail & Related papers (2023-03-30T18:20:00Z)
- Lightweight Encoder-Decoder Architecture for Foot Ulcer Segmentation [12.729149322066249]
Continuous monitoring of foot ulcer healing is needed to ensure the efficacy of a given treatment and to avoid any possibility of deterioration.
We developed a model that is similar in spirit to the well-established encoder-decoder and residual convolution neural networks.
A simple patch-based approach for model training, test time augmentations, and majority voting on the obtained predictions resulted in superior performance.
arXiv Detail & Related papers (2022-07-06T08:42:29Z)
- Large-scale Robustness Analysis of Video Action Recognition Models [10.017292176162302]
We study the robustness of six state-of-the-art action recognition models against 90 different perturbations.
The study reveals several findings: 1) transformer-based models are consistently more robust than CNN-based models, 2) pretraining improves robustness more for transformer-based models than for CNN-based models, and 3) all of the studied models are robust to temporal perturbations on all datasets except SSv2.
arXiv Detail & Related papers (2022-07-04T13:29:34Z)
- CONVIQT: Contrastive Video Quality Estimator [63.749184706461826]
Perceptual video quality assessment (VQA) is an integral component of many streaming and video sharing platforms.
Here we consider the problem of learning perceptually relevant video quality representations in a self-supervised manner.
Our results indicate that compelling representations with perceptual bearing can be obtained using self-supervised learning.
arXiv Detail & Related papers (2022-06-29T15:22:01Z)
- CaRTS: Causality-driven Robot Tool Segmentation from Vision and Kinematics Data [11.92904350972493]
Vision-based segmentation of the robotic tool during robot-assisted surgery enables downstream applications, such as augmented reality feedback.
With the introduction of deep learning, many methods were presented to solve instrument segmentation directly and solely from images.
We present CaRTS, a causality-driven robot tool segmentation algorithm, that is designed based on a complementary causal model of the robot tool segmentation task.
arXiv Detail & Related papers (2022-03-15T22:26:19Z)
- InDuDoNet+: A Model-Driven Interpretable Dual Domain Network for Metal Artifact Reduction in CT Images [53.4351366246531]
We construct a novel interpretable dual domain network, termed InDuDoNet+, into which the CT imaging process is finely embedded.
We analyze the CT values among different tissues, and merge the prior observations into a prior network for our InDuDoNet+, which significantly improves its generalization performance.
arXiv Detail & Related papers (2021-12-23T15:52:37Z)