Surgical Scene Segmentation by Transformer With Asymmetric Feature Enhancement
- URL: http://arxiv.org/abs/2410.17642v1
- Date: Wed, 23 Oct 2024 07:58:47 GMT
- Title: Surgical Scene Segmentation by Transformer With Asymmetric Feature Enhancement
- Authors: Cheng Yuan, Yutong Ban
- Abstract summary: Vision-specific transformer method is a promising way for surgical scene understanding.
We propose a novel Transformer-based framework with an Asymmetric Feature Enhancement module (TAFE)
The proposed method outperforms the SOTA methods in several different surgical segmentation tasks and additionally proves its ability of fine-grained structure recognition.
- Score: 7.150163844454341
- Abstract: Surgical scene segmentation is a fundamental task for understanding robotic-assisted laparoscopic surgery. Surgical scenes often contain various anatomical structures and surgical instruments, where similar local textures and fine-grained structures make segmentation difficult. Vision-specific transformer methods are a promising approach to surgical scene understanding. However, two main challenges remain. First, the absence of inner-patch information fusion leads to poor segmentation performance. Second, the distinct characteristics of anatomy and instruments are not explicitly modeled. To tackle these challenges, we propose a novel Transformer-based framework with an Asymmetric Feature Enhancement module (TAFE), which enhances local information and then actively fuses the improved feature pyramid into the embeddings from transformer encoders via a multi-scale interaction attention strategy. The proposed method outperforms SOTA methods on several different surgical segmentation tasks and additionally demonstrates its ability to recognize fine-grained structures. Code is available at https://github.com/cyuan-sjtu/ViT-asym.
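The abstract describes the multi-scale interaction attention only at a high level. A minimal sketch of the idea, with all module names, shapes, and hyperparameters assumed for illustration (the authors' actual implementation is in the linked repository), might look like this:

```python
# Illustrative sketch only: a cross-attention block that fuses one level of
# a locally enhanced CNN feature pyramid into ViT patch embeddings. Names
# and shapes are assumptions; the authors' TAFE code is at the linked repo.
import torch
import torch.nn as nn

class MultiScaleInteractionAttention(nn.Module):
    def __init__(self, embed_dim: int = 256, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.norm_q = nn.LayerNorm(embed_dim)
        self.norm_kv = nn.LayerNorm(embed_dim)

    def forward(self, patch_tokens: torch.Tensor, pyramid_feat: torch.Tensor) -> torch.Tensor:
        """patch_tokens: (B, N, C) transformer embeddings.
        pyramid_feat: (B, C, H, W) one level of the enhanced feature pyramid."""
        kv = self.norm_kv(pyramid_feat.flatten(2).transpose(1, 2))  # (B, H*W, C)
        fused, _ = self.attn(self.norm_q(patch_tokens), kv, kv)
        return patch_tokens + fused  # residual fusion of local detail into tokens

# Toy usage: a 14x14 patch grid fused with a 28x28 pyramid level.
tokens = torch.randn(2, 196, 256)
feat = torch.randn(2, 256, 28, 28)
print(MultiScaleInteractionAttention()(tokens, feat).shape)  # torch.Size([2, 196, 256])
```

The design choice sketched here is that the transformer tokens act as queries while the locally enhanced pyramid features supply keys and values, so fine local detail flows into the globally contextualized embeddings.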
Related papers
- SurgicalPart-SAM: Part-to-Whole Collaborative Prompting for Surgical Instrument Segmentation [66.21356751558011]
The Segment Anything Model (SAM) exhibits promise in generic object segmentation and offers potential for various applications.
Existing methods have applied SAM to surgical instrument segmentation (SIS) by tuning SAM-based frameworks with surgical data.
We propose SurgicalPart-SAM (SP-SAM), a novel SAM efficient-tuning approach that explicitly integrates instrument structure knowledge with SAM's generic knowledge.
arXiv Detail & Related papers (2023-12-22T07:17:51Z)
- Visual-Kinematics Graph Learning for Procedure-agnostic Instrument Tip Segmentation in Robotic Surgeries [29.201385352740555]
We propose a novel visual-kinematics graph learning framework to accurately segment the instrument tip across various surgical procedures.
Specifically, a graph learning framework is proposed to encode relational features of instrument parts from both images and kinematics.
A cross-modal contrastive loss is designed to incorporate a robust geometric prior from kinematics into image features for tip segmentation.
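A minimal sketch of what such a cross-modal contrastive loss could look like, assuming a standard symmetric InfoNCE formulation rather than the paper's exact design:

```python
# Hedged sketch of a cross-modal contrastive loss pairing image features with
# kinematics features; symmetric InfoNCE is assumed, not the paper's exact loss.
import torch
import torch.nn.functional as F

def cross_modal_contrastive_loss(img_feat, kin_feat, temperature=0.07):
    """img_feat, kin_feat: (B, D) embeddings from matched image/kinematics pairs."""
    img = F.normalize(img_feat, dim=-1)
    kin = F.normalize(kin_feat, dim=-1)
    logits = img @ kin.t() / temperature  # (B, B) pairwise similarities
    targets = torch.arange(img.size(0), device=img.device)
    # Matched pairs sit on the diagonal; average over both matching directions.
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

loss = cross_modal_contrastive_loss(torch.randn(8, 128), torch.randn(8, 128))
```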
arXiv Detail & Related papers (2023-09-02T14:52:58Z)
- GLSFormer: Gated - Long, Short Sequence Transformer for Step Recognition in Surgical Videos [57.93194315839009]
We propose a vision transformer-based approach to learn temporal features directly from sequence-level patches.
We extensively evaluate our approach on two cataract surgery video datasets, Cataract-101 and D99, and demonstrate superior performance compared to various state-of-the-art methods.
arXiv Detail & Related papers (2023-07-20T17:57:04Z)
- Text Promptable Surgical Instrument Segmentation with Vision-Language Models [16.203166812021045]
We propose a novel text promptable surgical instrument segmentation approach to overcome challenges associated with diversity and differentiation of surgical instruments.
We leverage pretrained image and text encoders as our model backbone and design a text promptable mask decoder.
Experiments on several surgical instrument segmentation datasets demonstrate our model's superior performance and promising generalization capability.
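A hedged sketch of how a text-promptable mask decoder could combine the two encoders; the architecture below is an illustrative assumption, not the paper's actual design:

```python
# Illustrative assumption of a text-promptable mask decoder: the text prompt
# embedding queries image features via cross-attention, then a per-pixel dot
# product scores the mask. This is not the paper's actual architecture.
import torch
import torch.nn as nn

class TextPromptableMaskDecoder(nn.Module):
    def __init__(self, dim: int = 256, num_heads: int = 8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, text_emb: torch.Tensor, img_feat: torch.Tensor) -> torch.Tensor:
        """text_emb: (B, D) prompt embedding; img_feat: (B, D, H, W)."""
        B, D, H, W = img_feat.shape
        pixels = img_feat.flatten(2).transpose(1, 2)  # (B, H*W, D)
        q, _ = self.cross_attn(text_emb.unsqueeze(1), pixels, pixels)
        scores = torch.einsum('bd,bnd->bn', q.squeeze(1), pixels)  # per-pixel score
        return scores.view(B, H, W)

mask = TextPromptableMaskDecoder()(torch.randn(2, 256), torch.randn(2, 256, 16, 16))
```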
arXiv Detail & Related papers (2023-06-15T16:26:20Z)
- Transformer-Based Visual Segmentation: A Survey [118.01564082499948]
Visual segmentation seeks to partition images, video frames, or point clouds into multiple segments or groups.
Transformers are a type of neural network based on self-attention, originally designed for natural language processing.
Transformers offer robust, unified, and even simpler solutions for various segmentation tasks.
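For reference, the self-attention operation the survey builds on reduces to a few lines (single head, no learned projections, illustration only):

```python
# Minimal scaled dot-product self-attention, the core operation the survey
# refers to (single head, no learned projections, illustration only).
import torch

def self_attention(x: torch.Tensor) -> torch.Tensor:
    """x: (B, N, D) token features; every token attends to every token."""
    scores = x @ x.transpose(1, 2) / x.size(-1) ** 0.5  # (B, N, N) similarities
    return scores.softmax(dim=-1) @ x

out = self_attention(torch.randn(2, 16, 64))  # same (B, N, D) shape as the input
```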
arXiv Detail & Related papers (2023-04-19T17:59:02Z)
- Transforming the Interactive Segmentation for Medical Imaging [34.57242805353604]
The goal of this paper is to interactively refine the automatic segmentation on challenging structures that fall behind human performance.
We propose a novel Transformer-based architecture for Interactive Segmentation (TIS).
Our proposed architecture is composed of Transformer Decoder variants, which naturally perform feature comparison through attention mechanisms.
arXiv Detail & Related papers (2022-08-20T03:28:23Z)
- MISSU: 3D Medical Image Segmentation via Self-distilling TransUNet [55.16833099336073]
We propose to self-distill a Transformer-based UNet for medical image segmentation.
It simultaneously learns global semantic information and local spatial-detailed features.
Our MISSU outperforms previous state-of-the-art methods.
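A minimal sketch of the self-distillation idea, assuming a standard KL-based distillation term between a deeper teacher branch and a shallower student branch of the same network (MISSU's exact scheme may differ):

```python
# Hedged sketch of self-distillation: a deeper (teacher) branch of the same
# network supervises a shallower (student) branch alongside the segmentation
# loss. The weighting and temperature are assumptions, not MISSU's exact scheme.
import torch
import torch.nn.functional as F

def self_distill_loss(student_logits, teacher_logits, labels, alpha=0.5, T=2.0):
    """Logits: (B, C, H, W); labels: (B, H, W) integer class maps."""
    seg = F.cross_entropy(student_logits, labels)  # ordinary supervised term
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits.detach() / T, dim=1),  # teacher is not updated
        reduction='batchmean',
    ) * T * T
    return seg + alpha * kd

loss = self_distill_loss(torch.randn(2, 3, 8, 8), torch.randn(2, 3, 8, 8),
                         torch.randint(0, 3, (2, 8, 8)))
```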
arXiv Detail & Related papers (2022-06-02T07:38:53Z)
- Continual Hippocampus Segmentation with Transformers [1.2020488155038649]
In clinical settings, where acquisition conditions and patient populations change over time, continual learning is key for ensuring the safe use of deep neural networks.
Radiologists prefer to work with segmentation models that outline specific regions of interest, for which Transformer-based architectures are gaining traction.
arXiv Detail & Related papers (2022-04-17T16:13:04Z)
- TraSeTR: Track-to-Segment Transformer with Contrastive Query for Instance-level Instrument Segmentation in Robotic Surgery [60.439434751619736]
We propose TraSeTR, a Track-to-Segment Transformer that exploits tracking cues to assist surgical instrument segmentation.
TraSeTR jointly reasons about the instrument type, location, and identity with instance-level predictions.
The effectiveness of our method is demonstrated with state-of-the-art instrument type segmentation results on three public datasets.
arXiv Detail & Related papers (2022-02-17T05:52:18Z)
- Class-Incremental Domain Adaptation with Smoothing and Calibration for Surgical Report Generation [12.757176743817277]
We propose class-incremental domain adaptation (CIDA) to tackle new classes and domain shift in the target domain when generating surgical reports during robotic surgery.
To generate captions from the extracted features, curriculum by one-dimensional Gaussian smoothing (CBS) is integrated with a multi-layer Transformer-based caption prediction model.
We observe that domain-invariant feature learning and a well-calibrated network improve surgical report generation performance in both the source and target domains.
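A sketch of what curriculum by one-dimensional Gaussian smoothing could look like, with the decay schedule and kernel size assumed for illustration:

```python
# Sketch of curriculum by one-dimensional Gaussian smoothing (CBS): features
# are blurred with a 1-D Gaussian whose std decays as training progresses, so
# the model sees easier (smoother) inputs first. The schedule is an assumption.
import torch
import torch.nn.functional as F

def gaussian_kernel1d(sigma: float, radius: int = 3) -> torch.Tensor:
    x = torch.arange(-radius, radius + 1, dtype=torch.float32)
    k = torch.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()  # normalized blur kernel

def cbs_smooth(feat: torch.Tensor, epoch: int, sigma0=1.0, decay=0.9) -> torch.Tensor:
    """feat: (B, C, L) 1-D feature sequence; smoothing weakens every epoch."""
    sigma = max(sigma0 * decay ** epoch, 1e-3)
    k = gaussian_kernel1d(sigma).to(feat).view(1, 1, -1).repeat(feat.size(1), 1, 1)
    return F.conv1d(feat, k, padding=k.size(-1) // 2, groups=feat.size(1))

smoothed = cbs_smooth(torch.randn(2, 64, 100), epoch=0)  # strongest blur early on
```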
arXiv Detail & Related papers (2021-07-23T09:08:26Z)
- TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation [78.01570371790669]
Medical image segmentation is an essential prerequisite for developing healthcare systems.
On various medical image segmentation tasks, the U-shaped architecture, also known as U-Net, has become the de facto standard.
We propose TransUNet, which merits both Transformers and U-Net, as a strong alternative for medical image segmentation.
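A toy sketch of the TransUNet recipe, with all sizes illustrative and the U-Net skip connections omitted for brevity: a CNN stem extracts a feature map, a Transformer encoder contextualizes its flattened tokens, and a decoder upsamples back to a segmentation map.

```python
# Rough sketch of the TransUNet recipe with toy sizes: a CNN stem extracts a
# feature map, a Transformer encoder contextualizes its flattened tokens, and
# a decoder upsamples back to a (here binary) segmentation map. Skip
# connections of the real model are omitted for brevity.
import torch
import torch.nn as nn

class TinyTransUNet(nn.Module):
    def __init__(self, dim: int = 64):
        super().__init__()
        self.cnn = nn.Sequential(nn.Conv2d(3, dim, 3, stride=4, padding=1), nn.ReLU())
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.decoder = nn.Sequential(
            nn.Upsample(scale_factor=4, mode='bilinear', align_corners=False),
            nn.Conv2d(dim, 1, 1),  # 1x1 conv segmentation head
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        f = self.cnn(x)                                       # (B, D, H/4, W/4)
        B, D, H, W = f.shape
        tokens = self.encoder(f.flatten(2).transpose(1, 2))   # global context
        f = tokens.transpose(1, 2).view(B, D, H, W)           # back to a feature map
        return self.decoder(f)

print(TinyTransUNet()(torch.randn(1, 3, 64, 64)).shape)  # torch.Size([1, 1, 64, 64])
```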
arXiv Detail & Related papers (2021-02-08T16:10:50Z)