Related papers: FASL-Seg: Anatomy and Tool Segmentation of Surgical Scenes

FASL-Seg: Anatomy and Tool Segmentation of Surgical Scenes

URL: http://arxiv.org/abs/2509.06159v3
Date: Thu, 30 Oct 2025 08:10:05 GMT
Title: FASL-Seg: Anatomy and Tool Segmentation of Surgical Scenes
Authors: Muraam Abdel-Ghani, Mahmoud Ali, Mohamed Ali, Fatmaelzahraa Ahmed, Muhammad Arsalan, Abdulaziz Al-Ali, Shidin Balakrishnan,
Abstract summary: We propose a Feature-Adaptive Spatial localization model (FASL-Seg)<n>It is designed to capture features at multiple levels of detail through two distinct processing streams.<n>It is tested on surgical segmentation benchmark datasets EndoVis18 and EndoVis17.<n>FASL-Seg achieves a mean Intersection over Union (mIoU) of 72.71% on parts and anatomy segmentation in EndoVis18, improving on SOTA by 5%.
Score: 7.04219830147424
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The growing popularity of robotic minimally invasive surgeries has made deep learning-based surgical training a key area of research. A thorough understanding of the surgical scene components is crucial, which semantic segmentation models can help achieve. However, most existing work focuses on surgical tools and overlooks anatomical objects. Additionally, current state-of-the-art (SOTA) models struggle to balance capturing high-level contextual features and low-level edge features. We propose a Feature-Adaptive Spatial Localization model (FASL-Seg), designed to capture features at multiple levels of detail through two distinct processing streams, namely a Low-Level Feature Projection (LLFP) and a High-Level Feature Projection (HLFP) stream, for varying feature resolutions - enabling precise segmentation of anatomy and surgical instruments. We evaluated FASL-Seg on surgical segmentation benchmark datasets EndoVis18 and EndoVis17 on three use cases. The FASL-Seg model achieves a mean Intersection over Union (mIoU) of 72.71% on parts and anatomy segmentation in EndoVis18, improving on SOTA by 5%. It further achieves a mIoU of 85.61% and 72.78% in EndoVis18 and EndoVis17 tool type segmentation, respectively, outperforming SOTA overall performance, with comparable per-class SOTA results in both datasets and consistent performance in various classes for anatomy and instruments, demonstrating the effectiveness of distinct processing streams for varying feature resolutions.

Related papers

UniSurg: A Video-Native Foundation Model for Universal Understanding of Surgical Videos [81.9180187964947]
We present UniSurg, a foundation model that shifts the learning paradigm from pixel-level reconstruction to latent motion prediction.<n>To enable large-scale pretraining, we curate the largest surgical video dataset to date, comprising 3,658 hours of video from 50 sources across 13 anatomical regions.<n>These results establish UniSurg as a new standard for universal, motion-oriented surgical video understanding.
arXiv Detail & Related papers (2026-02-05T13:18:33Z)
VesSAM: Efficient Multi-Prompting for Segmenting Complex Vessel [68.24765319399286]
We present VesSAM, a powerful and efficient framework tailored for 2D vessel segmentation.<n>VesSAM integrates (1) a convolutional adapter to enhance local texture features, (2) a multi-prompt encoder that fuses anatomical prompts, and (3) a lightweight mask decoder to reduce jagged artifacts.<n>VesSAM consistently outperforms state-of-the-art PEFT-based SAM variants by over 10% Dice and 13% IoU.
arXiv Detail & Related papers (2025-11-02T15:47:05Z)
Cataract-LMM: Large-Scale, Multi-Source, Multi-Task Benchmark for Deep Learning in Surgical Video Analysis [4.318540086708654]
We present a dataset of 3,000 cataract surgery videos from two surgical centers, performed by surgeons with a range of experience levels.<n>This resource is enriched with four annotation layers: temporal surgical phases, instance segmentation of instruments and anatomical structures, instrument-tissue interaction tracking, and quantitative skill scores.<n>The technical quality of the dataset is supported by a series of benchmarking experiments for key surgical AI tasks.
arXiv Detail & Related papers (2025-10-18T06:48:29Z)
SurgVLM: A Large Vision-Language Model and Systematic Evaluation Benchmark for Surgical Intelligence [72.10889173696928]
We propose SurgVLM, one of the first large vision-language foundation models for surgical intelligence.<n>We construct a large-scale multimodal surgical database, SurgVLM-DB, spanning more than 16 surgical types and 18 anatomical structures.<n>Building upon this comprehensive dataset, we propose SurgVLM, which is built upon Qwen2.5-VL, and undergoes instruction tuning to 10+ surgical tasks.
arXiv Detail & Related papers (2025-06-03T07:44:41Z)
SurgXBench: Explainable Vision-Language Model Benchmark for Surgery [4.068223793121694]
Vision-Language Models (VLMs) have brought transformative advances in reasoning across visual and textual modalities.<n>Existing models show limited performance, highlighting the need for benchmark studies to assess their capabilities and limitations.<n>We benchmark the zero-shot performance of several advancedVLMs on two public robotic-assisted laparoscopic datasets for instrument and action classification.
arXiv Detail & Related papers (2025-05-16T00:42:18Z)
Surgical Scene Segmentation by Transformer With Asymmetric Feature Enhancement [7.150163844454341]
Vision-specific transformer method is a promising way for surgical scene understanding. We propose a novel Transformer-based framework with an Asymmetric Feature Enhancement module (TAFE) The proposed method outperforms the SOTA methods in several different surgical segmentation tasks and additionally proves its ability of fine-grained structure recognition.
arXiv Detail & Related papers (2024-10-23T07:58:47Z)
Handling Geometric Domain Shifts in Semantic Segmentation of Surgical RGB and Hyperspectral Images [67.66644395272075]
We present first analysis of state-of-the-art semantic segmentation models when faced with geometric out-of-distribution data. We propose an augmentation technique called "Organ Transplantation" to enhance generalizability. Our augmentation technique improves SOA model performance by up to 67 % for RGB data and 90 % for HSI data, achieving performance at the level of in-distribution performance on real OOD test data.
arXiv Detail & Related papers (2024-08-27T19:13:15Z)
SurgicalPart-SAM: Part-to-Whole Collaborative Prompting for Surgical Instrument Segmentation [66.21356751558011]
The Segment Anything Model (SAM) exhibits promise in generic object segmentation and offers potential for various applications. Existing methods have applied SAM to surgical instrument segmentation (SIS) by tuning SAM-based frameworks with surgical data. We propose SurgicalPart-SAM (SP-SAM), a novel SAM efficient-tuning approach that explicitly integrates instrument structure knowledge with SAM's generic knowledge.
arXiv Detail & Related papers (2023-12-22T07:17:51Z)
Visual-Kinematics Graph Learning for Procedure-agnostic Instrument Tip Segmentation in Robotic Surgeries [29.201385352740555]
We propose a novel visual-kinematics graph learning framework to accurately segment the instrument tip given various surgical procedures. Specifically, a graph learning framework is proposed to encode relational features of instrument parts from both image and kinematics. A cross-modal contrastive loss is designed to incorporate robust geometric prior from kinematics to image for tip segmentation.
arXiv Detail & Related papers (2023-09-02T14:52:58Z)
Hierarchical Semi-Supervised Learning Framework for Surgical Gesture Segmentation and Recognition Based on Multi-Modality Data [2.8770761243361593]
We develop a hierarchical semi-supervised learning framework for surgical gesture segmentation using multi-modality data. A Transformer-based network with a pre-trained ResNet-18' backbone is used to extract visual features from the surgical operation videos. The proposed approach has been evaluated using data from the publicly available JIGS database, including Suturing, Needle Passing, and Knot Tying tasks.
arXiv Detail & Related papers (2023-07-31T21:17:59Z)
Semantic segmentation of surgical hyperspectral images under geometric domain shifts [69.91792194237212]
We present the first analysis of state-of-the-art semantic segmentation networks in the presence of geometric out-of-distribution (OOD) data. We also address generalizability with a dedicated augmentation technique termed "Organ Transplantation" Our scheme improves on the SOA DSC by up to 67 % (RGB) and 90 % (HSI) and renders performance on par with in-distribution performance on real OOD test data.
arXiv Detail & Related papers (2023-03-20T09:50:07Z)
From Forks to Forceps: A New Framework for Instance Segmentation of Surgical Instruments [6.677634562400846]
Minimally invasive surgeries and related applications demand surgical tool classification and segmentation at the instance level. Our research demonstrates that while the bounding box and segmentation mask are often accurate, the classification head mis-classifies the class label of the surgical instrument. We present a new neural network framework that adds a classification module as a new stage to existing instance segmentation models.
arXiv Detail & Related papers (2022-11-26T21:26:42Z)
Dissecting Self-Supervised Learning Methods for Surgical Computer Vision [51.370873913181605]
Self-Supervised Learning (SSL) methods have begun to gain traction in the general computer vision community. The effectiveness of SSL methods in more complex and impactful domains, such as medicine and surgery, remains limited and unexplored. We present an extensive analysis of the performance of these methods on the Cholec80 dataset for two fundamental and popular tasks in surgical context understanding, phase recognition and tool presence detection.
arXiv Detail & Related papers (2022-07-01T14:17:11Z)

This list is automatically generated from the titles and abstracts of the papers in this site.