Multi-Stage Boundary-Aware Transformer Network for Action Segmentation in Untrimmed Surgical Videos
- URL: http://arxiv.org/abs/2504.18756v1
- Date: Sat, 26 Apr 2025 01:07:56 GMT
- Title: Multi-Stage Boundary-Aware Transformer Network for Action Segmentation in Untrimmed Surgical Videos
- Authors: Rezowan Shuvo, M S Mekala, Eyad Elyan
- Abstract summary: We present the Multi-Stage Boundary-Aware Transformer Network (MSBATN) with hierarchical sliding window attention. Our proposed approach incorporates a novel unified loss function that treats action classification and boundary detection as distinct yet interdependent tasks. Our boundary voting mechanism accurately identifies start and end points by leveraging contextual information.
- Score: 0.1053373860696675
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Understanding actions within surgical workflows is essential for evaluating post-operative outcomes. However, capturing long sequences of actions performed in surgical settings poses challenges, as individual surgeons have their unique approaches shaped by their expertise, leading to significant variability. To tackle this complex problem, we focused on segmentation with precise boundaries, a demanding task due to the inherent variability in action durations and the subtle transitions often observed in untrimmed videos. These transitions, marked by ambiguous starting and ending points, complicate the segmentation process. Traditional models, such as MS-TCN, which depend on large receptive fields, frequently face challenges of over-segmentation (resulting in fragmented segments) or under-segmentation (merging distinct actions). Both of these issues negatively impact the quality of segmentation. To overcome these challenges, we present the Multi-Stage Boundary-Aware Transformer Network (MSBATN) with hierarchical sliding window attention, designed to enhance action segmentation. Our proposed approach incorporates a novel unified loss function that treats action classification and boundary detection as distinct yet interdependent tasks. Unlike traditional binary boundary detection methods, our boundary voting mechanism accurately identifies start and end points by leveraging contextual information. Extensive experiments using three challenging surgical datasets demonstrate the superior performance of the proposed method, achieving state-of-the-art results in F1 scores at thresholds of 25% and 50%, while also delivering comparable performance in other metrics.
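The abstract does not spell out the unified loss; joint objectives of this kind are commonly written as a weighted sum of a frame-wise classification term and per-frame start/end boundary terms. The PyTorch snippet below is a hypothetical rendering under that assumption; the function name, the soft boundary targets, and the weight `lambda_bnd` are illustrative, not taken from the paper.

```python
import torch.nn.functional as F

def unified_loss(frame_logits, start_logits, end_logits,
                 labels, start_targets, end_targets, lambda_bnd=0.5):
    """Hypothetical joint objective: frame-wise action classification plus
    separate start/end boundary detection, combined with a fixed weight.

    frame_logits: (T, C) per-frame class scores
    start_logits, end_logits: (T,) per-frame boundary scores
    labels: (T,) integer action labels
    start_targets, end_targets: (T,) soft boundary targets in [0, 1]
    """
    cls_loss = F.cross_entropy(frame_logits, labels)
    # Soft targets let frames near an ambiguous transition contribute
    # partial credit instead of forcing a hard 0/1 boundary decision.
    bnd_loss = (F.binary_cross_entropy_with_logits(start_logits, start_targets)
                + F.binary_cross_entropy_with_logits(end_logits, end_targets))
    return cls_loss + lambda_bnd * bnd_loss
```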
Related papers
- WeakSurg: Weakly supervised surgical instrument segmentation using temporal equivariance and semantic continuity [14.448593791011204]
We propose a weakly supervised surgical instrument segmentation method that requires only instrument presence labels.
We take the inherent temporal attributes of surgical video into account and extend a two-stage weakly supervised segmentation paradigm.
The method is validated on two surgical video datasets: a cholecystectomy surgery benchmark and a real robotic left lateral segment liver surgery dataset.
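The summary leaves the training objective unspecified; presence-only labels of this kind are commonly exploited in a multiple-instance fashion, pooling per-pixel scores to an image-level presence prediction that the label can supervise. A minimal sketch of that generic pattern (not WeakSurg's actual pipeline; the names and the max-pooling choice are assumptions):

```python
import torch.nn.functional as F

def presence_loss(pixel_logits, presence_labels):
    """Generic MIL-style objective for presence-only supervision.

    pixel_logits: (B, C, H, W) per-pixel instrument class scores
    presence_labels: (B, C) float {0, 1} image-level presence labels
    """
    # Max-pool spatial scores to one presence logit per class: a single
    # confident pixel suffices to explain a positive image-level label.
    image_logits = pixel_logits.amax(dim=(2, 3))
    return F.binary_cross_entropy_with_logits(image_logits, presence_labels)
```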
arXiv Detail & Related papers (2024-03-14T16:39:11Z)
- GS-EMA: Integrating Gradient Surgery Exponential Moving Average with Boundary-Aware Contrastive Learning for Enhanced Domain Generalization in Aneurysm Segmentation [41.97669338211682]
We propose a novel domain generalization strategy that employs a gradient surgery exponential moving average (GS-EMA) optimization technique and boundary-aware contrastive learning (BACL).
Our approach is distinct in its ability to adapt to new, unseen domains by learning domain-invariant features, thereby improving the robustness and accuracy of aneurysm segmentation across diverse clinical datasets.
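For intuition only: "gradient surgery" is usually read as projecting away the conflicting component between two gradients (as in PCGrad), and the EMA part maintains an exponential moving average of the weights. A minimal sketch under those assumptions; the paper's exact update rules are not given in this summary.

```python
import torch

def gradient_surgery(g1, g2):
    """PCGrad-style projection on flattened gradient vectors: if g1
    conflicts with g2 (negative dot product), remove g1's component
    along g2. One common reading of 'gradient surgery'; the paper's
    exact rule may differ."""
    dot = torch.dot(g1, g2)
    if dot < 0:
        g1 = g1 - dot / (g2.norm() ** 2 + 1e-12) * g2
    return g1

@torch.no_grad()
def ema_update(ema_params, model_params, decay=0.999):
    """Exponential moving average of model weights; the averaged copy is
    typically the one evaluated on unseen domains."""
    for e, p in zip(ema_params, model_params):
        e.mul_(decay).add_(p, alpha=1.0 - decay)
```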
arXiv Detail & Related papers (2024-02-23T10:02:15Z)
- Pixel-Wise Recognition for Holistic Surgical Scene Understanding [33.40319680006502]
This paper presents the Holistic and Multi-Granular Surgical Scene Understanding of Prostatectomies dataset. Our benchmark models surgical scene understanding as a hierarchy of complementary tasks with varying levels of granularity. To exploit our proposed benchmark, we introduce the Transformers for Actions, Phases, Steps, and Instrument (TAPIS) model.
arXiv Detail & Related papers (2024-01-20T09:09:52Z)
- SAR-RARP50: Segmentation of surgical instrumentation and Action Recognition on Robot-Assisted Radical Prostatectomy Challenge [72.97934765570069]
We release the first multimodal, publicly available, in-vivo dataset for surgical action recognition and semantic instrumentation segmentation, containing 50 suturing video segments of Robotic Assisted Radical Prostatectomy (RARP).
The aim of the challenge is to enable researchers to leverage the scale of the provided dataset and develop robust and highly accurate single-task action recognition and tool segmentation approaches in the surgical domain.
A total of 12 teams participated in the challenge, contributing 7 action recognition methods, 9 instrument segmentation techniques, and 4 multitask approaches that integrated both action recognition and instrument segmentation.
arXiv Detail & Related papers (2023-12-31T13:32:18Z)
- DIR-AS: Decoupling Individual Identification and Temporal Reasoning for Action Segmentation [84.78383981697377]
Fully supervised action segmentation performs frame-wise action recognition with dense annotations and often suffers from over-segmentation.
We develop a novel local-global attention mechanism with temporal pyramid dilation and temporal pyramid pooling for efficient multi-scale attention.
We achieve state-of-the-art accuracy, e.g., 82.8% (+2.6%) on GTEA and 74.7% (+1.2%) on Breakfast, which demonstrates the effectiveness of our proposed method.
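The pyramid modules are not detailed in this summary; a common realization of multi-scale temporal context combines parallel dilated 1-D convolutions at increasing rates with a pooled global branch. The block below is an illustrative stand-in for such a design, not the paper's module.

```python
import torch
import torch.nn as nn

class TemporalPyramid(nn.Module):
    """Illustrative multi-scale temporal context block: parallel dilated
    1-D convolutions capture local context at several rates, and a
    global-pooling branch supplies sequence-level context."""

    def __init__(self, dim, dilations=(1, 2, 4, 8)):
        super().__init__()
        self.local = nn.ModuleList(
            nn.Conv1d(dim, dim, kernel_size=3, padding=d, dilation=d)
            for d in dilations)
        self.fuse = nn.Conv1d(dim * (len(dilations) + 1), dim, kernel_size=1)

    def forward(self, x):  # x: (B, dim, T)
        branches = [conv(x) for conv in self.local]
        pooled = x.mean(dim=2, keepdim=True).expand_as(x)  # global context
        return self.fuse(torch.cat(branches + [pooled], dim=1))
```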
arXiv Detail & Related papers (2023-04-04T20:27:18Z)
- TraSeTR: Track-to-Segment Transformer with Contrastive Query for Instance-level Instrument Segmentation in Robotic Surgery [60.439434751619736]
We propose TraSeTR, a Track-to-Segment Transformer that exploits tracking cues to assist surgical instrument segmentation.
TraSeTR jointly reasons about the instrument type, location, and identity with instance-level predictions.
The effectiveness of our method is demonstrated with state-of-the-art instrument type segmentation results on three public datasets.
arXiv Detail & Related papers (2022-02-17T05:52:18Z)
- MIDeepSeg: Minimally Interactive Segmentation of Unseen Objects from Medical Images Using Deep Learning [15.01235930304888]
We propose a novel deep learning-based interactive segmentation method that has high efficiency due to only requiring clicks as user inputs.
Our proposed framework achieves accurate results with fewer user interactions and less time compared with state-of-the-art interactive frameworks.
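As background on click-based interaction: clicks are typically encoded as an extra guidance channel (a distance- or Gaussian-based map of the clicked points) concatenated with the image before segmentation. The sketch below shows a simple Gaussian variant as a stand-in; the paper's actual encoding may differ.

```python
import numpy as np

def click_guidance(shape, clicks, sigma=10.0):
    """Encode user clicks as a smooth guidance channel: a Gaussian bump at
    each clicked pixel. Illustrative Euclidean stand-in only.

    shape: (H, W); clicks: iterable of (row, col) pixel coordinates
    """
    ys, xs = np.mgrid[0:shape[0], 0:shape[1]]
    guidance = np.zeros(shape, dtype=np.float32)
    for cy, cx in clicks:
        bump = np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2 * sigma ** 2))
        guidance = np.maximum(guidance, bump)  # strongest nearby click wins
    return guidance
```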
arXiv Detail & Related papers (2021-04-25T14:15:17Z)
- MS-TCN++: Multi-Stage Temporal Convolutional Network for Action Segmentation [87.16030562892537]
We propose a multi-stage architecture for the temporal action segmentation task.
The first stage generates an initial prediction that is refined by the next ones.
Our models achieve state-of-the-art results on three datasets.
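The stage-wise refinement is easy to state in code: stage 1 predicts frame-wise class scores from features, and each later stage re-predicts from the previous stage's softmax output, with a loss attached to every stage. The sketch below reduces each stage to a single convolution for brevity; real MS-TCN++ stages are dilated temporal convolutional networks.

```python
import torch.nn as nn

class MultiStage(nn.Module):
    """Schematic multi-stage refinement in the spirit of MS-TCN++: stage 1
    predicts from features, and each later stage re-predicts from the
    previous stage's softmax output. Single 1x1 convolutions stand in for
    the full dilated TCN stages here."""

    def __init__(self, feat_dim, num_classes, num_stages=4):
        super().__init__()
        self.stage1 = nn.Conv1d(feat_dim, num_classes, kernel_size=1)
        self.refine = nn.ModuleList(
            nn.Conv1d(num_classes, num_classes, kernel_size=1)
            for _ in range(num_stages - 1))

    def forward(self, feats):  # feats: (B, feat_dim, T)
        out = self.stage1(feats)
        outputs = [out]  # keep every stage's output: each receives a loss
        for stage in self.refine:
            out = stage(out.softmax(dim=1))
            outputs.append(out)
        return outputs
```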
arXiv Detail & Related papers (2020-06-16T14:50:47Z)
- On Evaluating Weakly Supervised Action Segmentation Methods [79.42955857919497]
We focus on two aspects of the use and evaluation of weakly supervised action segmentation approaches.
We train each method on the Breakfast dataset 5 times and report the average and standard deviation of the results.
Our experiments show that the standard deviation over these repetitions is between 1% and 2.5% and significantly affects the comparison between different approaches.
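The protocol behind these numbers is straightforward to reproduce: train each method several times with different seeds, then summarize with mean and standard deviation, e.g.:

```python
import statistics

def summarize(scores):
    """Mean and sample standard deviation over repeated training runs."""
    return statistics.mean(scores), statistics.stdev(scores)

# Hypothetical accuracies from five repetitions of one method on Breakfast.
mean, std = summarize([44.1, 45.9, 43.7, 46.4, 44.8])
print(f"{mean:.1f} +/- {std:.1f}")  # 45.0 +/- 1.2
```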
arXiv Detail & Related papers (2020-05-19T20:30:31Z)
- Robust Medical Instrument Segmentation Challenge 2019 [56.148440125599905]
Intraoperative tracking of laparoscopic instruments is often a prerequisite for computer- and robot-assisted interventions.
Our challenge was based on a surgical data set comprising 10,040 annotated images acquired from a total of 30 surgical procedures.
The results confirm the initial hypothesis, namely that algorithm performance degrades with an increasing domain gap.
arXiv Detail & Related papers (2020-03-23T14:35:08Z)