Friends Across Time: Multi-Scale Action Segmentation Transformer for
Surgical Phase Recognition
- URL: http://arxiv.org/abs/2401.11644v1
- Date: Mon, 22 Jan 2024 01:34:03 GMT
- Title: Friends Across Time: Multi-Scale Action Segmentation Transformer for
Surgical Phase Recognition
- Authors: Bokai Zhang, Jiayuan Meng, Bin Cheng, Dean Biskup, Svetlana
Petculescu, Angela Chapman
- Abstract summary: We propose the Multi-Scale Action Segmentation Transformer (MS-AST) for offline surgical phase recognition and the Multi-Scale Action Segmentation Causal Transformer (MS-ASCT) for online surgical phase recognition.
Our method can achieve 95.26% and 96.15% accuracy on the Cholec80 dataset for online and offline surgical phase recognition, respectively.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automatic surgical phase recognition is a core technology for modern
operating rooms and online surgical video assessment platforms. Current
state-of-the-art methods use both spatial and temporal information to tackle
the surgical phase recognition task. Building on this idea, we propose the
Multi-Scale Action Segmentation Transformer (MS-AST) for offline surgical phase
recognition and the Multi-Scale Action Segmentation Causal Transformer
(MS-ASCT) for online surgical phase recognition. We use ResNet50 or
EfficientNetV2-M for spatial feature extraction. Our MS-AST and MS-ASCT can
model temporal information at different scales with multi-scale temporal
self-attention and multi-scale temporal cross-attention, which enhances the
capture of temporal relationships between frames and segments. We demonstrate
that our method can achieve 95.26% and 96.15% accuracy on the Cholec80 dataset
for online and offline surgical phase recognition, respectively, which achieves
new state-of-the-art results. Our method can also achieve state-of-the-art
results on non-medical datasets in the video action segmentation domain.
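The paper's code is not reproduced here; the sketch below only illustrates the general idea behind multi-scale temporal self-attention, not the authors' MS-AST implementation. The assumed mechanics (all function names and the fuse-by-averaging choice are illustrative assumptions): average-pool the per-frame features at several temporal scales, apply plain scaled dot-product self-attention at each scale, then upsample and average the results so every frame sees both frame-level and segment-level context.

```python
import math

def softmax(xs):
    # numerically stable softmax over a list of scores
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def pool(seq, k):
    # average-pool consecutive groups of k frame-feature vectors
    return [[sum(v[d] for v in seq[i:i + k]) / len(seq[i:i + k])
             for d in range(len(seq[0]))]
            for i in range(0, len(seq), k)]

def self_attention(seq):
    # scaled dot-product self-attention with identity Q/K/V projections
    d = len(seq[0])
    out = []
    for q in seq:
        scores = softmax([sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                          for k in seq])
        out.append([sum(w * v[j] for w, v in zip(scores, seq))
                    for j in range(d)])
    return out

def multi_scale_attention(seq, scales=(1, 2, 4)):
    # attend at each temporal scale, upsample back to frame rate, and average
    fused = [[0.0] * len(seq[0]) for _ in seq]
    for k in scales:
        att = self_attention(pool(seq, k))
        for t in range(len(seq)):
            for j in range(len(seq[0])):
                fused[t][j] += att[t // k][j] / len(scales)
    return fused
```

In a real model each scale would have learned projections and cross-attention between scales; this sketch keeps only the pooling-plus-attention skeleton to show why coarse scales capture segment-level relationships.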
Related papers
- MuST: Multi-Scale Transformers for Surgical Phase Recognition [40.047145788604716]
Phase recognition in surgical videos is crucial for enhancing computer-aided surgical systems.
Existing methods often rely on fixed temporal windows for video analysis to identify dynamic surgical phases.
We propose Multi-Scale Transformers for Surgical Phase Recognition (MuST), a novel Transformer-based approach.
arXiv Detail & Related papers (2024-07-24T15:38:20Z)
- SAR-RARP50: Segmentation of Surgical Instrumentation and Action Recognition on Robot-Assisted Radical Prostatectomy Challenge [72.97934765570069]
We release the first multimodal, publicly available, in-vivo dataset for surgical action recognition and semantic instrumentation segmentation, containing 50 suturing video segments of Robot-Assisted Radical Prostatectomy (RARP).
The aim of the challenge is to enable researchers to leverage the scale of the provided dataset and develop robust and highly accurate single-task action recognition and tool segmentation approaches in the surgical domain.
A total of 12 teams participated in the challenge, contributing 7 action recognition methods, 9 instrument segmentation techniques, and 4 multitask approaches that integrated both action recognition and instrument segmentation.
arXiv Detail & Related papers (2023-12-31T13:32:18Z)
- GLSFormer: Gated-Long, Short Sequence Transformer for Step Recognition in Surgical Videos [57.93194315839009]
We propose a vision transformer-based approach to learn temporal features directly from sequence-level patches.
We extensively evaluate our approach on two cataract surgery video datasets, Cataract-101 and D99, and demonstrate superior performance compared to various state-of-the-art methods.
arXiv Detail & Related papers (2023-07-20T17:57:04Z)
- LoViT: Long Video Transformer for Surgical Phase Recognition [59.06812739441785]
We present a two-stage method, called Long Video Transformer (LoViT) for fusing short- and long-term temporal information.
Our approach outperforms state-of-the-art methods on the Cholec80 and AutoLaparo datasets consistently.
arXiv Detail & Related papers (2023-05-15T20:06:14Z)
- Surgical Phase Recognition in Laparoscopic Cholecystectomy [57.929132269036245]
We propose a Transformer-based method that utilizes calibrated confidence scores for a 2-stage inference pipeline.
Our method outperforms the baseline model on the Cholec80 dataset, and can be applied to a variety of action segmentation methods.
arXiv Detail & Related papers (2022-06-14T22:55:31Z)
- Trans-SVNet: Accurate Phase Recognition from Surgical Videos via Hybrid Embedding Aggregation Transformer [57.18185972461453]
We introduce Transformers into surgical workflow analysis for the first time, to reconsider the overlooked complementary effects of spatial and temporal features for accurate phase recognition.
Our framework is lightweight and processes the hybrid embeddings in parallel to achieve a high inference speed.
arXiv Detail & Related papers (2021-03-17T15:12:55Z)
- Relational Graph Learning on Visual and Kinematics Embeddings for Accurate Gesture Recognition in Robotic Surgery [84.73764603474413]
We propose MRG-Net, a novel online multi-modal graph network that dynamically integrates visual and kinematics information.
The effectiveness of our method is demonstrated with state-of-the-art results on the public JIGSAWS dataset.
arXiv Detail & Related papers (2020-11-03T11:00:10Z)
- TeCNO: Surgical Phase Recognition with Multi-Stage Temporal Convolutional Networks [43.95869213955351]
We propose a Multi-Stage Temporal Convolutional Network (MS-TCN) that performs hierarchical prediction refinement for surgical phase recognition.
Our method is thoroughly evaluated on two datasets of laparoscopic cholecystectomy videos with and without the use of additional surgical tool information.
arXiv Detail & Related papers (2020-03-24T10:12:30Z)
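Multi-stage temporal convolutional refinement, as used by TeCNO above, can be sketched roughly as follows. This is a hypothetical, simplified illustration rather than the published MS-TCN architecture: each stage re-smooths the previous stage's frame-wise phase scores with a causal dilated convolution, and the dilation doubles per stage to widen the temporal context. The hand-picked weights and stage count are illustrative assumptions.

```python
def dilated_conv(seq, weights, dilation):
    # causal 1-D dilated convolution over a scalar score sequence;
    # kernel taps reach back k * dilation frames
    out = []
    for t in range(len(seq)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * seq[idx]
        out.append(acc)
    return out

def refine(scores, stages=3):
    # each stage refines the previous stage's output; doubling the
    # dilation per stage grows the receptive field exponentially
    x = scores
    for s in range(stages):
        x = dilated_conv(x, [0.5, 0.3, 0.2], dilation=2 ** s)
    return x
```

The real MS-TCN applies this per phase class with learned kernels and a softmax per frame; the skeleton above only shows how stacked stages progressively smooth noisy frame-wise predictions.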
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.