GLSFormer: Gated - Long, Short Sequence Transformer for Step Recognition in Surgical Videos
- URL: http://arxiv.org/abs/2307.11081v1
- Date: Thu, 20 Jul 2023 17:57:04 GMT
- Title: GLSFormer: Gated - Long, Short Sequence Transformer for Step Recognition in Surgical Videos
- Authors: Nisarg A. Shah, Shameema Sikder, S. Swaroop Vedula, Vishal M. Patel
- Abstract summary: We propose a vision transformer-based approach to learn temporal features directly from sequence-level patches.
We extensively evaluate our approach on two cataract surgery video datasets, Cataract-101 and D99, and demonstrate superior performance compared to various state-of-the-art methods.
- Score: 57.93194315839009
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Automated surgical step recognition is an important task that can
significantly improve patient safety and decision-making during surgeries.
Existing state-of-the-art methods for surgical step recognition either rely on
separate, multi-stage modeling of spatial and temporal information or operate
on short-range temporal resolution when learned jointly. However, the benefits
of jointly modeling spatio-temporal features and long-range information are
not taken into account. In this paper, we propose a vision transformer-based
approach to jointly learn spatio-temporal features directly from a sequence of
frame-level patches. Our method incorporates a gated-temporal attention
mechanism that intelligently combines short-term and long-term spatio-temporal
feature representations. We extensively evaluate our approach on two cataract
surgery video datasets, namely Cataract-101 and D99, and demonstrate superior
performance compared to various state-of-the-art methods. These results
validate the suitability of our proposed approach for automated surgical step
recognition. Our code is released at:
https://github.com/nisargshah1999/GLSFormer
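The gated-temporal attention described above can be illustrated as a learned gate that fuses short- and long-term spatio-temporal feature streams. The sketch below is a minimal NumPy illustration of such a gating idea, not the authors' implementation: the shapes, the concatenation-based gate, and the parameters `W` and `b` are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(short_feat, long_feat, W, b):
    """Fuse short- and long-term features with a learned sigmoid gate.

    short_feat, long_feat: (T, D) spatio-temporal feature sequences.
    W: (2*D, D) gate projection, b: (D,) bias -- hypothetical parameters,
    stand-ins for whatever the actual model learns.
    """
    # Compute a per-channel gate from the concatenated streams
    # (an assumption, not necessarily the paper's exact formulation).
    g = sigmoid(np.concatenate([short_feat, long_feat], axis=-1) @ W + b)
    # Convex combination: each output element lies between the
    # corresponding short- and long-term feature values.
    return g * short_feat + (1.0 - g) * long_feat

T, D = 8, 16                       # 8 time steps, 16 feature channels
short = rng.normal(size=(T, D))    # e.g. features from a short clip window
long_ = rng.normal(size=(T, D))    # e.g. features from a long-range window
W = rng.normal(scale=0.1, size=(2 * D, D))
b = np.zeros(D)
fused = gated_fusion(short, long_, W, b)
print(fused.shape)  # (8, 16)
```

Because the gate is a sigmoid, the fusion is a per-element convex combination, so the model can smoothly favor short-term cues (fast instrument motion) or long-term context (overall step progression) per channel and time step.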
Related papers
- MuST: Multi-Scale Transformers for Surgical Phase Recognition [40.047145788604716]
Phase recognition in surgical videos is crucial for enhancing computer-aided surgical systems.
Existing methods often rely on fixed temporal windows for video analysis to identify dynamic surgical phases.
We propose Multi-Scale Transformers for Surgical Phase Recognition (MuST), a novel Transformer-based approach.
arXiv Detail & Related papers (2024-07-24T15:38:20Z) - Friends Across Time: Multi-Scale Action Segmentation Transformer for
Surgical Phase Recognition [2.10407185597278]
We propose the Multi-Scale Action Transformer (MS-AST) for offline surgical phase recognition and the Multi-Scale Action Causal Transformer (MS-ASCT) for online surgical phase recognition.
Our method can achieve 95.26% and 96.15% accuracy on the Cholec80 dataset for online and offline surgical phase recognition, respectively.
arXiv Detail & Related papers (2024-01-22T01:34:03Z) - TUNeS: A Temporal U-Net with Self-Attention for Video-based Surgical Phase Recognition [1.5237530964650965]
We propose TUNeS, an efficient and simple temporal model that incorporates self-attention at the core of a convolutional U-Net structure.
In our experiments, almost all temporal models performed better on top of feature extractors that were trained with longer temporal context.
arXiv Detail & Related papers (2023-07-19T14:10:55Z) - LoViT: Long Video Transformer for Surgical Phase Recognition [59.06812739441785]
We present a two-stage method, called Long Video Transformer (LoViT) for fusing short- and long-term temporal information.
Our approach consistently outperforms state-of-the-art methods on the Cholec80 and AutoLaparo datasets.
arXiv Detail & Related papers (2023-05-15T20:06:14Z) - Robotic Navigation Autonomy for Subretinal Injection via Intelligent
Real-Time Virtual iOCT Volume Slicing [88.99939660183881]
We propose a framework for autonomous robotic navigation for subretinal injection.
Our method consists of an instrument pose estimation method, an online registration between the robotic and the iOCT system, and trajectory planning tailored for navigation to an injection target.
Our experiments on ex-vivo porcine eyes demonstrate the precision and repeatability of the method.
arXiv Detail & Related papers (2023-01-17T21:41:21Z) - Efficient Global-Local Memory for Real-time Instrument Segmentation of
Robotic Surgical Video [53.14186293442669]
We identify two important clues for surgical instrument perception, including local temporal dependency from adjacent frames and global semantic correlation in long-range duration.
We propose a novel dual-memory network (DMNet) to relate both global and local-temporal knowledge.
Our method largely outperforms the state-of-the-art works on segmentation accuracy while maintaining a real-time speed.
arXiv Detail & Related papers (2021-09-28T10:10:14Z) - Temporal Memory Relation Network for Workflow Recognition from Surgical
Video [53.20825496640025]
We propose a novel end-to-end temporal memory relation network (TMNet) for relating long-range and multi-scale temporal patterns.
We have extensively validated our approach on two benchmark surgical video datasets.
arXiv Detail & Related papers (2021-03-30T13:20:26Z) - Trans-SVNet: Accurate Phase Recognition from Surgical Videos via Hybrid
Embedding Aggregation Transformer [57.18185972461453]
We introduce, for the first time in surgical workflow analysis, a Transformer to reconsider the previously ignored complementary effects of spatial and temporal features for accurate phase recognition.
Our framework is lightweight and processes the hybrid embeddings in parallel to achieve a high inference speed.
arXiv Detail & Related papers (2021-03-17T15:12:55Z) - Symmetric Dilated Convolution for Surgical Gesture Recognition [10.699258974625073]
We propose a novel temporal convolutional architecture to automatically detect and segment surgical gestures.
We devise our method with a symmetric dilation structure bridged by a self-attention module to encode and decode the long-term temporal patterns.
We validate our approach on a fundamental robotic suturing task from the JIGSAWS dataset.
arXiv Detail & Related papers (2020-07-13T13:34:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences.