Trans-SVNet: Accurate Phase Recognition from Surgical Videos via Hybrid
Embedding Aggregation Transformer
- URL: http://arxiv.org/abs/2103.09712v1
- Date: Wed, 17 Mar 2021 15:12:55 GMT
- Title: Trans-SVNet: Accurate Phase Recognition from Surgical Videos via Hybrid
Embedding Aggregation Transformer
- Authors: Xiaojie Gao, Yueming Jin, Yonghao Long, Qi Dou, Pheng-Ann Heng
- Abstract summary: We introduce, for the first time in surgical workflow analysis, a Transformer to reconsider the overlooked complementary effects of spatial and temporal features for accurate phase recognition.
Our framework is lightweight and processes the hybrid embeddings in parallel to achieve a high inference speed.
- Score: 57.18185972461453
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Real-time surgical phase recognition is a fundamental task in modern
operating rooms. Previous works tackle this task relying on architectures
arranged in spatio-temporal order; however, the supportive benefits of
intermediate spatial features are not considered. In this paper, we introduce,
for the first time in surgical workflow analysis, a Transformer to reconsider the
overlooked complementary effects of spatial and temporal features for accurate
surgical phase recognition. Our hybrid embedding aggregation Transformer fuses
cleverly designed spatial and temporal embeddings by allowing for active
queries based on spatial information from temporal embedding sequences. More
importantly, our framework is lightweight and processes the hybrid embeddings
in parallel to achieve a high inference speed. Our method is thoroughly
validated on two large surgical video datasets, i.e., Cholec80 and M2CAI16
Challenge datasets, and significantly outperforms the state-of-the-art
approaches at a processing speed of 91 fps.
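The fusion described in the abstract can be pictured as a cross-attention step in which the spatial embedding of the current frame acts as the query over a window of temporal embeddings. The sketch below is only an illustration of that idea, not the authors' released code: the module name, the embedding size (512), the 30-frame window, and the use of a single PyTorch TransformerDecoder layer are assumptions (Cholec80 defines 7 surgical phases).

```python
import torch
import torch.nn as nn

class HybridEmbeddingAggregator(nn.Module):
    """Minimal sketch (hypothetical names/sizes): the current frame's spatial
    embedding queries a sequence of temporal embeddings via cross-attention,
    and the fused result is classified into surgical phases."""

    def __init__(self, embed_dim=512, num_heads=8, num_phases=7):
        super().__init__()
        # A decoder layer applies self-attention on the query, then
        # cross-attention over the temporal memory sequence.
        layer = nn.TransformerDecoderLayer(
            d_model=embed_dim, nhead=num_heads, batch_first=True
        )
        self.fusion = nn.TransformerDecoder(layer, num_layers=1)
        self.classifier = nn.Linear(embed_dim, num_phases)

    def forward(self, spatial_emb, temporal_seq):
        # spatial_emb:  (B, 1, D) embedding of the current frame
        # temporal_seq: (B, T, D) embeddings of the preceding T frames
        fused = self.fusion(tgt=spatial_emb, memory=temporal_seq)  # (B, 1, D)
        return self.classifier(fused.squeeze(1))                   # (B, num_phases)

# Usage: one query frame attending over a 30-frame temporal window.
model = HybridEmbeddingAggregator()
logits = model(torch.randn(2, 1, 512), torch.randn(2, 30, 512))
print(logits.shape)  # torch.Size([2, 7])
```

Because the query is a single token and the temporal window is short, this aggregation step adds little computation, which is consistent with the high inference speed the paper reports.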
Related papers
- Friends Across Time: Multi-Scale Action Segmentation Transformer for
Surgical Phase Recognition [2.10407185597278]
We propose the Multi-Scale Action Segmentation Transformer (MS-AST) for offline surgical phase recognition and the Multi-Scale Action Segmentation Causal Transformer (MS-ASCT) for online surgical phase recognition.
Our method can achieve 95.26% and 96.15% accuracy on the Cholec80 dataset for online and offline surgical phase recognition, respectively.
arXiv Detail & Related papers (2024-01-22T01:34:03Z) - Efficient Deformable Tissue Reconstruction via Orthogonal Neural Plane [58.871015937204255]
We introduce Fast Orthogonal Plane (Forplane) for the reconstruction of deformable tissues.
We conceptualize surgical procedures as 4D volumes, and break them down into static and dynamic fields comprised of neural planes.
This factorization discretizes four-dimensional space, leading to decreased memory usage and faster optimization.
arXiv Detail & Related papers (2023-12-23T13:27:50Z) - SurgPLAN: Surgical Phase Localization Network for Phase Recognition [14.857715124466594]
We propose a Surgical Phase LocAlization Network, named SurgPLAN, to facilitate a more accurate and stable surgical phase recognition.
We first devise a Pyramid SlowFast (PSF) architecture to serve as the visual backbone to capture multi-scale spatial and temporal features by two branches with different frame sampling rates.
arXiv Detail & Related papers (2023-11-16T15:39:01Z) - GLSFormer : Gated - Long, Short Sequence Transformer for Step
Recognition in Surgical Videos [57.93194315839009]
We propose a vision transformer-based approach to learn temporal features directly from sequence-level patches.
We extensively evaluate our approach on two cataract surgery video datasets, Cataract-101 and D99, and demonstrate superior performance compared to various state-of-the-art methods.
arXiv Detail & Related papers (2023-07-20T17:57:04Z) - LoViT: Long Video Transformer for Surgical Phase Recognition [59.06812739441785]
We present a two-stage method, called Long Video Transformer (LoViT), for fusing short- and long-term temporal information.
Our approach outperforms state-of-the-art methods on the Cholec80 and AutoLaparo datasets consistently.
arXiv Detail & Related papers (2023-05-15T20:06:14Z) - ARST: Auto-Regressive Surgical Transformer for Phase Recognition from
Laparoscopic Videos [2.973286445527318]
The Transformer, originally proposed for sequential data modeling in natural language processing, has been successfully applied to surgical phase recognition.
In this work, an Auto-Regressive Surgical Transformer, referred to as ARST, is first proposed for online surgical phase recognition from laparoscopic videos.
arXiv Detail & Related papers (2022-09-02T16:05:39Z) - Efficient Global-Local Memory for Real-time Instrument Segmentation of
Robotic Surgical Video [53.14186293442669]
We identify two important clues for surgical instrument perception: local temporal dependency from adjacent frames and global semantic correlation over long-range duration.
We propose a novel dual-memory network (DMNet) to relate both global and local temporal knowledge.
Our method largely outperforms the state-of-the-art works on segmentation accuracy while maintaining a real-time speed.
arXiv Detail & Related papers (2021-09-28T10:10:14Z) - Temporal Memory Relation Network for Workflow Recognition from Surgical
Video [53.20825496640025]
We propose a novel end-to-end temporal memory relation network (TMNet) for relating long-range and multi-scale temporal patterns.
We have extensively validated our approach on two benchmark surgical video datasets.
arXiv Detail & Related papers (2021-03-30T13:20:26Z)