SurgPLAN: Surgical Phase Localization Network for Phase Recognition
- URL: http://arxiv.org/abs/2311.09965v1
- Date: Thu, 16 Nov 2023 15:39:01 GMT
- Title: SurgPLAN: Surgical Phase Localization Network for Phase Recognition
- Authors: Xingjian Luo, You Pang, Zhen Chen, Jinlin Wu, Zongmin Zhang, Zhen Lei,
Hongbin Liu
- Abstract summary: We propose a Surgical Phase LocAlization Network, named SurgPLAN, to facilitate a more accurate and stable surgical phase recognition.
We first devise a Pyramid SlowFast (PSF) architecture to serve as the visual backbone to capture multi-scale spatial and temporal features by two branches with different frame sampling rates.
- Score: 14.857715124466594
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Surgical phase recognition is crucial to providing surgery understanding in
smart operating rooms. Despite great progress in automatic surgical phase
recognition, most existing methods are still restricted by two problems. First,
these methods cannot capture discriminative visual features for each frame and
motion information with simple 2D networks. Second, the frame-by-frame
recognition paradigm degrades the performance due to unstable predictions
within each phase, termed as phase shaking. To address these two challenges, we
propose a Surgical Phase LocAlization Network, named SurgPLAN, to facilitate a
more accurate and stable surgical phase recognition with the principle of
temporal detection. Specifically, we first devise a Pyramid SlowFast (PSF)
architecture to serve as the visual backbone to capture multi-scale spatial and
temporal features by two branches with different frame sampling rates.
Moreover, we propose a Temporal Phase Localization (TPL) module to generate the
phase prediction based on temporal region proposals, which ensures accurate and
consistent predictions within each surgical phase. Extensive experiments
confirm the significant advantages of our SurgPLAN over frame-by-frame
approaches in terms of both accuracy and stability.
Related papers
- Friends Across Time: Multi-Scale Action Segmentation Transformer for
Surgical Phase Recognition [2.10407185597278]
We propose the Multi-Scale Action Transformer (MS-AST) for offline surgical phase recognition and the Multi-Scale Action Causal Transformer (MS-ASCT) for online surgical phase recognition.
Our method can achieve 95.26% and 96.15% accuracy on the Cholec80 dataset for online and offline surgical phase recognition, respectively.
arXiv Detail & Related papers (2024-01-22T01:34:03Z) - GLSFormer : Gated - Long, Short Sequence Transformer for Step
Recognition in Surgical Videos [57.93194315839009]
We propose a vision transformer-based approach to learn temporal features directly from sequence-level patches.
We extensively evaluate our approach on two cataract surgery video datasets, Cataract-101 and D99, and demonstrate superior performance compared to various state-of-the-art methods.
arXiv Detail & Related papers (2023-07-20T17:57:04Z) - LoViT: Long Video Transformer for Surgical Phase Recognition [59.06812739441785]
We present a two-stage method, called Long Video Transformer (LoViT) for fusing short- and long-term temporal information.
Our approach outperforms state-of-the-art methods on the Cholec80 and AutoLaparo datasets consistently.
arXiv Detail & Related papers (2023-05-15T20:06:14Z) - Real-time landmark detection for precise endoscopic submucosal
dissection via shape-aware relation network [51.44506007844284]
We propose a shape-aware relation network for accurate and real-time landmark detection in endoscopic submucosal dissection surgery.
We first devise an algorithm to automatically generate relation keypoint heatmaps, which intuitively represent the prior knowledge of spatial relations among landmarks.
We then develop two complementary regularization schemes to progressively incorporate the prior knowledge into the training process.
arXiv Detail & Related papers (2021-11-08T07:57:30Z) - Trans-SVNet: Accurate Phase Recognition from Surgical Videos via Hybrid
Embedding Aggregation Transformer [57.18185972461453]
We introduce for the first time in surgical workflow analysis Transformer to reconsider the ignored complementary effects of spatial and temporal features for accurate phase recognition.
Our framework is lightweight and processes the hybrid embeddings in parallel to achieve a high inference speed.
arXiv Detail & Related papers (2021-03-17T15:12:55Z) - TeCNO: Surgical Phase Recognition with Multi-Stage Temporal
Convolutional Networks [43.95869213955351]
We propose a Multi-Stage Temporal Convolutional Network (MS-TCN) that performs hierarchical prediction refinement for surgical phase recognition.
Our method is thoroughly evaluated on two datasets of laparoscopic cholecystectomy videos with and without the use of additional surgical tool information.
arXiv Detail & Related papers (2020-03-24T10:12:30Z) - Detecting Pancreatic Ductal Adenocarcinoma in Multi-phase CT Scans via
Alignment Ensemble [77.5625174267105]
Pancreatic ductal adenocarcinoma (PDAC) is one of the most lethal cancers among the population.
Multiple phases provide more information than single phase, but they are unaligned and inhomogeneous in texture.
We suggest an ensemble of all these alignments as a promising way to boost the performance of PDAC detection.
arXiv Detail & Related papers (2020-03-18T19:06:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.