Surgical Temporal Action-aware Network with Sequence Regularization for
Phase Recognition
- URL: http://arxiv.org/abs/2311.12603v2
- Date: Wed, 22 Nov 2023 02:15:51 GMT
- Title: Surgical Temporal Action-aware Network with Sequence Regularization for
Phase Recognition
- Authors: Zhen Chen, Yuhao Zhai, Jun Zhang, Jinqiao Wang
- Abstract summary: We propose a Surgical Temporal Action-aware Network with sequence Regularization, named STAR-Net, to recognize surgical phases more accurately from input videos.
The MS-STA module integrates visual features with spatial and temporal knowledge of surgical actions at the computational cost of 2D networks.
Our STAR-Net with MS-STA and DSR can exploit visual features of surgical actions with effective regularization, leading to superior surgical phase recognition performance.
- Score: 28.52533700429284
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To assist surgeons in the operating theatre, surgical phase recognition is
critical for developing computer-assisted surgical systems, which requires
comprehensive understanding of surgical videos. Although existing studies made
great progress, there are still two significant limitations worthy of
improvement. First, as a compromise on resource consumption, frame-wise
visual features are extracted by 2D networks, disregarding the spatial and
temporal knowledge of surgical actions and hindering subsequent inter-frame
modeling for phase prediction. Second, these works simply apply an ordinary
classification loss with one-hot phase labels to optimize the phase
predictions, and thus cannot fully exploit surgical videos under such
inadequate supervision. To overcome these two limitations, we propose a Surgical Temporal
Action-aware Network with sequence Regularization, named STAR-Net, to recognize
surgical phases more accurately from input videos. Specifically, we propose an
efficient multi-scale surgical temporal action (MS-STA) module, which
integrates visual features with spatial and temporal knowledge of surgical
actions at the cost of 2D networks. Moreover, we devise the dual-classifier
sequence regularization (DSR) to facilitate the training of STAR-Net by the
sequence guidance of an auxiliary classifier with a smaller capacity. Our
STAR-Net with MS-STA and DSR can exploit visual features of surgical actions
with effective regularization, thereby leading to the superior performance of
surgical phase recognition. Extensive experiments on a large-scale gastrectomy
surgery dataset and the public Cholec80 benchmark prove that our STAR-Net
significantly outperforms state-of-the-art methods for surgical phase recognition.
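The abstract describes MS-STA as enriching frame-wise 2D features with multi-scale temporal knowledge while keeping roughly the cost of a 2D network. As a rough illustration only (the kernel sizes, causal averaging, and residual fusion below are assumptions for exposition, not STAR-Net's actual design), multi-scale temporal aggregation over a sequence of frame features can be sketched as:

```python
import numpy as np

def temporal_conv(feats, kernel):
    # Causal temporal average with the given window size.
    # feats: (T, C) frame-wise features from a 2D backbone.
    T, _ = feats.shape
    out = np.zeros_like(feats)
    for t in range(T):
        start = max(0, t - kernel + 1)
        out[t] = feats[start:t + 1].mean(axis=0)
    return out

def multi_scale_temporal_aggregation(feats, kernels=(3, 5, 7)):
    # Aggregate at several temporal scales and fuse with a residual
    # connection; the scales and sum-fusion are illustrative choices.
    agg = sum(temporal_conv(feats, k) for k in kernels)
    return feats + agg / len(kernels)

rng = np.random.default_rng(0)
frame_feats = rng.standard_normal((16, 8))  # 16 frames, 8-dim features
enhanced = multi_scale_temporal_aggregation(frame_feats)
print(enhanced.shape)  # (16, 8): per-frame features now carry temporal context
```

Because each scale is a simple per-channel temporal pooling over already-extracted frame features, the added cost stays small relative to the 2D backbone, which is the trade-off the abstract alludes to.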
Related papers
- Hypergraph-Transformer (HGT) for Interactive Event Prediction in
Laparoscopic and Robotic Surgery [50.3022015601057]
We propose a predictive neural network that is capable of understanding and predicting critical interactive aspects of surgical workflow from intra-abdominal video.
We verify our approach on established surgical datasets and applications, including the detection and prediction of action triplets.
Our results demonstrate the superiority of our approach compared to unstructured alternatives.
arXiv Detail & Related papers (2024-02-03T00:58:05Z) - Cataract-1K: Cataract Surgery Dataset for Scene Segmentation, Phase
Recognition, and Irregularity Detection [5.47960852753243]
We present the largest cataract surgery video dataset that addresses diverse requisites for constructing computerized surgical workflow analysis.
We validate the quality of annotations by benchmarking the performance of several state-of-the-art neural network architectures.
The dataset and annotations will be publicly available upon acceptance of the paper.
arXiv Detail & Related papers (2023-12-11T10:53:05Z) - Phase-Specific Augmented Reality Guidance for Microscopic Cataract
Surgery Using Long-Short Spatiotemporal Aggregation Transformer [14.568834378003707]
Phacoemulsification cataract surgery (PCS) is a routine procedure performed under a surgical microscope.
PCS guidance systems extract valuable information from surgical microscopic videos to enhance proficiency.
Existing PCS guidance systems suffer from non-phase-specific guidance, leading to redundant visual information.
We propose a novel phase-specific augmented reality (AR) guidance system, which offers tailored AR information corresponding to the recognized surgical phase.
arXiv Detail & Related papers (2023-09-11T02:56:56Z) - GLSFormer: Gated - Long, Short Sequence Transformer for Step
Recognition in Surgical Videos [57.93194315839009]
We propose a vision transformer-based approach to learn temporal features directly from sequence-level patches.
We extensively evaluate our approach on two cataract surgery video datasets, Cataract-101 and D99, and demonstrate superior performance compared to various state-of-the-art methods.
arXiv Detail & Related papers (2023-07-20T17:57:04Z) - LoViT: Long Video Transformer for Surgical Phase Recognition [59.06812739441785]
We present a two-stage method, called Long Video Transformer (LoViT) for fusing short- and long-term temporal information.
Our approach outperforms state-of-the-art methods on the Cholec80 and AutoLaparo datasets consistently.
arXiv Detail & Related papers (2023-05-15T20:06:14Z) - Quantification of Robotic Surgeries with Vision-Based Deep Learning [45.165919577877695]
We propose a unified deep learning framework, entitled Roboformer, which operates exclusively on videos recorded during surgery.
We validated our framework on four video-based datasets of two commonly-encountered types of steps within minimally-invasive robotic surgeries.
arXiv Detail & Related papers (2022-05-06T06:08:35Z) - CholecTriplet2021: A benchmark challenge for surgical action triplet
recognition [66.51610049869393]
This paper presents CholecTriplet 2021: an endoscopic vision challenge organized at MICCAI 2021 for the recognition of surgical action triplets in laparoscopic videos.
We present the challenge setup and assessment of the state-of-the-art deep learning methods proposed by the participants during the challenge.
A total of 4 baseline methods and 19 new deep learning algorithms are presented to recognize surgical action triplets directly from surgical videos, achieving mean average precision (mAP) ranging from 4.2% to 38.1%.
arXiv Detail & Related papers (2022-04-10T18:51:55Z) - Trans-SVNet: Accurate Phase Recognition from Surgical Videos via Hybrid
Embedding Aggregation Transformer [57.18185972461453]
We introduce the Transformer, for the first time in surgical workflow analysis, to reconsider the previously ignored complementary effects of spatial and temporal features for accurate phase recognition.
Our framework is lightweight and processes the hybrid embeddings in parallel to achieve a high inference speed.
arXiv Detail & Related papers (2021-03-17T15:12:55Z) - OperA: Attention-Regularized Transformers for Surgical Phase Recognition [46.72897518687539]
We introduce OperA, a transformer-based model that accurately predicts surgical phases from long video sequences.
OperA is thoroughly evaluated on two datasets of laparoscopic cholecystectomy videos, outperforming various state-of-the-art temporal refinement approaches.
arXiv Detail & Related papers (2021-03-05T18:59:14Z) - TeCNO: Surgical Phase Recognition with Multi-Stage Temporal
Convolutional Networks [43.95869213955351]
We propose a Multi-Stage Temporal Convolutional Network (MS-TCN) that performs hierarchical prediction refinement for surgical phase recognition.
Our method is thoroughly evaluated on two datasets of laparoscopic cholecystectomy videos with and without the use of additional surgical tool information.
arXiv Detail & Related papers (2020-03-24T10:12:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.