Phase-Specific Augmented Reality Guidance for Microscopic Cataract
Surgery Using Long-Short Spatiotemporal Aggregation Transformer
- URL: http://arxiv.org/abs/2309.05209v2
- Date: Wed, 1 Nov 2023 02:43:32 GMT
- Title: Phase-Specific Augmented Reality Guidance for Microscopic Cataract
Surgery Using Long-Short Spatiotemporal Aggregation Transformer
- Authors: Puxun Tu, Hongfei Ye, Haochen Shi, Jeff Young, Meng Xie, Peiquan Zhao,
Ce Zheng, Xiaoyi Jiang, Xiaojun Chen
- Abstract summary: Phacoemulsification cataract surgery (PCS) is a routine procedure conducted using a surgical microscope.
PCS guidance systems extract valuable information from surgical microscopic videos to enhance proficiency.
Existing PCS guidance systems suffer from non-phase-specific guidance, leading to redundant visual information.
We propose a novel phase-specific augmented reality (AR) guidance system, which offers tailored AR information corresponding to the recognized surgical phase.
- Score: 14.568834378003707
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Phacoemulsification cataract surgery (PCS) is a routine procedure conducted
using a surgical microscope, heavily reliant on the skill of the
ophthalmologist. While existing PCS guidance systems extract valuable
information from surgical microscopic videos to enhance intraoperative
proficiency, they suffer from non-phase-specific guidance, leading to redundant
visual information. In this study, our major contribution is the development of
a novel phase-specific augmented reality (AR) guidance system, which offers
tailored AR information corresponding to the recognized surgical phase.
Leveraging the inherent quasi-standardized nature of PCS procedures, we propose
a two-stage surgical microscopic video recognition network. In the first stage,
we implement a multi-task learning structure to segment the surgical limbus
region and extract limbus region-focused spatial features for each frame. In the
second stage, we propose the long-short spatiotemporal aggregation transformer
(LS-SAT) network to model local fine-grained and global temporal relationships,
and combine the extracted spatial features to recognize the current surgical
phase. Additionally, we collaborate closely with ophthalmologists to design AR
visual cues by utilizing techniques such as limbus ellipse fitting and regional
restricted normal cross-correlation rotation computation. We evaluated the
network on publicly available and in-house datasets, with comparison results
demonstrating its superior performance compared to related works. Ablation
results further validated the effectiveness of the limbus region-focused
spatial feature extractor and the combination of temporal features.
Furthermore, the developed system was evaluated in a clinical setup, with
results indicating remarkable accuracy and real-time performance, underscoring
its potential for clinical applications.
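To make the two-stage design concrete, here is a minimal sketch (not the authors' released code) of long-short temporal aggregation over per-frame spatial features: a short-range encoder attends over a recent window for local fine-grained cues, a long-range encoder attends over the whole buffered sequence for global phase context, and the two are fused to classify the current phase. Module sizes, the window length, and the fusion scheme are illustrative assumptions.

```python
import torch
import torch.nn as nn

class LongShortAggregator(nn.Module):
    """Sketch of long-short temporal aggregation for surgical phase recognition.

    Window size, depths, and fusion are illustrative assumptions, not the
    paper's exact LS-SAT design.
    """

    def __init__(self, dim=256, heads=4, short_window=16, num_phases=10):
        super().__init__()
        def encoder():
            layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                               batch_first=True)
            return nn.TransformerEncoder(layer, num_layers=2)
        self.short_enc = encoder()   # local, fine-grained relationships
        self.long_enc = encoder()    # global temporal relationships
        self.short_window = short_window
        self.fuse = nn.Linear(2 * dim, dim)
        self.head = nn.Linear(dim, num_phases)

    def forward(self, feats):
        # feats: (B, T, dim) per-frame spatial features from the stage-1
        # limbus-region-focused extractor
        local = self.short_enc(feats[:, -self.short_window:])  # recent window
        global_ = self.long_enc(feats)                         # whole sequence
        fused = self.fuse(torch.cat([local[:, -1], global_[:, -1]], dim=-1))
        return self.head(fused)  # phase logits for the current frame

# usage: predict the current phase from a buffer of 64 frame features
model = LongShortAggregator()
logits = model(torch.randn(1, 64, 256))
```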
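For the AR cue generation, limbus ellipse fitting is a standard geometric step; a minimal OpenCV sketch is below, assuming a binary limbus mask produced by the stage-1 segmentation. The function names and overlay style are illustrative, not the paper's implementation.

```python
import cv2
import numpy as np

def fit_limbus_ellipse(mask: np.ndarray):
    """Fit an ellipse to the largest contour of a binary limbus mask.

    mask: uint8 array (H, W) with the segmented limbus region set to 255.
    Returns ((cx, cy), (major, minor), angle) as produced by cv2.fitEllipse,
    or None if no usable contour is found.
    """
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_NONE)
    contours = [c for c in contours if len(c) >= 5]  # fitEllipse needs >= 5 pts
    if not contours:
        return None
    largest = max(contours, key=cv2.contourArea)
    return cv2.fitEllipse(largest)

def draw_ar_cue(frame: np.ndarray, ellipse) -> np.ndarray:
    """Overlay the fitted ellipse on the microscope frame as a simple AR cue."""
    out = frame.copy()
    cv2.ellipse(out, ellipse, color=(0, 255, 0), thickness=2)
    return out
```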
Related papers
- RGB to Hyperspectral: Spectral Reconstruction for Enhanced Surgical Imaging [7.2993064695496255]
This study investigates the reconstruction of hyperspectral signatures from RGB data to enhance surgical imaging.
Various architectures based on convolutional neural networks (CNNs) and transformer models are evaluated.
Transformer models exhibit superior performance in terms of RMSE, SAM, PSNR, and SSIM.
arXiv Detail & Related papers (2024-10-17T14:05:41Z)
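As context for the metrics quoted above, the spectral angle mapper (SAM) is a standard spectral-reconstruction error; a minimal NumPy definition (generic, not the paper's code) is:

```python
import numpy as np

def spectral_angle_mapper(pred: np.ndarray, ref: np.ndarray) -> float:
    """Mean spectral angle (radians) between predicted and reference spectra.

    pred, ref: arrays of shape (..., bands); the angle is computed per pixel
    along the last axis and averaged.
    """
    dot = np.sum(pred * ref, axis=-1)
    norm = np.linalg.norm(pred, axis=-1) * np.linalg.norm(ref, axis=-1)
    cos = np.clip(dot / np.maximum(norm, 1e-12), -1.0, 1.0)
    return float(np.mean(np.arccos(cos)))
```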
- EndoGSLAM: Real-Time Dense Reconstruction and Tracking in Endoscopic Surgeries using Gaussian Splatting [53.38166294158047]
EndoGSLAM is an efficient approach for endoscopic surgeries, which integrates streamlined representation and differentiable Gaussianization.
Experiments show that EndoGSLAM achieves a better trade-off between intraoperative availability and reconstruction quality than traditional or neural SLAM approaches.
arXiv Detail & Related papers (2024-03-22T11:27:43Z)
- CathFlow: Self-Supervised Segmentation of Catheters in Interventional Ultrasound Using Optical Flow and Transformers [66.15847237150909]
We introduce a self-supervised deep learning architecture to segment catheters in longitudinal ultrasound images.
The network architecture builds upon AiAReSeg, a segmentation transformer built with the Attention in Attention mechanism.
We validated our model on a test dataset consisting of unseen synthetic data and images collected from silicone aorta phantoms.
arXiv Detail & Related papers (2024-03-21T15:13:36Z)
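The summary does not detail how optical flow supplies self-supervision; one hedged reading is that frame-to-frame motion yields pseudo-labels for the moving catheter. The sketch below illustrates that idea with Farneback flow; the flow method and threshold are assumptions, not CathFlow's actual pipeline.

```python
import cv2
import numpy as np

def flow_pseudo_label(prev_gray: np.ndarray, curr_gray: np.ndarray,
                      thresh: float = 1.5) -> np.ndarray:
    """Hypothetical pseudo-label from dense optical flow: pixels that move
    between consecutive ultrasound frames (e.g., a catheter) are marked 1."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag = np.linalg.norm(flow, axis=-1)          # per-pixel motion magnitude
    return (mag > thresh).astype(np.uint8)       # binary pseudo-mask
```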
- Monocular Microscope to CT Registration using Pose Estimation of the Incus for Augmented Reality Cochlear Implant Surgery [3.8909273404657556]
We develop a method that permits direct 2D-to-3D registration of the microscope's video view to the pre-operative Computed Tomography (CT) scan without the need for external tracking equipment.
Our results demonstrate accuracy, with an average rotation error of less than 25 degrees and translation errors of less than 2 mm, 3 mm, and 0.55% for the x, y, and z axes, respectively.
arXiv Detail & Related papers (2024-03-12T00:26:08Z)
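For context on the error figures above, rotation error between an estimated and ground-truth pose is conventionally reported as the geodesic angle between rotation matrices; a minimal definition (our illustration, not the paper's code):

```python
import numpy as np

def rotation_error_deg(R_est: np.ndarray, R_gt: np.ndarray) -> float:
    """Geodesic angle (degrees) between two 3x3 rotation matrices."""
    cos = (np.trace(R_est.T @ R_gt) - 1.0) / 2.0
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))
```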
- Surgical Temporal Action-aware Network with Sequence Regularization for Phase Recognition [28.52533700429284]
We propose a Surgical Temporal Action-aware Network with sequence Regularization, named STAR-Net, to recognize surgical phases more accurately from input videos.
The MS-STA module integrates visual features with spatial and temporal knowledge of surgical actions at the computational cost of 2D networks.
Our STAR-Net with MS-STA and DSR can exploit visual features of surgical actions with effective regularization, thereby leading to the superior performance of surgical phase recognition.
arXiv Detail & Related papers (2023-11-21T13:43:16Z)
- GLSFormer: Gated-Long, Short Sequence Transformer for Step Recognition in Surgical Videos [57.93194315839009]
We propose a vision transformer-based approach to learn temporal features directly from sequence-level patches.
We extensively evaluate our approach on two cataract surgery video datasets, Cataract-101 and D99, and demonstrate superior performance compared to various state-of-the-art methods.
arXiv Detail & Related papers (2023-07-20T17:57:04Z)
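One common way to obtain sequence-level patches is tubelet embedding, where a 3D convolution tokenizes a short clip across space and time; the sketch below illustrates that idea (sizes are assumptions, and GLSFormer's exact tokenization may differ).

```python
import torch
import torch.nn as nn

class TubeletEmbed(nn.Module):
    """Spatiotemporal patch embedding: a 3D conv turns a clip into tokens
    spanning both space and time."""

    def __init__(self, dim=192, t=2, p=16):
        super().__init__()
        self.proj = nn.Conv3d(3, dim, kernel_size=(t, p, p), stride=(t, p, p))

    def forward(self, clip):           # clip: (B, 3, T, H, W)
        tokens = self.proj(clip)       # (B, dim, T/t, H/p, W/p)
        return tokens.flatten(2).transpose(1, 2)  # (B, N, dim)

# usage: an 8-frame 224x224 clip becomes 4 * 14 * 14 = 784 tokens
tokens = TubeletEmbed()(torch.randn(1, 3, 8, 224, 224))
```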
- LoViT: Long Video Transformer for Surgical Phase Recognition [59.06812739441785]
We present a two-stage method, called Long Video Transformer (LoViT), for fusing short- and long-term temporal information.
Our approach consistently outperforms state-of-the-art methods on the Cholec80 and AutoLaparo datasets.
arXiv Detail & Related papers (2023-05-15T20:06:14Z)
- Real-time landmark detection for precise endoscopic submucosal dissection via shape-aware relation network [51.44506007844284]
We propose a shape-aware relation network for accurate and real-time landmark detection in endoscopic submucosal dissection surgery.
We first devise an algorithm to automatically generate relation keypoint heatmaps, which intuitively represent the prior knowledge of spatial relations among landmarks.
We then develop two complementary regularization schemes to progressively incorporate the prior knowledge into the training process.
arXiv Detail & Related papers (2021-11-08T07:57:30Z)
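A relation keypoint heatmap can be rendered like an ordinary Gaussian keypoint heatmap; a minimal illustration is below (restricting it to point locations is our simplification of the paper's relation-rendering scheme).

```python
import numpy as np

def keypoint_heatmap(h: int, w: int, pts, sigma: float = 4.0) -> np.ndarray:
    """Gaussian heatmap with a peak at each (x, y) landmark location."""
    ys, xs = np.mgrid[0:h, 0:w]
    hm = np.zeros((h, w), dtype=np.float32)
    for (x, y) in pts:
        g = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2))
        hm = np.maximum(hm, g)  # keep the strongest response per pixel
    return hm
```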
- Trans-SVNet: Accurate Phase Recognition from Surgical Videos via Hybrid Embedding Aggregation Transformer [57.18185972461453]
We introduce, for the first time in surgical workflow analysis, a Transformer to reconsider the previously ignored complementary effects of spatial and temporal features for accurate phase recognition.
Our framework is lightweight and processes the hybrid embeddings in parallel to achieve a high inference speed.
arXiv Detail & Related papers (2021-03-17T15:12:55Z)
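A hedged illustration of hybrid embedding aggregation: let a current-frame spatial embedding attend to a sequence of temporal embeddings via cross-attention. The dimensions and the query/key role assignment are assumptions, not Trans-SVNet's exact design.

```python
import torch
import torch.nn as nn

# spatial embedding queries the temporal embeddings (hybrid fusion)
attn = nn.MultiheadAttention(embed_dim=256, num_heads=4, batch_first=True)
spatial = torch.randn(1, 1, 256)    # current-frame spatial embedding
temporal = torch.randn(1, 30, 256)  # buffered temporal embeddings
fused, _ = attn(spatial, temporal, temporal)  # (1, 1, 256) fused embedding
```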
- TeCNO: Surgical Phase Recognition with Multi-Stage Temporal Convolutional Networks [43.95869213955351]
We propose a Multi-Stage Temporal Convolutional Network (MS-TCN) that performs hierarchical prediction refinement for surgical phase recognition.
Our method is thoroughly evaluated on two datasets of laparoscopic cholecystectomy videos, with and without the use of additional surgical tool information.
arXiv Detail & Related papers (2020-03-24T10:12:30Z)
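A minimal sketch of one MS-TCN-style stage and the hierarchical refinement idea, assuming dilated residual 1D convolutions over per-frame features (layer counts and widths are illustrative):

```python
import torch
import torch.nn as nn

class DilatedStage(nn.Module):
    """One MS-TCN-style stage: stacked dilated 1D convs with residual
    connections, doubling the dilation per layer to grow the temporal
    receptive field."""

    def __init__(self, in_ch, ch=64, n_classes=7, layers=8):
        super().__init__()
        self.inp = nn.Conv1d(in_ch, ch, 1)
        self.blocks = nn.ModuleList(
            [nn.Conv1d(ch, ch, 3, padding=2 ** i, dilation=2 ** i)
             for i in range(layers)]
        )
        self.out = nn.Conv1d(ch, n_classes, 1)

    def forward(self, x):              # x: (B, in_ch, T)
        h = self.inp(x)
        for conv in self.blocks:
            h = h + torch.relu(conv(h))
        return self.out(h)             # per-frame phase logits (B, n_classes, T)

# hierarchical refinement: stage 2 refines stage 1's phase probabilities
stage1 = DilatedStage(in_ch=2048)
stage2 = DilatedStage(in_ch=7)
logits = stage2(torch.softmax(stage1(torch.randn(1, 2048, 100)), dim=1))
```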
- Towards Augmented Reality-based Suturing in Monocular Laparoscopic Training [0.5707453684578819]
The paper proposes an Augmented Reality environment with quantitative and qualitative visual representations to enhance the outcomes of laparoscopic training performed on a silicone pad.
This is enabled by a multi-task supervised deep neural network which performs multi-class segmentation and depth map prediction.
The network achieves a dice score of 0.67 for surgical needle segmentation, 0.81 for needle holder instrument segmentation and a mean absolute error of 6.5 mm for depth estimation.
arXiv Detail & Related papers (2020-01-19T19:59:58Z)
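For reference, the dice score quoted for the segmentation results above is the standard overlap metric between predicted and ground-truth masks; a minimal NumPy definition:

```python
import numpy as np

def dice(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-7) -> float:
    """Dice coefficient between two binary masks (1 = perfect overlap)."""
    inter = np.logical_and(pred, gt).sum()
    return float(2.0 * inter / (pred.sum() + gt.sum() + eps))
```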