Related papers: Sim2Real within 5 Minutes: Efficient Domain Transfer with Stylized Gaussian Splatting for Endoscopic Images

Sim2Real within 5 Minutes: Efficient Domain Transfer with Stylized Gaussian Splatting for Endoscopic Images

URL: http://arxiv.org/abs/2403.10860v2
Date: Wed, 05 Mar 2025 12:41:05 GMT
Title: Sim2Real within 5 Minutes: Efficient Domain Transfer with Stylized Gaussian Splatting for Endoscopic Images
Authors: Junyang Wu, Yun Gu, Guang-Zhong Yang,
Abstract summary: endoluminal intervention is an emerging technique for both benign and malignant luminal lesions.<n>In practice, aligning pre-operative and intra-operative domains is complicated by significant texture differences.<n>This paper proposes an efficient domain transfer method based on stylized Gaussian splatting.
Score: 28.802915155343964
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Robot assisted endoluminal intervention is an emerging technique for both benign and malignant luminal lesions. With vision-based navigation, when combined with pre-operative imaging data as priors, it is possible to recover position and pose of the endoscope without the need of additional sensors. In practice, however, aligning pre-operative and intra-operative domains is complicated by significant texture differences. Although methods such as style transfer can be used to address this issue, they require large datasets from both source and target domains with prolonged training times. This paper proposes an efficient domain transfer method based on stylized Gaussian splatting, only requiring a few of real images (10 images) with very fast training time. Specifically, the transfer process includes two phases. In the first phase, the 3D models reconstructed from CT scans are represented as differential Gaussian point clouds. In the second phase, only color appearance related parameters are optimized to transfer the style and preserve the visual content. A novel structure consistency loss is applied to latent features and depth levels to enhance the stability of the transferred images. Detailed validation was performed to demonstrate the performance advantages of the proposed method compared to that of the current state-of-the-art, highlighting the potential for intra-operative surgical navigation.

Related papers

Landmark-Free Preoperative-to-Intraoperative Registration in Laparoscopic Liver Resection [50.388465935739376]
Liver registration by overlaying preoperative 3D models onto intraoperative 2D frames can assist surgeons in perceiving the spatial anatomy of the liver clearly for a higher surgical success rate. Existing registration methods rely heavily on anatomical landmark-based, which encounter two major limitations. We propose a landmark-free preoperative-to-intraoperative registration framework utilizing effective self-supervised learning.
arXiv Detail & Related papers (2025-04-21T14:55:57Z)
Enhanced Feature-based Image Stitching for Endoscopic Videos in Pediatric Eosinophilic Esophagitis [7.634233891270609]
We propose a novel preprocessing pipeline designed to enhance endoscopic image stitching. Our approach converts endoscopic video data into continuous 2D images by following four key steps. Experiments conducted on 20 pediatric endoscopy videos demonstrate that our method significantly improves image alignment and stitching quality.
arXiv Detail & Related papers (2025-02-06T16:47:28Z)
Data-Centric Learning Framework for Real-Time Detection of Aiming Beam in Fluorescence Lifetime Imaging Guided Surgery [3.8261910636994925]
This study introduces a novel data-centric approach to improve real-time surgical guidance using fiber-based fluorescence lifetime imaging (FLIm) The primary challenge arises from the complex and variable conditions encountered in the surgical environment, particularly in Transoral Robotic Surgery (TORS) An instance segmentation model was developed using a data-centric training strategy that improves accuracy by minimizing label noise and enhancing detection robustness.
arXiv Detail & Related papers (2024-11-11T22:04:32Z)
From Real Artifacts to Virtual Reference: A Robust Framework for Translating Endoscopic Images [27.230439605570812]
In endoscopic imaging, combining pre-operative data with intra-operative imaging is important for surgical planning and navigation. Existing domain adaptation methods are hampered by distribution shift caused by in vivo artifacts. This paper presents an artifact-resilient image translation method and an associated benchmark for this purpose.
arXiv Detail & Related papers (2024-10-15T02:41:52Z)
SurgPETL: Parameter-Efficient Image-to-Surgical-Video Transfer Learning for Surgical Phase Recognition [9.675072799670458]
"Image pre-training followed by video fine-tuning" for high-dimensional video data poses significant performance bottlenecks. In this paper, we develop a parameter-efficient transfer learning benchmark SurgPETL for surgical phase recognition. We conduct extensive experiments with three advanced methods based on ViTs of two distinct scales pre-trained on five large-scale natural and medical datasets.
arXiv Detail & Related papers (2024-09-30T08:33:50Z)
Intraoperative Registration by Cross-Modal Inverse Neural Rendering [61.687068931599846]
We present a novel approach for 3D/2D intraoperative registration during neurosurgery via cross-modal inverse neural rendering. Our approach separates implicit neural representation into two components, handling anatomical structure preoperatively and appearance intraoperatively. We tested our method on retrospective patients' data from clinical cases, showing that our method outperforms state-of-the-art while meeting current clinical standards for registration.
arXiv Detail & Related papers (2024-09-18T13:40:59Z)
CathFlow: Self-Supervised Segmentation of Catheters in Interventional Ultrasound Using Optical Flow and Transformers [66.15847237150909]
We introduce a self-supervised deep learning architecture to segment catheters in longitudinal ultrasound images. The network architecture builds upon AiAReSeg, a segmentation transformer built with the Attention in Attention mechanism. We validated our model on a test dataset, consisting of unseen synthetic data and images collected from silicon aorta phantoms.
arXiv Detail & Related papers (2024-03-21T15:13:36Z)
Transferring Relative Monocular Depth to Surgical Vision with Temporal Consistency [3.585363618435449]
Relative monocular depth, inferring depth up to shift and scale from a single image, is an active research topic. Recent deep learning models, trained on large and varied meta-datasets, now provide excellent performance in the domain of natural images. Few datasets exist which provide ground truth depth for endoscopic images, making training such models from scratch unfeasible.
arXiv Detail & Related papers (2024-03-11T12:57:51Z)
Efficient Deformable Tissue Reconstruction via Orthogonal Neural Plane [58.871015937204255]
We introduce Fast Orthogonal Plane (plane) for the reconstruction of deformable tissues. We conceptualize surgical procedures as 4D volumes, and break them down into static and dynamic fields comprised of neural planes. This factorization iscretizes four-dimensional space, leading to a decreased memory usage and faster optimization.
arXiv Detail & Related papers (2023-12-23T13:27:50Z)
Synthetic optical coherence tomography angiographs for detailed retinal vessel segmentation without human annotations [12.571349114534597]
We present a lightweight simulation of the retinal vascular network based on space colonization for faster and more realistic OCTA synthesis. We demonstrate the superior segmentation performance of our approach in extensive quantitative and qualitative experiments on three public datasets.
arXiv Detail & Related papers (2023-06-19T14:01:47Z)
Unsupervised Domain Transfer with Conditional Invertible Neural Networks [83.90291882730925]
We propose a domain transfer approach based on conditional invertible neural networks (cINNs) Our method inherently guarantees cycle consistency through its invertible architecture, and network training can efficiently be conducted with maximum likelihood. Our method enables the generation of realistic spectral data and outperforms the state of the art on two downstream classification tasks.
arXiv Detail & Related papers (2023-03-17T18:00:27Z)
Self-Supervised-RCNN for Medical Image Segmentation with Limited Data Annotation [0.16490701092527607]
We propose an alternative deep learning training strategy based on self-supervised pretraining on unlabeled MRI scans. Our pretraining approach first, randomly applies different distortions to random areas of unlabeled images and then predicts the type of distortions and loss of information. The effectiveness of the proposed method for segmentation tasks in different pre-training and fine-tuning scenarios is evaluated.
arXiv Detail & Related papers (2022-07-17T13:28:52Z)
DLTTA: Dynamic Learning Rate for Test-time Adaptation on Cross-domain Medical Images [56.72015587067494]
We propose a novel dynamic learning rate adjustment method for test-time adaptation, called DLTTA. Our method achieves effective and fast test-time adaptation with consistent performance improvement over current state-of-the-art test-time adaptation methods.
arXiv Detail & Related papers (2022-05-27T02:34:32Z)
A Temporal Learning Approach to Inpainting Endoscopic Specularities and Its effect on Image Correspondence [13.25903945009516]
We propose using a temporal generative adversarial network (GAN) to inpaint the hidden anatomy under specularities. This is achieved using in-vivo data of gastric endoscopy (Hyper-Kvasir) in a fully unsupervised manner. We also assess the effect of our method in computer vision tasks that underpin 3D reconstruction and camera motion estimation.
arXiv Detail & Related papers (2022-03-31T13:14:00Z)
Real-time landmark detection for precise endoscopic submucosal dissection via shape-aware relation network [51.44506007844284]
We propose a shape-aware relation network for accurate and real-time landmark detection in endoscopic submucosal dissection surgery. We first devise an algorithm to automatically generate relation keypoint heatmaps, which intuitively represent the prior knowledge of spatial relations among landmarks. We then develop two complementary regularization schemes to progressively incorporate the prior knowledge into the training process.
arXiv Detail & Related papers (2021-11-08T07:57:30Z)
Adversarial Domain Feature Adaptation for Bronchoscopic Depth Estimation [111.89519571205778]
In this work, we propose an alternative domain-adaptive approach to depth estimation. Our novel two-step structure first trains a depth estimation network with labeled synthetic images in a supervised manner. The results of our experiments show that the proposed method improves the network's performance on real images by a considerable margin.
arXiv Detail & Related papers (2021-09-24T08:11:34Z)
Self-Supervised Learning from Unlabeled Fundus Photographs Improves Segmentation of the Retina [4.815051667870375]
Fundus photography is the primary method for retinal imaging and essential for diabetic retinopathy prevention. Current segmentation methods are not robust towards the diversity in imaging conditions and pathologies typical for real-world clinical applications. We utilize contrastive self-supervised learning to exploit the large variety of unlabeled fundus images in the publicly available EyePACS dataset.
arXiv Detail & Related papers (2021-08-05T18:02:56Z)
Trans-SVNet: Accurate Phase Recognition from Surgical Videos via Hybrid Embedding Aggregation Transformer [57.18185972461453]
We introduce for the first time in surgical workflow analysis Transformer to reconsider the ignored complementary effects of spatial and temporal features for accurate phase recognition. Our framework is lightweight and processes the hybrid embeddings in parallel to achieve a high inference speed.
arXiv Detail & Related papers (2021-03-17T15:12:55Z)
Searching for Efficient Architecture for Instrument Segmentation in Robotic Surgery [58.63306322525082]
Most applications rely on accurate real-time segmentation of high-resolution surgical images. We design a light-weight and highly-efficient deep residual architecture which is tuned to perform real-time inference of high-resolution images.
arXiv Detail & Related papers (2020-07-08T21:38:29Z)

This list is automatically generated from the titles and abstracts of the papers in this site.