Related papers: BronchOpt : Vision-Based Pose Optimization with Fine-Tuned Foundation Models for Accurate Bronchoscopy Navigation

URL: http://arxiv.org/abs/2511.09443v1
Date: Thu, 13 Nov 2025 01:54:53 GMT
Title: BronchOpt : Vision-Based Pose Optimization with Fine-Tuned Foundation Models for Accurate Bronchoscopy Navigation
Authors: Hongchao Shu, Roger D. Soberanis-Mukul, Jiru Xu, Hao Ding, Morgan Ringel, Mali Shen, Saif Iftekar Sayed, Hedyeh Rafii-Tari, Mathias Unberath
Abstract summary: We propose a vision-based pose optimization framework for 2D-3D registration between intra-operative endoscopic views and pre-operative CT anatomy. A fine-tuned modality- and domain-invariant encoder enables direct similarity between real endoscopic RGB frames and CT-rendered depth maps. Our model achieves an average translational error of 2.65 mm and a rotational error of 0.19 rad, demonstrating accurate and stable localization.
Score: 6.915058920280426
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Accurate intra-operative localization of the bronchoscope tip relative to patient anatomy remains challenging due to respiratory motion, anatomical variability, and CT-to-body divergence that cause deformation and misalignment between intra-operative views and pre-operative CT. Existing vision-based methods often fail to generalize across domains and patients, leading to residual alignment errors. This work establishes a generalizable foundation for bronchoscopy navigation through a robust vision-based framework and a new synthetic benchmark dataset that enables standardized and reproducible evaluation. We propose a vision-based pose optimization framework for frame-wise 2D-3D registration between intra-operative endoscopic views and pre-operative CT anatomy. A fine-tuned modality- and domain-invariant encoder enables direct similarity computation between real endoscopic RGB frames and CT-rendered depth maps, while a differentiable rendering module iteratively refines camera poses through depth consistency. To enhance reproducibility, we introduce the first public synthetic benchmark dataset for bronchoscopy navigation, addressing the lack of paired CT-endoscopy data. Trained exclusively on synthetic data distinct from the benchmark, our model achieves an average translational error of 2.65 mm and a rotational error of 0.19 rad, demonstrating accurate and stable localization. Qualitative results on real patient data further confirm strong cross-domain generalization, achieving consistent frame-wise 2D-3D alignment without domain-specific adaptation. Overall, the proposed framework achieves robust, domain-invariant localization through iterative vision-based optimization, while the new benchmark provides a foundation for standardized progress in vision-based bronchoscopy navigation.
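The iterative refinement described in the abstract (render depth from the current pose estimate, compare it with the observed view, update the pose) can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the scene is a hypothetical analytic height field rather than CT-derived anatomy, the camera is orthographic, only translation is optimized, the learned encoder is omitted, and a hand-derived gradient with plain gradient descent stands in for a differentiable rendering module. The names `surface_height`, `render_depth`, `depth_loss`, `depth_loss_gradient`, and `refine_pose` are all hypothetical.

```python
import math


def surface_height(x, y):
    """Hypothetical smooth surface standing in for CT-derived anatomy."""
    return 0.5 * math.sin(x) + 0.3 * math.cos(1.3 * y)


def render_depth(pose, pixels):
    """Orthographic depth of the surface seen from pose = (tx, ty, tz)."""
    tx, ty, tz = pose
    return [tz - surface_height(u + tx, v + ty) for u, v in pixels]


def depth_loss(pose, observed, pixels):
    """Mean squared difference between rendered and observed depth."""
    rendered = render_depth(pose, pixels)
    return sum((r - o) ** 2 for r, o in zip(rendered, observed)) / len(pixels)


def depth_loss_gradient(pose, observed, pixels):
    """Analytic gradient of depth_loss with respect to (tx, ty, tz)."""
    tx, ty, tz = pose
    gx = gy = gz = 0.0
    for (u, v), o in zip(pixels, observed):
        x, y = u + tx, v + ty
        residual = tz - surface_height(x, y) - o
        # d(depth)/d(tx) = -d(height)/dx = -0.5 * cos(x)
        gx += 2.0 * residual * (-0.5 * math.cos(x))
        # d(depth)/d(ty) = -d(height)/dy = 0.3 * 1.3 * sin(1.3 * y)
        gy += 2.0 * residual * (0.39 * math.sin(1.3 * y))
        # d(depth)/d(tz) = 1
        gz += 2.0 * residual
    n = len(pixels)
    return gx / n, gy / n, gz / n


def refine_pose(initial_pose, observed, pixels, step=0.4, iterations=500):
    """Gradient descent on the depth-consistency loss (assumed settings)."""
    pose = tuple(initial_pose)
    for _ in range(iterations):
        grad = depth_loss_gradient(pose, observed, pixels)
        pose = tuple(p - step * g for p, g in zip(pose, grad))
    return pose


# 9 x 9 pixel grid covering [-2, 2] in both image directions.
pixels = [(-2.0 + 0.5 * i, -2.0 + 0.5 * j) for i in range(9) for j in range(9)]
true_pose = (0.3, -0.2, 5.0)                # pose that produced the observation
observed = render_depth(true_pose, pixels)  # stands in for the observed depth
initial_pose = (0.0, 0.0, 4.5)              # perturbed starting estimate
refined = refine_pose(initial_pose, observed, pixels)

print("initial loss:", depth_loss(initial_pose, observed, pixels))
print("refined loss:", depth_loss(refined, observed, pixels))
print("refined pose:", refined)
```

In this toy setting the observation is itself rendered from the true pose, so the loss can reach zero. With real endoscopic frames the comparison would need a modality-invariant feature space, which is the role the abstract assigns to the fine-tuned encoder.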

Related papers

Preoperative-to-intraoperative Liver Registration for Laparoscopic Surgery via Latent-Grounded Correspondence Constraints [51.7011449975586]
Land-Reg is a deformable registration framework that learns latent-grounded 2D-3D landmark correspondences. For rigid registration, Land-Reg embraces a Cross-modal Latent Alignment module. An Uncertainty-enhanced Overlap Landmark Detector with similarity matching is proposed to robustly estimate explicit 2D-3D landmark correspondences.
arXiv Detail & Related papers (2026-03-02T10:44:03Z)
Advanced Geometric Correction Algorithms for 3D Medical Reconstruction: Comparison of Computed Tomography and Macroscopic Imaging [0.9395222766576343]
This paper introduces a hybrid two-stage registration framework for reconstructing 3D kidney anatomy from macroscopic slices. It addresses the data-scarcity and high-distortion challenges typical of macroscopic imaging. The proposed framework generalizes to other soft-tissue organs reconstructed from optical or photographic cross-sections.
arXiv Detail & Related papers (2026-01-30T17:16:17Z)
Self-Supervised Contrastive Embedding Adaptation for Endoscopic Image Matching [7.674595072442547]
This research presents a novel Deep Learning pipeline for establishing feature correspondences in endoscopic image pairs. The proposed methodology leverages a novel-view synthesis pipeline to generate ground-truth inlier correspondences. Our pipeline surpasses state-of-the-art methodologies on the SCARED datasets, with improved matching precision and lower epipolar error.
arXiv Detail & Related papers (2025-12-11T07:44:00Z)
Enhanced Landmark Detection Model in Pelvic Fluoroscopy using 2D/3D Registration Loss [1.6420503030062876]
We propose a novel framework that incorporates 2D/3D landmark registration into the training of a U-Net landmark prediction model. We analyze the performance difference by comparing landmark detection accuracy between the baseline U-Net, U-Net trained with Pose Estimation Loss, and U-Net fine-tuned with Pose Estimation Loss under realistic intra-operative conditions.
arXiv Detail & Related papers (2025-11-26T16:50:06Z)
EqDiff-CT: Equivariant Conditional Diffusion model for CT Image Synthesis from CBCT [43.92108185590778]
Cone-beam computed tomography (CBCT) is widely used for image-guided radiotherapy (IGRT). We propose a novel diffusion-based conditional generative model, coined EqDiff-CT, to synthesize high-quality CT images from CBCT.
arXiv Detail & Related papers (2025-09-26T05:51:59Z)
Accelerating 3D Photoacoustic Computed Tomography with End-to-End Physics-Aware Neural Operators [74.65171736966131]
Photoacoustic computed tomography (PACT) combines optical contrast with ultrasonic resolution, achieving deep-tissue imaging beyond the optical diffusion limit. Current implementations require dense transducer arrays and prolonged acquisition times, limiting clinical translation. We introduce Pano, an end-to-end physics-aware model that directly learns the inverse acoustic mapping from sensor measurements to volumetric reconstructions.
arXiv Detail & Related papers (2025-09-11T23:12:55Z)
Unifying Scale-Aware Depth Prediction and Perceptual Priors for Monocular Endoscope Pose Estimation and Tissue Reconstruction [3.251946340142663]
A unified framework for monocular endoscopic tissue reconstruction is presented. It integrates scale-aware depth prediction with temporally-constrained perceptual refinement. Evaluations on HEVD and SCARED, with ablation and comparative analyses, demonstrate the framework's robustness and superiority over state-of-the-art methods.
arXiv Detail & Related papers (2025-08-15T07:41:17Z)
Robust and Accurate Multi-view 2D/3D Image Registration with Differentiable X-ray Rendering and Dual Cross-view Constraints [45.57808049168089]
We propose a novel multi-view 2D/3D rigid registration approach comprising two stages. In the first stage, a combined loss function is designed, incorporating both the differences between predicted and ground-truth poses. In the second stage, test-time optimization is performed to refine the estimated poses from the coarse stage.
arXiv Detail & Related papers (2025-06-27T12:57:58Z)
MR2US-Pro: Prostate MR to Ultrasound Image Translation and Registration Based on Diffusion Models [7.512221808783586]
We present a novel framework that addresses the challenges through a two-stage process: TRUS 3D reconstruction followed by cross-modal registration. We propose a totally probe-location-independent approach that leverages the natural correlation between sagittal and transverse TRUS views. For the registration stage, we introduce an unsupervised diffusion-based framework guided by modality translation.
arXiv Detail & Related papers (2025-05-31T14:55:03Z)
Harnessing Foundation Models for Robust and Generalizable 6-DOF Bronchoscopy Localization [2.795503750654676]
Vision-based 6-DOF bronchoscopy localization offers a promising solution for accurate and cost-effective interventional guidance. Existing methods struggle with 1) limited generalization across patient cases due to scarce labeled data, and 2) poor robustness under visual degradation. We propose PANSv2, a generalizable and robust bronchoscopy localization framework.
arXiv Detail & Related papers (2025-05-30T06:14:12Z)
DPER: Diffusion Prior Driven Neural Representation for Limited Angle and Sparse View CT Reconstruction [45.00528216648563]
Diffusion Prior Driven Neural Representation (DPER) is an unsupervised framework designed to address the exceptionally ill-posed CT reconstruction inverse problems. DPER adopts the Half Quadratic Splitting (HQS) algorithm to decompose the inverse problem into data fidelity and distribution prior sub-problems. We conduct comprehensive experiments to evaluate the performance of DPER on LACT and ultra-SVCT reconstruction with two public datasets.
arXiv Detail & Related papers (2024-04-27T12:55:13Z)
Fully Differentiable Correlation-driven 2D/3D Registration for X-ray to CT Image Fusion [3.868072865207522]
Image-based rigid 2D/3D registration is a critical technique for fluoroscopic guided surgical interventions. We propose a novel fully differentiable correlation-driven network using a dual-branch CNN-transformer encoder. A correlation-driven loss is proposed for low-frequency feature and high-frequency feature decomposition based on embedded information.
arXiv Detail & Related papers (2024-02-04T14:12:51Z)
Revisiting 3D Context Modeling with Supervised Pre-training for Universal Lesion Detection in CT Slices [48.85784310158493]
We propose a Modified Pseudo-3D Feature Pyramid Network (MP3D FPN) to efficiently extract 3D context enhanced 2D features for universal lesion detection in CT slices. With the novel pre-training method, the proposed MP3D FPN achieves state-of-the-art detection performance on the DeepLesion dataset. The proposed 3D pre-trained weights can potentially be used to boost the performance of other 3D medical image analysis tasks.
arXiv Detail & Related papers (2020-12-16T07:11:16Z)
Tattoo tomography: Freehand 3D photoacoustic image reconstruction with an optical pattern [49.240017254888336]
Photoacoustic tomography (PAT) is a novel imaging technique that can resolve both morphological and functional tissue properties. A current drawback is the limited field-of-view provided by the conventionally applied 2D probes. We present a novel approach to 3D reconstruction of PAT data that does not require an external tracking system.
arXiv Detail & Related papers (2020-11-10T09:27:56Z)

This list is automatically generated from the titles and abstracts of the papers on this site.