BronchOpt : Vision-Based Pose Optimization with Fine-Tuned Foundation Models for Accurate Bronchoscopy Navigation
- URL: http://arxiv.org/abs/2511.09443v1
- Date: Thu, 13 Nov 2025 01:54:53 GMT
- Title: BronchOpt : Vision-Based Pose Optimization with Fine-Tuned Foundation Models for Accurate Bronchoscopy Navigation
- Authors: Hongchao Shu, Roger D. Soberanis-Mukul, Jiru Xu, Hao Ding, Morgan Ringel, Mali Shen, Saif Iftekar Sayed, Hedyeh Rafii-Tari, Mathias Unberath
- Abstract summary: We propose a vision-based pose optimization framework for 2D-3D registration between intra-operative endoscopic views and pre-operative CT anatomy. A fine-tuned modality- and domain-invariant encoder enables direct similarity computation between real endoscopic RGB frames and CT-rendered depth maps. Our model achieves an average translational error of 2.65 mm and a rotational error of 0.19 rad, demonstrating accurate and stable localization.
- Score: 6.915058920280426
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Accurate intra-operative localization of the bronchoscope tip relative to patient anatomy remains challenging due to respiratory motion, anatomical variability, and CT-to-body divergence that cause deformation and misalignment between intra-operative views and pre-operative CT. Existing vision-based methods often fail to generalize across domains and patients, leading to residual alignment errors. This work establishes a generalizable foundation for bronchoscopy navigation through a robust vision-based framework and a new synthetic benchmark dataset that enables standardized and reproducible evaluation. We propose a vision-based pose optimization framework for frame-wise 2D-3D registration between intra-operative endoscopic views and pre-operative CT anatomy. A fine-tuned modality- and domain-invariant encoder enables direct similarity computation between real endoscopic RGB frames and CT-rendered depth maps, while a differentiable rendering module iteratively refines camera poses through depth consistency. To enhance reproducibility, we introduce the first public synthetic benchmark dataset for bronchoscopy navigation, addressing the lack of paired CT-endoscopy data. Trained exclusively on synthetic data distinct from the benchmark, our model achieves an average translational error of 2.65 mm and a rotational error of 0.19 rad, demonstrating accurate and stable localization. Qualitative results on real patient data further confirm strong cross-domain generalization, achieving consistent frame-wise 2D-3D alignment without domain-specific adaptation. Overall, the proposed framework achieves robust, domain-invariant localization through iterative vision-based optimization, while the new benchmark provides a foundation for standardized progress in vision-based bronchoscopy navigation.
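The abstract reports accuracy as an average translational error (mm) and an average rotational error (rad), but this listing does not define either metric. The sketch below assumes the conventions that are common in camera pose estimation: translational error as the Euclidean distance between estimated and ground-truth camera positions, and rotational error as the geodesic angle of the relative rotation R_gt^T R_est, each averaged over frames. All function names are hypothetical and are not taken from the BronchOpt code.

```python
import math
from typing import Sequence, Tuple

# ASSUMED metric conventions (not confirmed by the listing): translation error is
# the Euclidean distance between camera positions; rotation error is the geodesic
# angle of R_gt^T @ R_est. All names here are hypothetical.

Vector3 = Sequence[float]
Matrix3 = Sequence[Sequence[float]]


def translation_error_mm(t_est: Vector3, t_gt: Vector3) -> float:
    """Euclidean distance between estimated and ground-truth positions (mm if inputs are mm)."""
    return math.dist(t_est, t_gt)


def rotation_error_rad(r_est: Matrix3, r_gt: Matrix3) -> float:
    """Geodesic angle, in radians, between two 3x3 rotation matrices."""
    # trace(R_gt^T R_est) equals the element-wise (Frobenius) inner product.
    trace = sum(r_gt[i][j] * r_est[i][j] for i in range(3) for j in range(3))
    # Clamp so floating-point noise cannot push acos outside its domain.
    cos_theta = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.acos(cos_theta)


def mean_pose_errors(
    estimates: Sequence[Tuple[Matrix3, Vector3]],
    ground_truth: Sequence[Tuple[Matrix3, Vector3]],
) -> Tuple[float, float]:
    """Average (translation_mm, rotation_rad) over paired per-frame (R, t) poses."""
    if not estimates or len(estimates) != len(ground_truth):
        raise ValueError("need equally many, non-empty estimated and ground-truth poses")
    pairs = list(zip(estimates, ground_truth))
    t_errs = [translation_error_mm(t_e, t_g) for (_, t_e), (_, t_g) in pairs]
    r_errs = [rotation_error_rad(r_e, r_g) for (r_e, _), (r_g, _) in pairs]
    return sum(t_errs) / len(t_errs), sum(r_errs) / len(r_errs)


def rot_z(theta: float) -> Matrix3:
    """Rotation about the z-axis, used only to build the example below."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


if __name__ == "__main__":
    identity = rot_z(0.0)
    # Two toy frames: the first is off by 3 mm and 0.1 rad, the second by 1 mm and 0 rad.
    est = [(rot_z(0.1), (1.0, 2.0, 2.0)), (identity, (0.0, 0.0, 0.0))]
    gt = [(identity, (0.0, 0.0, 0.0)), (identity, (0.0, 0.0, 1.0))]
    t_mean, r_mean = mean_pose_errors(est, gt)
    print(f"{t_mean:.2f} mm, {r_mean:.2f} rad")  # prints "2.00 mm, 0.05 rad"
```

The clamp in `rotation_error_rad` matters in practice: estimated rotation matrices that are only approximately orthonormal can produce a cosine marginally above 1, which would otherwise make `math.acos` raise.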
Related papers
- Preoperative-to-intraoperative Liver Registration for Laparoscopic Surgery via Latent-Grounded Correspondence Constraints [51.7011449975586]
Land-Reg is a deformable registration framework that learns latent-grounded 2D-3D landmark correspondences. For rigid registration, Land-Reg employs a Cross-modal Latent Alignment module. An Uncertainty-enhanced Overlap Landmark Detector with similarity matching is proposed to robustly estimate explicit 2D-3D landmark correspondences.
(2026-03-02T10:44:03Z)
- Advanced Geometric Correction Algorithms for 3D Medical Reconstruction: Comparison of Computed Tomography and Macroscopic Imaging [0.9395222766576343]
This paper introduces a hybrid two-stage registration framework for reconstructing 3D kidney anatomy from macroscopic slices. It addresses the data-scarcity and high-distortion challenges typical of macroscopic imaging. The proposed framework generalizes to other soft-tissue organs reconstructed from optical or photographic cross-sections.
(2026-01-30T17:16:17Z)
- Self-Supervised Contrastive Embedding Adaptation for Endoscopic Image Matching [7.674595072442547]
This research presents a novel deep learning pipeline for establishing feature correspondences in endoscopic image pairs. The proposed methodology leverages a novel-view synthesis pipeline to generate ground-truth inlier correspondences. Our pipeline surpasses state-of-the-art methodologies on the SCARED datasets, with improved matching precision and lower epipolar error.
(2025-12-11T07:44:00Z)
- Enhanced Landmark Detection Model in Pelvic Fluoroscopy using 2D/3D Registration Loss [1.6420503030062876]
We propose a novel framework that incorporates 2D/3D landmark registration into the training of a U-Net landmark prediction model. We analyze the performance difference by comparing landmark detection accuracy between the baseline U-Net, a U-Net trained with Pose Estimation Loss, and a U-Net fine-tuned with Pose Estimation Loss under realistic intra-operative conditions.
(2025-11-26T16:50:06Z)
- EqDiff-CT: Equivariant Conditional Diffusion model for CT Image Synthesis from CBCT [43.92108185590778]
Cone-beam computed tomography (CBCT) is widely used for image-guided radiotherapy (IGRT). We propose a novel diffusion-based conditional generative model, coined EqDiff-CT, to synthesize high-quality CT images from CBCT.
(2025-09-26T05:51:59Z)
- Accelerating 3D Photoacoustic Computed Tomography with End-to-End Physics-Aware Neural Operators [74.65171736966131]
Photoacoustic computed tomography (PACT) combines optical contrast with ultrasonic resolution, achieving deep-tissue imaging beyond the optical diffusion limit. Current implementations require dense transducer arrays and prolonged acquisition times, limiting clinical translation. We introduce Pano, an end-to-end physics-aware model that directly learns the inverse acoustic mapping from sensor measurements to volumetric reconstructions.
(2025-09-11T23:12:55Z)
- Unifying Scale-Aware Depth Prediction and Perceptual Priors for Monocular Endoscope Pose Estimation and Tissue Reconstruction [3.251946340142663]
A unified framework for monocular endoscopic tissue reconstruction is presented. It integrates scale-aware depth prediction with temporally-constrained perceptual refinement. Evaluations on HEVD and SCARED, with ablation and comparative analyses, demonstrate the framework's robustness and superiority over state-of-the-art methods.
(2025-08-15T07:41:17Z)
- Robust and Accurate Multi-view 2D/3D Image Registration with Differentiable X-ray Rendering and Dual Cross-view Constraints [45.57808049168089]
We propose a novel multi-view 2D/3D rigid registration approach comprising two stages. In the first stage, a combined loss function is designed, incorporating both the differences between predicted and ground-truth poses. In the second stage, test-time optimization is performed to refine the estimated poses from the coarse stage.
(2025-06-27T12:57:58Z)
- MR2US-Pro: Prostate MR to Ultrasound Image Translation and Registration Based on Diffusion Models [7.512221808783586]
We present a novel framework that addresses the challenges through a two-stage process: TRUS 3D reconstruction followed by cross-modal registration. We propose a totally probe-location-independent approach that leverages the natural correlation between sagittal and transverse TRUS views. For the registration stage, we introduce an unsupervised diffusion-based framework guided by modality translation.
(2025-05-31T14:55:03Z)
- Harnessing Foundation Models for Robust and Generalizable 6-DOF Bronchoscopy Localization [2.795503750654676]
Vision-based 6-DOF bronchoscopy localization offers a promising solution for accurate and cost-effective interventional guidance. Existing methods struggle with 1) limited generalization across patient cases due to scarce labeled data, and 2) poor robustness under visual degradation. We propose PANSv2, a generalizable and robust bronchoscopy localization framework.
(2025-05-30T06:14:12Z)
- DPER: Diffusion Prior Driven Neural Representation for Limited Angle and Sparse View CT Reconstruction [45.00528216648563]
Diffusion Prior Driven Neural Representation (DPER) is an unsupervised framework designed to address the exceptionally ill-posed CT reconstruction inverse problems.
DPER adopts the Half Quadratic Splitting (HQS) algorithm to decompose the inverse problem into data fidelity and distribution prior sub-problems.
We conduct comprehensive experiments to evaluate the performance of DPER on LACT and ultra-SVCT reconstruction with two public datasets.
(2024-04-27T12:55:13Z)
- Fully Differentiable Correlation-driven 2D/3D Registration for X-ray to CT Image Fusion [3.868072865207522]
Image-based rigid 2D/3D registration is a critical technique for fluoroscopy-guided surgical interventions.
We propose a novel fully differentiable correlation-driven network using a dual-branch CNN-transformer encoder.
A correlation-driven loss is proposed for low-frequency feature and high-frequency feature decomposition based on embedded information.
(2024-02-04T14:12:51Z)
- Revisiting 3D Context Modeling with Supervised Pre-training for Universal Lesion Detection in CT Slices [48.85784310158493]
We propose a Modified Pseudo-3D Feature Pyramid Network (MP3D FPN) to efficiently extract 3D context enhanced 2D features for universal lesion detection in CT slices.
With the novel pre-training method, the proposed MP3D FPN achieves state-of-the-art detection performance on the DeepLesion dataset.
The proposed 3D pre-trained weights can potentially be used to boost the performance of other 3D medical image analysis tasks.
(2020-12-16T07:11:16Z)
- Tattoo tomography: Freehand 3D photoacoustic image reconstruction with an optical pattern [49.240017254888336]
Photoacoustic tomography (PAT) is a novel imaging technique that can resolve both morphological and functional tissue properties.
A current drawback is the limited field-of-view provided by the conventionally applied 2D probes.
We present a novel approach to 3D reconstruction of PAT data that does not require an external tracking system.
(2020-11-10T09:27:56Z)