Related papers: Harnessing Foundation Models for Robust and Generalizable 6-DOF Bronchoscopy Localization

Harnessing Foundation Models for Robust and Generalizable 6-DOF Bronchoscopy Localization

URL: http://arxiv.org/abs/2505.24249v1
Date: Fri, 30 May 2025 06:14:12 GMT
Title: Harnessing Foundation Models for Robust and Generalizable 6-DOF Bronchoscopy Localization
Authors: Qingyao Tian, Huai Liao, Xinyan Huang, Bingyu Yang, Hongbin Liu,
Abstract summary: Vision-based 6-DOF bronchoscopy localization offers a promising solution for accurate and cost-effective interventional guidance.<n>Existing methods struggle with 1) limited generalization across patient cases due to scarce labeled data, and 2) poor robustness under visual degradation.<n>We propose PANSv2, a generalizable and robust bronchoscopy localization framework.
Score: 2.795503750654676
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Vision-based 6-DOF bronchoscopy localization offers a promising solution for accurate and cost-effective interventional guidance. However, existing methods struggle with 1) limited generalization across patient cases due to scarce labeled data, and 2) poor robustness under visual degradation, as bronchoscopy procedures frequently involve artifacts such as occlusions and motion blur that impair visual information. To address these challenges, we propose PANSv2, a generalizable and robust bronchoscopy localization framework. Motivated by PANS that leverages multiple visual cues for pose likelihood measurement, PANSv2 integrates depth estimation, landmark detection, and centerline constraints into a unified pose optimization framework that evaluates pose probability and solves for the optimal bronchoscope pose. To further enhance generalization capabilities, we leverage the endoscopic foundation model EndoOmni for depth estimation and the video foundation model EndoMamba for landmark detection, incorporating both spatial and temporal analyses. Pretrained on diverse endoscopic datasets, these models provide stable and transferable visual representations, enabling reliable performance across varied bronchoscopy scenarios. Additionally, to improve robustness to visual degradation, we introduce an automatic re-initialization module that detects tracking failures and re-establishes pose using landmark detections once clear views are available. Experimental results on bronchoscopy dataset encompassing 10 patient cases show that PANSv2 achieves the highest tracking success rate, with an 18.1% improvement in SR-5 (percentage of absolute trajectory error under 5 mm) compared to existing methods, showing potential towards real clinical usage.

Related papers

Lightweight Relational Embedding in Task-Interpolated Few-Shot Networks for Enhanced Gastrointestinal Disease Classification [0.0]
Colon cancer detection is crucial for increasing patient survival rates.<n> colonoscopy is dependent on obtaining adequate and high-quality endoscopic images.<n>Few-Shot Learning architecture enables our model to rapidly adapt to unseen fine-grained endoscopic image patterns.<n>Our model demonstrated superior performance, achieving an accuracy of 90.1%, precision of 0.845, recall of 0.942, and an F1 score of 0.891.
arXiv Detail & Related papers (2025-05-30T16:54:51Z)
DiffDoctor: Diagnosing Image Diffusion Models Before Treating [57.82359018425674]
We propose DiffDoctor, a two-stage pipeline to assist image diffusion models in generating fewer artifacts.<n>We collect a dataset of over 1M flawed synthesized images and set up an efficient human-in-the-loop annotation process.<n>The learned artifact detector is then involved in the second stage to optimize the diffusion model by providing pixel-level feedback.
arXiv Detail & Related papers (2025-01-21T18:56:41Z)
Self-supervised Monocular Depth and Pose Estimation for Endoscopy with Generative Latent Priors [10.61978045582697]
3D mapping in endoscopy enables quantitative, holistic lesion characterization within the gastrointestinal (GI) tract.<n>Existing methods relying on synthetic datasets or complex models often lack generalizability in challenging endoscopic conditions.<n>We propose a robust self-supervised monocular depth and pose estimation framework that incorporates a Generative Latent Bank and a Variational Autoencoder.
arXiv Detail & Related papers (2024-11-26T15:43:06Z)
PANS: Probabilistic Airway Navigation System for Real-time Robust Bronchoscope Localization [4.755280006199144]
We propose a novel Probabilistic Airway Navigation System (PANS) for bronchoscope localization. Our PANS incorporates diverse visual representations by leveraging two key modules, including the Depth-based Motion Inference (DMI) and the Bronchial Semantic Analysis (BSA) Under this probabilistic formulation, our PANS is capable of achieving the 6-DOF bronchoscope localization with superior accuracy and robustness.
arXiv Detail & Related papers (2024-07-08T02:13:41Z)
DPMesh: Exploiting Diffusion Prior for Occluded Human Mesh Recovery [71.6345505427213]
DPMesh is an innovative framework for occluded human mesh recovery. It capitalizes on the profound diffusion prior about object structure and spatial relationships embedded in a pre-trained text-to-image diffusion model.
arXiv Detail & Related papers (2024-04-01T18:59:13Z)
ArSDM: Colonoscopy Images Synthesis with Adaptive Refinement Semantic Diffusion Models [69.9178140563928]
Colonoscopy analysis is essential for assisting clinical diagnosis and treatment. The scarcity of annotated data limits the effectiveness and generalization of existing methods. We propose an Adaptive Refinement Semantic Diffusion Model (ArSDM) to generate colonoscopy images that benefit the downstream tasks.
arXiv Detail & Related papers (2023-09-03T07:55:46Z)
Domain Adaptive Synapse Detection with Weak Point Annotations [63.97144211520869]
We present AdaSyn, a framework for domain adaptive synapse detection with weak point annotations. In the WASPSYN challenge at I SBI 2023, our method ranks the 1st place.
arXiv Detail & Related papers (2023-08-31T05:05:53Z)
Deep denoising autoencoder-based non-invasive blood flow detection for arteriovenous fistula [10.030431512848239]
We propose an approach based on deep denoising autoencoders (DAEs) that perform dimensionality reduction and reconstruction tasks. Our results demonstrate that the latent representation generated by the DAE surpasses expectations with an accuracy of 0.93. The incorporation of noise-mixing and the utilization of a noise-to-clean scheme effectively enhance the discriminative capabilities of the latent representation.
arXiv Detail & Related papers (2023-06-12T04:46:01Z)
The role of noise in denoising models for anomaly detection in medical images [62.0532151156057]
Pathological brain lesions exhibit diverse appearance in brain images. Unsupervised anomaly detection approaches have been proposed using only normal data for training. We show that optimization of the spatial resolution and magnitude of the noise improves the performance of different model training regimes.
arXiv Detail & Related papers (2023-01-19T21:39:38Z)
Learnable Patchmatch and Self-Teaching for Multi-Frame Depth Estimation in Monocular Endoscopy [16.233423010425355]
We propose a novel unsupervised multi-frame monocular depth estimation model.<n>The proposed model integrates a learnable patchmatch module to adaptively increase the discriminative ability in regions with low and homogeneous textures.<n>As a byproduct of the self-teaching paradigm, the proposed model is able to improve the depth predictions when more frames are input at test time.
arXiv Detail & Related papers (2022-05-30T12:11:03Z)
Improving Classification Model Performance on Chest X-Rays through Lung Segmentation [63.45024974079371]
We propose a deep learning approach to enhance abnormal chest x-ray (CXR) identification performance through segmentations. Our approach is designed in a cascaded manner and incorporates two modules: a deep neural network with criss-cross attention modules (XLSor) for localizing lung region in CXR images and a CXR classification model with a backbone of a self-supervised momentum contrast (MoCo) model pre-trained on large-scale CXR data sets.
arXiv Detail & Related papers (2022-02-22T15:24:06Z)

This list is automatically generated from the titles and abstracts of the papers in this site.