Xray2Xray: World Model from Chest X-rays with Volumetric Context
- URL: http://arxiv.org/abs/2506.19055v1
- Date: Tue, 17 Jun 2025 20:17:07 GMT
- Title: Xray2Xray: World Model from Chest X-rays with Volumetric Context
- Authors: Zefan Yang, Xinrui Song, Xuanang Xu, Yongyi Shi, Ge Wang, Mannudeep K. Kalra, Pingkun Yan
- Abstract summary: This study introduces Xray2Xray, a novel World Model that learns latent representations encoding 3D structural information from chest X-rays. Xray2Xray captures the latent representations of the chest volume by modeling the transition dynamics of X-ray projections.
- Score: 12.453185395782054
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Chest X-rays (CXRs) are the most widely used medical imaging modality and play a pivotal role in diagnosing diseases. However, as 2D projection images, CXRs are limited by structural superposition, which constrains their effectiveness in precise disease diagnosis and risk prediction. To address the limitations of 2D CXRs, this study introduces Xray2Xray, a novel World Model that learns latent representations encoding 3D structural information from chest X-rays. Xray2Xray captures the latent representations of the chest volume by modeling the transition dynamics of X-ray projections across different angular positions with a vision model and a transition model. We employed the latent representations of Xray2Xray for downstream risk prediction and disease diagnosis tasks. Experimental results showed that Xray2Xray outperformed both supervised methods and self-supervised pretraining methods for cardiovascular disease risk estimation and achieved competitive performance in classifying five pathologies in CXRs. We also assessed the quality of Xray2Xray's latent representations through synthesis tasks and demonstrated that the latent representations can be used to reconstruct volumetric context.
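The abstract describes two learned components: a vision model that encodes each projection into a latent, and a transition model that rolls that latent across angular positions. Below is a minimal sketch of that idea; the module sizes, the GRU-based transition, and all names are illustrative assumptions, not the authors' architecture.

```python
# Hedged sketch of the Xray2Xray idea: encode one projection, then roll the
# latent forward in viewing angle with a transition model. Toy sizes throughout.
import torch
import torch.nn as nn

class VisionModel(nn.Module):
    """Encodes a 2D projection image into a latent vector."""
    def __init__(self, latent_dim=256):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, latent_dim),
        )

    def forward(self, x):
        return self.backbone(x)

class TransitionModel(nn.Module):
    """Predicts the latent of the next angular view from the current one."""
    def __init__(self, latent_dim=256):
        super().__init__()
        self.cell = nn.GRUCell(latent_dim + 1, latent_dim)  # +1 for the angle step

    def forward(self, z, d_angle):
        return self.cell(torch.cat([z, d_angle], dim=-1), z)

vision, transition = VisionModel(), TransitionModel()
views = torch.randn(8, 12, 1, 128, 128)      # batch of 12 angular projections
d_angle = torch.full((8, 1), 15.0 / 180.0)   # normalized 15-degree step
z = vision(views[:, 0])
rollout = []
for t in range(1, views.size(1)):
    z = transition(z, d_angle)               # roll the latent forward in angle
    rollout.append(z)
# Training would decode rolled-out latents back to projections; the frozen
# latents would then serve the downstream risk-prediction and diagnosis probes.
```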
Related papers
- RadFabric: Agentic AI System with Reasoning Capability for Radiology [61.25593938175618]
RadFabric is a multi-agent, multimodal reasoning framework that unifies visual and textual analysis for comprehensive CXR interpretation. The system employs specialized CXR agents for pathology detection, an Anatomical Interpretation Agent to map visual findings to precise anatomical structures, and a Reasoning Agent powered by large multimodal reasoning models to synthesize visual, anatomical, and clinical data into transparent, evidence-based diagnoses.
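A hedged sketch of the agent flow this summary describes: pathology agents detect, an anatomical agent localizes, and a reasoning agent synthesizes. The dataclasses and stub logic are purely illustrative; RadFabric's actual agents are large multimodal models.

```python
# Toy multi-agent pipeline over a shared case state; all names are hypothetical.
from dataclasses import dataclass, field

@dataclass
class Finding:
    pathology: str
    confidence: float
    anatomy: str = "unlocalized"

@dataclass
class CaseState:
    image_id: str
    findings: list = field(default_factory=list)
    report: str = ""

def pathology_agent(state: CaseState) -> CaseState:
    state.findings.append(Finding("opacity", 0.91))  # stand-in for a detector
    return state

def anatomical_agent(state: CaseState) -> CaseState:
    for f in state.findings:
        f.anatomy = "right lower lobe"   # a localizer would map each finding
    return state

def reasoning_agent(state: CaseState) -> CaseState:
    lines = [f"{f.pathology} ({f.confidence:.2f}) in {f.anatomy}"
             for f in state.findings]
    state.report = "Findings: " + "; ".join(lines)
    return state

state = CaseState(image_id="cxr_001")
for agent in (pathology_agent, anatomical_agent, reasoning_agent):
    state = agent(state)
print(state.report)
```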
arXiv Detail & Related papers (2025-06-17T03:10:33Z) - X-GRM: Large Gaussian Reconstruction Model for Sparse-view X-rays to Computed Tomography [89.84588038174721]
Computed Tomography serves as an indispensable tool in clinical practice, providing non-invasive visualization of internal anatomical structures. Existing CT reconstruction works are limited to small-capacity model architectures and inflexible volume representations. We present X-GRM, a large feedforward model for reconstructing 3D CT volumes from sparse-view 2D X-ray projections.
arXiv Detail & Related papers (2025-05-21T08:14:10Z) - Gla-AI4BioMed at RRG24: Visual Instruction-tuned Adaptation for Radiology Report Generation [21.772106685777995]
We introduce a radiology-focused visual language model designed to generate radiology reports from chest X-rays. Our model combines an image encoder with a fine-tuned LLM based on the Vicuna-7B architecture, enabling it to generate different sections of a radiology report with notable accuracy.
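A minimal sketch of this encoder-plus-LLM design: image features are projected into the language model's embedding space and prepended to the text tokens. The tiny modules below stand in for a real vision encoder and the Vicuna-7B backbone; every size and name is an assumption for illustration.

```python
# Toy vision-language connector: visual prefix tokens feed a text decoder.
import torch
import torch.nn as nn

class TinyImageEncoder(nn.Module):
    def __init__(self, d_model=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 64, 8, stride=8), nn.ReLU(), nn.Flatten(2))
        self.proj = nn.Linear(64, d_model)

    def forward(self, x):
        feats = self.net(x).transpose(1, 2)   # (B, patches, 64)
        return self.proj(feats)               # (B, patches, d_model)

d_model, vocab = 512, 32000
encoder = TinyImageEncoder(d_model)
embed = nn.Embedding(vocab, d_model)          # stand-in for the LLM embedding table
decoder = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)

image = torch.randn(2, 1, 256, 256)
tokens = torch.randint(0, vocab, (2, 16))     # prompt, e.g. "Generate Findings:"
inputs = torch.cat([encoder(image), embed(tokens)], dim=1)  # prefix + text
out = decoder(inputs, inputs)
# A real model would decode causally over this sequence, section by section.
```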
arXiv Detail & Related papers (2024-12-06T11:14:03Z) - DiffuX2CT: Diffusion Learning to Reconstruct CT Images from Biplanar X-Rays [41.393567374399524]
We propose DiffuX2CT, which models CT reconstruction from ultra-sparse X-rays as a conditional diffusion process.
By doing so, DiffuX2CT achieves structure-controllable reconstruction, which enables 3D structural information to be recovered from 2D X-rays.
As an extra contribution, we collect a real-world lumbar CT dataset, called LumbarV, as a new benchmark to verify the clinical significance and performance of CT reconstruction from X-rays.
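A hedged sketch of conditional diffusion for X-ray-to-CT as the summary describes it: the denoiser predicts the noise in a noisy CT volume conditioned on embeddings of the two biplanar X-rays. The 3D UNet is replaced by a trivial conv stack; all shapes, the linear noising schedule, and names are illustrative assumptions, not DiffuX2CT's implementation.

```python
# Toy conditional denoiser: noise prediction in a CT volume given X-ray context.
import torch
import torch.nn as nn

class Denoiser(nn.Module):
    def __init__(self, cond_dim=128):
        super().__init__()
        self.cond = nn.Linear(cond_dim, 8)
        self.net = nn.Sequential(
            nn.Conv3d(1 + 8, 16, 3, padding=1), nn.ReLU(),
            nn.Conv3d(16, 1, 3, padding=1))

    def forward(self, x_t, cond):
        # Broadcast the X-ray condition over the volume and predict the noise.
        c = self.cond(cond)[:, :, None, None, None].expand(-1, -1, *x_t.shape[2:])
        return self.net(torch.cat([x_t, c], dim=1))

denoiser = Denoiser()
xray_embed = torch.randn(2, 128)       # embedding of the two biplanar views
ct = torch.randn(2, 1, 32, 32, 32)     # training CT volume (toy size)
t = torch.rand(2, 1, 1, 1, 1)          # noise level in [0, 1]
noise = torch.randn_like(ct)
x_t = (1 - t) * ct + t * noise         # simple linear noising schedule
loss = ((denoiser(x_t, xray_embed) - noise) ** 2).mean()
loss.backward()
```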
arXiv Detail & Related papers (2024-07-18T14:20:04Z) - UMedNeRF: Uncertainty-aware Single View Volumetric Rendering for Medical Neural Radiance Fields [38.62191342903111]
We propose an Uncertainty-aware MedNeRF (UMedNeRF) network based on generated radiance fields.
We show the results of CT projection rendering from a single X-ray and compare our method with other methods based on generated radiance fields.
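A minimal sketch of an uncertainty-aware radiance-field query in the spirit of this summary: an MLP maps a 3D point to an attenuation value plus a per-point variance, so rendered pixels carry an uncertainty estimate. Everything here is a toy stand-in, not UMedNeRF's actual architecture.

```python
# Toy uncertainty-aware field: per-sample attenuation and log-variance heads.
import torch
import torch.nn as nn

field = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 2))

def render_ray(points):
    """points: (N, 3) samples along one ray through the volume."""
    out = field(points)
    sigma = torch.relu(out[:, 0])          # attenuation per sample
    log_var = out[:, 1]                    # heteroscedastic uncertainty head
    pixel = sigma.mean()                   # toy line integral for an X-ray pixel
    variance = log_var.exp().mean()
    return pixel, variance

pts = torch.rand(64, 3)                    # samples along a ray
pixel, var = render_ray(pts)
# Training could weight the reconstruction loss by 1/var (a Gaussian NLL),
# so uncertain regions are penalized less but pay a log-variance cost.
```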
arXiv Detail & Related papers (2023-11-10T02:47:15Z) - Vision-Language Generative Model for View-Specific Chest X-ray Generation [18.347723213970696]
ViewXGen is designed to overcome the limitation of existing methods, which can only generate frontal-view chest X-rays.
Our approach takes into consideration the diverse view positions found in the dataset, enabling the generation of chest X-rays with specific views.
arXiv Detail & Related papers (2023-02-23T17:13:25Z) - Medical Image Captioning via Generative Pretrained Transformers [57.308920993032274]
We combine two models, Show-Attend-Tell and GPT-3, to generate comprehensive and descriptive radiology records.
The proposed model is tested on two medical datasets, Open-I and MIMIC-CXR, and on the general-purpose MS-COCO dataset.
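A hedged sketch of the two-stage pipeline this summary describes: a visual captioner drafts short findings, and a large language model expands them into a fluent record. Both `draft_caption` and `polish_with_llm` are hypothetical placeholders, not the paper's components or a real GPT-3 client.

```python
# Toy two-stage captioning pipeline; all functions are illustrative stubs.
def draft_caption(image_features) -> str:
    # Stand-in for a Show-Attend-Tell style attention captioner.
    return "cardiomegaly; no pleural effusion"

def polish_with_llm(draft: str) -> str:
    # Stand-in for prompting a large LM to rewrite the draft as a report.
    prompt = f"Rewrite as a radiology report: {draft}"
    return f"IMPRESSION: {draft}. Heart size is enlarged; no effusion seen."

report = polish_with_llm(draft_caption(image_features=None))
print(report)
```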
arXiv Detail & Related papers (2022-09-28T10:27:10Z) - Improving Computed Tomography (CT) Reconstruction via 3D Shape Induction [3.1498833540989413]
We propose shape induction, that is, learning the shape of 3D CT from X-rays without CT supervision, as a novel technique for incorporating realistic X-ray distributions during training of a reconstruction model.
Our experiments demonstrate that this process improves both the perceptual quality of generated CT and the accuracy of downstream classification of pulmonary infectious diseases.
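A minimal sketch of the shape-induction idea: supervise a CT reconstructor through its re-projected X-rays rather than with ground-truth CT. The differentiable "DRR" here is a simple sum along one axis, and all names and losses are assumptions; the actual method is more involved.

```python
# Toy projection-consistency training: no CT volumes are used as targets.
import torch
import torch.nn as nn

recon = nn.Sequential(nn.Linear(64 * 64, 16 * 16 * 16))  # toy X-ray -> volume

def project(volume):
    # Toy digitally reconstructed radiograph: integrate attenuation along z.
    return volume.sum(dim=-1)

xray = torch.rand(4, 64 * 64)
volume = recon(xray).view(4, 16, 16, 16)
drr = project(volume)                       # (4, 16, 16) synthetic projection
target = torch.rand(4, 16, 16)              # real X-ray resized to match (toy)
loss = ((drr - target) ** 2).mean()         # consistency without CT supervision
loss.backward()
```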
arXiv Detail & Related papers (2022-08-23T13:06:02Z) - Generative Residual Attention Network for Disease Detection [51.60842580044539]
We present a novel approach for disease generation in X-rays using conditional generative adversarial learning.
We generate a corresponding radiology image in a target domain while preserving the identity of the patient.
We then use the generated X-ray image in the target domain to augment our training to improve the detection performance.
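A hedged sketch of this augmentation strategy: a conditional generator maps a source X-ray plus a target disease label to a pathological image, which is then added to the detector's training set. The one-layer "generator" is a placeholder; identity preservation and adversarial training are omitted, and all names are assumptions.

```python
# Toy conditional generation plus training-set augmentation.
import torch
import torch.nn as nn

class CondGenerator(nn.Module):
    def __init__(self, n_classes=5):
        super().__init__()
        self.label_map = nn.Embedding(n_classes, 64 * 64)
        self.mix = nn.Conv2d(2, 1, 3, padding=1)

    def forward(self, x, y):
        label_plane = self.label_map(y).view(-1, 1, 64, 64)
        return self.mix(torch.cat([x, label_plane], dim=1))

gen = CondGenerator()
healthy = torch.rand(8, 1, 64, 64)
disease = torch.randint(0, 5, (8,))
synthetic = gen(healthy, disease)           # same patient, target pathology
augmented_x = torch.cat([healthy, synthetic.detach()])
augmented_y = torch.cat([torch.zeros(8, dtype=torch.long), disease])
# augmented_x / augmented_y would then feed the disease detector's training.
```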
arXiv Detail & Related papers (2021-10-25T14:15:57Z) - Contrastive Attention for Automatic Chest X-ray Report Generation [124.60087367316531]
In most cases, the normal regions dominate the entire chest X-ray image, and the corresponding descriptions of these normal regions dominate the final report.
We propose a Contrastive Attention (CA) model, which compares the current input image with normal images to distill contrastive information.
We achieve state-of-the-art results on two public datasets.
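A minimal sketch of the contrastive idea in this summary: attend from the input image's features to a pool of normal-image features and keep the residual, so abnormal regions stand out for the report decoder. The shapes and the single attention step are illustrative assumptions, not the paper's exact CA module.

```python
# Toy contrastive attention: subtract the "normal-looking" reconstruction.
import torch
import torch.nn.functional as F

def contrastive_attention(query_feats, normal_pool):
    """query_feats: (N, D) patch features; normal_pool: (M, D) normal features."""
    attn = F.softmax(query_feats @ normal_pool.t() / query_feats.size(-1) ** 0.5,
                     dim=-1)
    normal_like = attn @ normal_pool        # what a normal image would look like
    return query_feats - normal_like        # contrastive residual for the decoder

q = torch.randn(49, 256)                    # 7x7 grid of image patch features
pool = torch.randn(200, 256)                # features from reference normal CXRs
contrast = contrastive_attention(q, pool)
```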
arXiv Detail & Related papers (2021-06-13T11:20:31Z) - Variational Knowledge Distillation for Disease Classification in Chest X-Rays [102.04931207504173]
We propose variational knowledge distillation (VKD), a new probabilistic inference framework for disease classification based on X-rays.
We demonstrate the effectiveness of our method on three public benchmark datasets with paired X-ray images and EHRs.
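A hedged sketch of variational knowledge distillation as summarized here: an EHR "teacher" posterior supervises the X-ray "student" prior through a KL term, so image features absorb clinical knowledge that is only available at training time. The Gaussian heads and sizes are illustrative assumptions.

```python
# Toy variational distillation: KL between EHR posterior and image prior.
import torch
import torch.nn as nn
from torch.distributions import Normal, kl_divergence

img_head = nn.Linear(128, 2 * 32)          # X-ray features -> prior params
ehr_head = nn.Linear(64, 2 * 32)           # EHR features   -> posterior params

def gaussian(params):
    mu, log_std = params.chunk(2, dim=-1)
    return Normal(mu, log_std.exp())

img_feat, ehr_feat = torch.randn(4, 128), torch.randn(4, 64)
prior = gaussian(img_head(img_feat))
posterior = gaussian(ehr_head(ehr_feat))
z = posterior.rsample()                    # latent carrying EHR knowledge
kl = kl_divergence(posterior, prior).sum(-1).mean()
# Total loss would add a classification term on z; at test time only the
# image prior is used, so no EHR is needed for inference.
```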
arXiv Detail & Related papers (2021-03-19T14:13:56Z) - CheXternal: Generalization of Deep Learning Models for Chest X-ray Interpretation to Photos of Chest X-rays and External Clinical Settings [6.133159722996137]
We measured the diagnostic performance of 8 different chest X-ray models when applied to smartphone photos of chest X-rays and external datasets.
We found that on photos of chest X-rays, all 8 models experienced a statistically significant drop in task performance.
Some chest X-ray models, under clinically relevant distribution shifts, were comparable to radiologists while other models were not.
arXiv Detail & Related papers (2021-02-17T09:58:14Z)