Latent Space Consistency for Sparse-View CT Reconstruction
- URL: http://arxiv.org/abs/2507.11152v1
- Date: Tue, 15 Jul 2025 10:02:19 GMT
- Title: Latent Space Consistency for Sparse-View CT Reconstruction
- Authors: Duoyou Chen, Yunqing Chen, Can Zhang, Zhou Wang, Cheng Chen, Ruoxiu Xiao,
- Abstract summary: The Latent Diffusion Model (LDM) has demonstrated promising potential in the domain of 3D CT reconstruction. Cross-modal feature contrastive learning is used to efficiently extract latent 3D information from 2D X-ray images. Results indicate that CLS-DM outperforms classical and state-of-the-art generative models in terms of standard voxel-level metrics.
- Score: 10.057432803124167
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Computed Tomography (CT) is a widely utilized imaging modality in clinical settings. Using densely acquired rotational X-ray arrays, CT can capture 3D spatial features. However, it is confronted with challenges such as significant time consumption and high radiation exposure. CT reconstruction methods based on sparse-view X-ray images have garnered substantial attention from researchers as they present a means to mitigate costs and risks. In recent years, diffusion models, particularly the Latent Diffusion Model (LDM), have demonstrated promising potential in the domain of 3D CT reconstruction. Nonetheless, due to the substantial differences between the 2D latent representation of X-ray modalities and the 3D latent representation of CT modalities, the vanilla LDM is incapable of achieving effective alignment within the latent space. To address this issue, we propose the Consistent Latent Space Diffusion Model (CLS-DM), which incorporates cross-modal feature contrastive learning to efficiently extract latent 3D information from 2D X-ray images and achieve latent space alignment between modalities. Experimental results indicate that CLS-DM outperforms classical and state-of-the-art generative models in terms of standard voxel-level metrics (PSNR, SSIM) on the LIDC-IDRI and CTSpine1K datasets. This methodology not only aids in enhancing the effectiveness and economic viability of sparse X-ray reconstructed CT but can also be generalized to other cross-modal transformation tasks, such as text-to-image synthesis. We have made our code publicly available at https://anonymous.4open.science/r/CLS-DM-50D6/ to facilitate further research and applications in other domains.
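The cross-modal contrastive alignment the abstract describes can be illustrated with a symmetric InfoNCE-style objective that pulls paired X-ray and CT latent vectors together while pushing mismatched pairs apart. The sketch below is a generic illustration under assumed shapes (flattened latents of equal dimension), not the paper's actual loss or architecture; the function name, temperature value, and NumPy implementation are all hypothetical.

```python
import numpy as np

def info_nce_loss(z_xray, z_ct, temperature=0.07):
    """Symmetric InfoNCE loss over paired latents (illustrative sketch).

    z_xray, z_ct: (batch, dim) arrays; row i of each is a matched pair.
    """
    # L2-normalize so dot products become cosine similarities
    z_xray = z_xray / np.linalg.norm(z_xray, axis=1, keepdims=True)
    z_ct = z_ct / np.linalg.norm(z_ct, axis=1, keepdims=True)
    logits = z_xray @ z_ct.T / temperature  # (batch, batch) similarity matrix
    labels = np.arange(len(z_xray))         # positives sit on the diagonal

    def xent(l):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[labels, labels].mean()

    # average both directions: X-ray -> CT and CT -> X-ray retrieval
    return 0.5 * (xent(logits) + xent(logits.T))

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 32))
aligned_loss = info_nce_loss(z, z)                       # perfectly aligned latents
mismatched_loss = info_nce_loss(z, rng.normal(size=(8, 32)))  # unrelated latents
```

Minimizing such an objective encourages the 2D X-ray encoder to place its latents near the corresponding 3D CT latents, which is the alignment property the vanilla LDM lacks across modalities.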
Related papers
- 3D Wavelet Latent Diffusion Model for Whole-Body MR-to-CT Modality Translation [13.252652406393205]
Existing MR-to-CT methods for whole-body imaging often suffer from poor spatial alignment between the generated CT and input MR images. We present a novel 3D Wavelet Latent Diffusion Model (3D-WLDM) that addresses these limitations. By incorporating a Wavelet Residual Module into the encoder-decoder architecture, we enhance the capture and reconstruction of fine-scale features across image and latent spaces.
arXiv Detail & Related papers (2025-07-14T06:17:05Z) - TRACE: Temporally Reliable Anatomically-Conditioned 3D CT Generation with Enhanced Efficiency [40.82927972746919]
TRACE is a framework that generates 3D medical images with temporal alignment. Overlapping frame pairs are assembled into a flexible-length sequence and reconstructed into a temporally and anatomically aligned 3D volume.
arXiv Detail & Related papers (2025-07-01T14:35:39Z) - 3D-CT-GPT: Generating 3D Radiology Reports through Integration of Large Vision-Language Models [51.855377054763345]
This paper introduces 3D-CT-GPT, a Visual Question Answering (VQA)-based medical visual language model for generating radiology reports from 3D CT scans.
Experiments on both public and private datasets demonstrate that 3D-CT-GPT significantly outperforms existing methods in terms of report accuracy and quality.
arXiv Detail & Related papers (2024-09-28T12:31:07Z) - DIFR3CT: Latent Diffusion for Probabilistic 3D CT Reconstruction from Few Planar X-Rays [15.83915012691264]
DIFR3CT is a 3D latent diffusion model that generates plausible CT volumes from one or few planar x-ray observations.
We conduct experiments demonstrating that DIFR3CT outperforms recent sparse CT reconstruction baselines in terms of standard pixel-level metrics.
We also show that DIFR3CT supports uncertainty quantification via Monte Carlo sampling, which provides an opportunity to measure reconstruction reliability.
arXiv Detail & Related papers (2024-08-27T14:58:08Z) - DiffuX2CT: Diffusion Learning to Reconstruct CT Images from Biplanar X-Rays [41.393567374399524]
We propose DiffuX2CT, which models CT reconstruction from ultra-sparse X-rays as a conditional diffusion process.
By doing so, DiffuX2CT achieves structure-controllable reconstruction, which enables 3D structural information to be recovered from 2D X-rays.
As an extra contribution, we collect a real-world lumbar CT dataset, called LumbarV, as a new benchmark to verify the clinical significance and performance of CT reconstruction from X-rays.
arXiv Detail & Related papers (2024-07-18T14:20:04Z) - CoCPF: Coordinate-based Continuous Projection Field for Ill-Posed Inverse Problem in Imaging [78.734927709231]
Sparse-view computed tomography (SVCT) reconstruction aims to acquire CT images based on sparsely-sampled measurements.
Due to ill-posedness, implicit neural representation (INR) techniques may leave considerable "holes" (i.e., unmodeled spaces) in their fields, leading to sub-optimal results.
We propose the Coordinate-based Continuous Projection Field (CoCPF), which aims to build hole-free representation fields for SVCT reconstruction.
arXiv Detail & Related papers (2024-06-21T08:38:30Z) - C^2RV: Cross-Regional and Cross-View Learning for Sparse-View CBCT Reconstruction [17.54830070112685]
Cone beam computed tomography (CBCT) is an important imaging technology widely used in medical scenarios.
CBCT reconstruction is more challenging due to the increased dimensionality caused by the measurement process based on cone-shaped X-ray beams.
We propose C2RV by leveraging explicit multi-scale volumetric representations to enable cross-regional learning in the 3D space.
arXiv Detail & Related papers (2024-06-06T09:37:56Z) - SDR-Former: A Siamese Dual-Resolution Transformer for Liver Lesion
Classification Using 3D Multi-Phase Imaging [59.78761085714715]
This study proposes a novel Siamese Dual-Resolution Transformer (SDR-Former) framework for liver lesion classification.
The proposed framework has been validated through comprehensive experiments on two clinical datasets.
To support the scientific community, we are releasing our extensive multi-phase MR dataset for liver lesion analysis to the public.
arXiv Detail & Related papers (2024-02-27T06:32:56Z) - On the Localization of Ultrasound Image Slices within Point Distribution
Models [84.27083443424408]
Thyroid disorders are most commonly diagnosed using high-resolution Ultrasound (US).
Longitudinal tracking is a pivotal diagnostic protocol for monitoring changes in pathological thyroid morphology.
We present a framework for automated US image slice localization within a 3D shape representation.
arXiv Detail & Related papers (2023-09-01T10:10:46Z) - MedNeRF: Medical Neural Radiance Fields for Reconstructing 3D-aware
CT-Projections from a Single X-ray [14.10611608681131]
Excessive ionising radiation can lead to deterministic and harmful effects on the body.
This paper proposes a Deep Learning model that learns to reconstruct CT projections from a few or even a single-view X-ray.
arXiv Detail & Related papers (2022-02-02T13:25:23Z) - Automated Model Design and Benchmarking of 3D Deep Learning Models for
COVID-19 Detection with Chest CT Scans [72.04652116817238]
We propose a differentiable neural architecture search (DNAS) framework to automatically search for the 3D DL models for 3D chest CT scans classification.
We also exploit the Class Activation Mapping (CAM) technique on our models to provide the interpretability of the results.
arXiv Detail & Related papers (2021-01-14T03:45:01Z) - Revisiting 3D Context Modeling with Supervised Pre-training for
Universal Lesion Detection in CT Slices [48.85784310158493]
We propose a Modified Pseudo-3D Feature Pyramid Network (MP3D FPN) to efficiently extract 3D context enhanced 2D features for universal lesion detection in CT slices.
With the novel pre-training method, the proposed MP3D FPN achieves state-of-the-art detection performance on the DeepLesion dataset.
The proposed 3D pre-trained weights can potentially be used to boost the performance of other 3D medical image analysis tasks.
arXiv Detail & Related papers (2020-12-16T07:11:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.