Local Patches Meet Global Context: Scalable 3D Diffusion Priors for Computed Tomography Reconstruction
- URL: http://arxiv.org/abs/2512.18161v1
- Date: Sat, 20 Dec 2025 00:57:47 GMT
- Title: Local Patches Meet Global Context: Scalable 3D Diffusion Priors for Computed Tomography Reconstruction
- Authors: Taewon Yang, Jason Hu, Jeffrey A. Fessler, Liyue Shen,
- Abstract summary: Diffusion models learn strong image priors that can be leveraged to solve inverse problems like medical image reconstruction.<n>We propose a novel 3D patch-based diffusion model that can learn a fully 3D diffusion prior from limited data.<n>Our approach not only enables high-quality 3D generation, but also offers an unprecedentedly efficient and accurate solution to high-resolution 3D inverse problems.
- Score: 15.46391452264862
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Diffusion models learn strong image priors that can be leveraged to solve inverse problems like medical image reconstruction. However, for real-world applications such as 3D Computed Tomography (CT) imaging, directly training diffusion models on 3D data presents significant challenges due to the high computational demands of extensive GPU resources and large-scale datasets. Existing works mostly reuse 2D diffusion priors to address 3D inverse problems, but fail to fully realize and leverage the generative capacity of diffusion models for high-dimensional data. In this study, we propose a novel 3D patch-based diffusion model that can learn a fully 3D diffusion prior from limited data, enabling scalable generation of high-resolution 3D images. Our core idea is to learn the prior of 3D patches to achieve scalable efficiency, while coupling local and global information to guarantee high-quality 3D image generation, by modeling the joint distribution of position-aware 3D local patches and downsampled 3D volume as global context. Our approach not only enables high-quality 3D generation, but also offers an unprecedentedly efficient and accurate solution to high-resolution 3D inverse problems. Experiments on 3D CT reconstruction across multiple datasets show that our method outperforms state-of-the-art methods in both performance and efficiency, notably achieving high-resolution 3D reconstruction of $512 \times 512 \times 256$ ($\sim$20 mins).
Related papers
- DiffSplat: Repurposing Image Diffusion Models for Scalable Gaussian Splat Generation [33.62074896816882]
DiffSplat is a novel 3D generative framework that generates 3D Gaussian splats by taming large-scale text-to-image diffusion models.<n>It differs from previous 3D generative models by effectively utilizing web-scale 2D priors while maintaining 3D consistency in a unified model.<n>In conjunction with the regular diffusion loss on these grids, a 3D rendering loss is introduced to facilitate 3D coherence across arbitrary views.
arXiv Detail & Related papers (2025-01-28T07:38:59Z) - Zero-1-to-G: Taming Pretrained 2D Diffusion Model for Direct 3D Generation [66.75243908044538]
We introduce Zero-1-to-G, a novel approach to direct 3D generation on Gaussian splats using pretrained 2D diffusion models.<n>To incorporate 3D awareness, we introduce cross-view and cross-attribute attention layers, which capture complex correlations and enforce 3D consistency across generated splats.<n>This makes Zero-1-to-G the first direct image-to-3D generative model to effectively utilize pretrained 2D diffusion priors, enabling efficient training and improved generalization to unseen objects.
arXiv Detail & Related papers (2025-01-09T18:37:35Z) - DSplats: 3D Generation by Denoising Splats-Based Multiview Diffusion Models [67.50989119438508]
We introduce DSplats, a novel method that directly denoises multiview images using Gaussian-based Reconstructors to produce realistic 3D assets.<n>Our experiments demonstrate that DSplats not only produces high-quality, spatially consistent outputs, but also sets a new standard in single-image to 3D reconstruction.
arXiv Detail & Related papers (2024-12-11T07:32:17Z) - Diffusion Models in 3D Vision: A Survey [18.805222552728225]
3D vision has become a crucial field within computer vision, powering a range of applications such as autonomous driving, robotics, augmented reality, and medical imaging.<n>We review the state-of-the-art methods that use diffusion models for 3D visual tasks, including but not limited to 3D object generation, shape completion, point-cloud reconstruction, and scene construction.<n>We discuss potential solutions, including improving computational efficiency, enhancing multimodal fusion, and exploring the use of large-scale pretraining for better generalization across 3D tasks.
arXiv Detail & Related papers (2024-10-07T04:12:23Z) - DiffusionBlend: Learning 3D Image Prior through Position-aware Diffusion Score Blending for 3D Computed Tomography Reconstruction [12.04892150473192]
We propose a novel framework that enables learning the 3D image prior through position-aware 3D-patch diffusion score blending.
Our algorithm also comes with better or comparable computational efficiency than previous state-of-the-art methods.
arXiv Detail & Related papers (2024-06-14T17:47:50Z) - LN3Diff: Scalable Latent Neural Fields Diffusion for Speedy 3D Generation [73.36690511083894]
This paper introduces a novel framework called LN3Diff to address a unified 3D diffusion pipeline.
Our approach harnesses a 3D-aware architecture and variational autoencoder to encode the input image into a structured, compact, and 3D latent space.
It achieves state-of-the-art performance on ShapeNet for 3D generation and demonstrates superior performance in monocular 3D reconstruction and conditional 3D generation.
arXiv Detail & Related papers (2024-03-18T17:54:34Z) - Text-to-3D Generation with Bidirectional Diffusion using both 2D and 3D
priors [16.93758384693786]
Bidirectional Diffusion(BiDiff) is a unified framework that incorporates both a 3D and a 2D diffusion process.
Our model achieves high-quality, diverse, and scalable 3D generation.
arXiv Detail & Related papers (2023-12-07T10:00:04Z) - Wonder3D: Single Image to 3D using Cross-Domain Diffusion [105.16622018766236]
Wonder3D is a novel method for efficiently generating high-fidelity textured meshes from single-view images.
To holistically improve the quality, consistency, and efficiency of image-to-3D tasks, we propose a cross-domain diffusion model.
arXiv Detail & Related papers (2023-10-23T15:02:23Z) - Improving 3D Imaging with Pre-Trained Perpendicular 2D Diffusion Models [52.529394863331326]
We propose a novel approach using two perpendicular pre-trained 2D diffusion models to solve the 3D inverse problem.
Our method is highly effective for 3D medical image reconstruction tasks, including MRI Z-axis super-resolution, compressed sensing MRI, and sparse-view CT.
arXiv Detail & Related papers (2023-03-15T08:28:06Z) - DreamFusion: Text-to-3D using 2D Diffusion [52.52529213936283]
Recent breakthroughs in text-to-image synthesis have been driven by diffusion models trained on billions of image-text pairs.
In this work, we circumvent these limitations by using a pretrained 2D text-to-image diffusion model to perform text-to-3D synthesis.
Our approach requires no 3D training data and no modifications to the image diffusion model, demonstrating the effectiveness of pretrained image diffusion models as priors.
arXiv Detail & Related papers (2022-09-29T17:50:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.