X-LRM: X-ray Large Reconstruction Model for Extremely Sparse-View Computed Tomography Recovery in One Second
- URL: http://arxiv.org/abs/2503.06382v1
- Date: Sun, 09 Mar 2025 01:39:59 GMT
- Title: X-LRM: X-ray Large Reconstruction Model for Extremely Sparse-View Computed Tomography Recovery in One Second
- Authors: Guofeng Zhang, Ruyi Zha, Hao He, Yixun Liang, Alan Yuille, Hongdong Li, Yuanhao Cai
- Abstract summary: Sparse-view 3D CT reconstruction aims to recover structures from a limited number of 2D X-ray projections. Existing feedforward methods are constrained by the limited capacity of CNN-based architectures and the scarcity of large-scale training datasets. We propose an X-ray Large Reconstruction Model (X-LRM) for extremely sparse-view (<10 views) CT reconstruction.
- Score: 52.11676689269379
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sparse-view 3D CT reconstruction aims to recover volumetric structures from a limited number of 2D X-ray projections. Existing feedforward methods are constrained by the limited capacity of CNN-based architectures and the scarcity of large-scale training datasets. In this paper, we propose an X-ray Large Reconstruction Model (X-LRM) for extremely sparse-view (<10 views) CT reconstruction. X-LRM consists of two key components: X-former and X-triplane. Our X-former can handle an arbitrary number of input views using an MLP-based image tokenizer and a Transformer-based encoder. The output tokens are then upsampled into our X-triplane representation, which models the 3D radiodensity as an implicit neural field. To support the training of X-LRM, we introduce Torso-16K, a large-scale dataset comprising over 16K volume-projection pairs of various torso organs. Extensive experiments demonstrate that X-LRM outperforms the state-of-the-art method by 1.5 dB and achieves 27x faster speed and better flexibility. Furthermore, the downstream evaluation of lung segmentation tasks also suggests the practical value of our approach. Our code, pre-trained models, and dataset will be released at https://github.com/caiyuanhao1998/X-LRM
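The abstract describes modeling 3D radiodensity as an implicit neural field over a triplane representation. The paper's actual architecture is not shown here; the following is a minimal, illustrative sketch of the generic triplane-query idea (bilinearly sample three axis-aligned feature planes at a 3D point, sum the features, and decode with a small MLP). All shapes, plane names, and weights are assumptions for illustration only.

```python
import numpy as np

def bilinear_sample(plane, u, v):
    """Bilinearly sample a (H, W, C) feature plane at continuous coords in [0, 1]."""
    H, W, _ = plane.shape
    x, y = u * (W - 1), v * (H - 1)
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, W - 1), min(y0 + 1, H - 1)
    wx, wy = x - x0, y - y0
    return ((1 - wx) * (1 - wy) * plane[y0, x0]
            + wx * (1 - wy) * plane[y0, x1]
            + (1 - wx) * wy * plane[y1, x0]
            + wx * wy * plane[y1, x1])

def query_radiodensity(planes, point, w1, w2):
    """Sum features from the XY/XZ/YZ planes at `point`, decode with a tiny MLP."""
    x, y, z = point
    feat = (bilinear_sample(planes["xy"], x, y)
            + bilinear_sample(planes["xz"], x, z)
            + bilinear_sample(planes["yz"], y, z))
    hidden = np.maximum(feat @ w1, 0.0)  # ReLU hidden layer
    return float(hidden @ w2)            # scalar radiodensity at the query point

# Toy example with random planes and weights (illustrative sizes).
rng = np.random.default_rng(0)
C = 8
planes = {k: rng.standard_normal((16, 16, C)) for k in ("xy", "xz", "yz")}
w1 = rng.standard_normal((C, 16))
w2 = rng.standard_normal(16)
rho = query_radiodensity(planes, (0.3, 0.7, 0.5), w1, w2)
```

A full volume would be reconstructed by querying this field on a dense 3D grid of points; the actual X-triplane features would come from the transformer encoder's upsampled output tokens rather than random arrays.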
Related papers
- Fan-Beam CT Reconstruction for Unaligned Sparse-View X-ray Baggage Dataset [0.0]
We present a calibration and reconstruction method using an unaligned sparse multi-view X-ray baggage dataset. Our approach integrates multi-spectral neural attenuation field reconstruction with Linear pushbroom (LPB) camera model pose optimization.
arXiv Detail & Related papers (2024-12-04T05:16:54Z) - Differentiable Voxel-based X-ray Rendering Improves Sparse-View 3D CBCT Reconstruction [4.941613865666241]
We present DiffVox, a self-supervised framework for Cone-Beam Computed Tomography (CBCT) reconstruction. As a result, we reconstruct high-fidelity 3D CBCT volumes from fewer X-rays, potentially reducing ionizing radiation exposure and improving diagnostic utility.
arXiv Detail & Related papers (2024-11-28T15:49:08Z) - SCube: Instant Large-Scale Scene Reconstruction using VoxSplats [55.383993296042526]
We present SCube, a novel method for reconstructing large-scale 3D scenes (geometry, appearance, and semantics) from a sparse set of posed images.
Our method encodes reconstructed scenes using VoxSplat, a novel representation consisting of a set of 3D Gaussians supported on a high-resolution sparse-voxel scaffold.
arXiv Detail & Related papers (2024-10-26T00:52:46Z) - R$^2$-Gaussian: Rectifying Radiative Gaussian Splatting for Tomographic Reconstruction [53.19869886963333]
3D Gaussian splatting (3DGS) has shown promising results in rendering image and surface reconstruction.
This paper introduces R$^2$-Gaussian, the first 3DGS-based framework for sparse-view tomographic reconstruction.
arXiv Detail & Related papers (2024-05-31T08:39:02Z) - Pre-training on High Definition X-ray Images: An Experimental Study [19.46094537296955]
We propose the first high-definition (1280$\times$1280) X-ray based pre-trained foundation vision model on a large-scale dataset.
Our model follows the masked auto-encoder framework, in which the tokens remaining after high-rate mask processing are used as input.
We validate the effectiveness of our model on two downstream tasks, including X-ray report generation and disease recognition.
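The high-rate random masking that the masked auto-encoder framework refers to can be sketched as follows. This is an illustrative, generic MAE-style masking step, not the paper's implementation; the 75% mask ratio and token shapes are assumptions.

```python
import numpy as np

def random_masking(tokens, mask_ratio=0.75, rng=None):
    """Keep a random (1 - mask_ratio) subset of patch tokens.

    Returns the visible tokens (the encoder's input) and the sorted
    indices of the kept positions, needed later to place decoder outputs.
    """
    rng = rng or np.random.default_rng()
    n = tokens.shape[0]
    n_keep = int(n * (1 - mask_ratio))
    keep = np.sort(rng.permutation(n)[:n_keep])
    return tokens[keep], keep

# 256 patch tokens of dimension 4 (toy sizes); with a 75% mask ratio,
# only 64 tokens reach the encoder.
patches = np.arange(256 * 4, dtype=float).reshape(256, 4)
visible, keep_idx = random_masking(patches, mask_ratio=0.75)
```

In MAE pre-training, a lightweight decoder then reconstructs the masked patches from the encoder output plus learned mask tokens; dropping 75% of tokens is what makes pre-training on large high-resolution images tractable.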
arXiv Detail & Related papers (2024-04-27T14:29:53Z) - Radiative Gaussian Splatting for Efficient X-ray Novel View Synthesis [88.86777314004044]
We propose a 3D Gaussian splatting-based framework, namely X-Gaussian, for X-ray novel view visualization.
Experiments show that our X-Gaussian outperforms state-of-the-art methods by 6.5 dB while requiring less than 15% of the training time and achieving over 73x faster inference.
arXiv Detail & Related papers (2024-03-07T00:12:08Z) - Structure-Aware Sparse-View X-ray 3D Reconstruction [26.91084106735878]
We propose a framework, Structure-Aware X-ray Neural Radiodensity Fields (SAX-NeRF) for sparse-view X-ray 3D reconstruction.
Lineformer captures internal structures of objects in 3D space by modeling the dependencies within each line segment of an X-ray.
Experiments on X3D show that SAX-NeRF surpasses previous NeRF-based methods by 12.56 and 2.49 dB on novel view synthesis and CT reconstruction.
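The summary says Lineformer models dependencies within each line segment of an X-ray. One common way to restrict a transformer's attention to within-segment dependencies is a block-diagonal attention mask; the sketch below shows that generic mechanism, not SAX-NeRF's actual layer, and all names and shapes are illustrative.

```python
import numpy as np

def segment_attention_mask(segment_ids):
    """Boolean (n, n) mask: True where token i may attend to token j,
    i.e. where both tokens belong to the same line segment."""
    seg = np.asarray(segment_ids)
    return seg[:, None] == seg[None, :]

def masked_softmax(scores, mask):
    """Row-wise softmax with disallowed entries forced to zero weight."""
    scores = np.where(mask, scores, -np.inf)
    scores = scores - scores.max(axis=-1, keepdims=True)
    e = np.exp(scores)
    return e / e.sum(axis=-1, keepdims=True)

# Six tokens sampled along one X-ray, grouped into three line segments.
rng = np.random.default_rng(0)
ids = [0, 0, 0, 1, 1, 2]
mask = segment_attention_mask(ids)
attn = masked_softmax(rng.standard_normal((6, 6)), mask)
# Each row sums to 1, and cross-segment attention weights are exactly zero.
```

The effect is that attention is computed independently inside each segment, which matches the stated goal of modeling dependencies within, rather than across, line segments.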
arXiv Detail & Related papers (2023-11-18T03:39:02Z) - LRM: Large Reconstruction Model for Single Image to 3D [61.47357798633123]
We propose the first Large Reconstruction Model (LRM) that predicts the 3D model of an object from a single input image within just 5 seconds.
LRM adopts a highly scalable transformer-based architecture with 500 million learnable parameters to directly predict a neural radiance field (NeRF) from the input image.
We train our model in an end-to-end manner on massive multi-view data containing around 1 million objects.
arXiv Detail & Related papers (2023-11-08T00:03:52Z) - Geometry-Aware Attenuation Learning for Sparse-View CBCT Reconstruction [53.93674177236367]
Cone Beam Computed Tomography (CBCT) plays a vital role in clinical imaging.
Traditional methods typically require hundreds of 2D X-ray projections to reconstruct a high-quality 3D CBCT image.
This has led to a growing interest in sparse-view CBCT reconstruction to reduce radiation doses.
We introduce a novel geometry-aware encoder-decoder framework to solve this problem.
arXiv Detail & Related papers (2023-03-26T14:38:42Z) - Self-Supervised 2D/3D Registration for X-Ray to CT Image Fusion [10.040271638205382]
We propose a self-supervised 2D/3D registration framework combining simulated training with unsupervised feature and pixel space domain adaptation.
Our framework achieves a registration accuracy of 1.83$\pm$1.16 mm with a high success ratio of 90.1% on real X-ray images.
arXiv Detail & Related papers (2022-10-14T08:06:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.