Latent Interpolation Learning Using Diffusion Models for Cardiac Volume Reconstruction
- URL: http://arxiv.org/abs/2508.13826v3
- Date: Thu, 21 Aug 2025 07:25:21 GMT
- Title: Latent Interpolation Learning Using Diffusion Models for Cardiac Volume Reconstruction
- Authors: Niklas Bubeck, Suprosanna Shit, Chen Chen, Can Zhao, Pengfei Guo, Dong Yang, Georg Zitzlsberger, Daguang Xu, Bernhard Kainz, Daniel Rueckert, Jiazhen Pan
- Abstract summary: Existing methods face challenges, including reliance on predefined schemes, computational inefficiency, and dependence on additional semantic inputs. We present a data-driven Cardiac Latent Interpolation Diffusion (CaLID) framework that can capture complex, non-linear relationships between sparse slices. Second, we design a computationally efficient method that operates in the latent space and speeds up 3D whole-heart upsampling by a factor of 24, reducing computational time. Third, we extend our method to 2D+T data, enabling the effective modeling of temporal coherence.
- Score: 26.7771170972558
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Cardiac Magnetic Resonance (CMR) imaging is a critical tool for diagnosing and managing cardiovascular disease, yet its utility is often limited by the sparse acquisition of 2D short-axis slices, resulting in incomplete volumetric information. Accurate 3D reconstruction from these sparse slices is essential for comprehensive cardiac assessment, but existing methods face challenges, including reliance on predefined interpolation schemes (e.g., linear or spherical), computational inefficiency, and dependence on additional semantic inputs such as segmentation labels or motion data. To address these limitations, we propose a novel Cardiac Latent Interpolation Diffusion (CaLID) framework that introduces three key innovations. First, we present a data-driven interpolation scheme based on diffusion models, which can capture complex, non-linear relationships between sparse slices and improve reconstruction accuracy. Second, we design a computationally efficient method that operates in the latent space and speeds up 3D whole-heart upsampling time by a factor of 24, reducing computational overhead compared to previous methods. Third, with only sparse 2D CMR images as input, our method achieves SOTA performance against baseline methods, eliminating the need for auxiliary input such as morphological guidance, thus simplifying workflows. We further extend our method to 2D+T data, enabling the effective modeling of spatiotemporal dynamics and ensuring temporal coherence. Extensive volumetric evaluations and downstream segmentation tasks demonstrate that CaLID achieves superior reconstruction quality and efficiency. By addressing the fundamental limitations of existing approaches, our framework advances the state of the art for spatial and spatiotemporal whole-heart reconstruction, offering a robust and clinically practical solution for cardiovascular imaging.
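The abstract contrasts CaLID with predefined interpolation schemes such as linear or spherical interpolation. As a rough illustration of those baselines only (this is not the CaLID method, and the function names `lerp`, `slerp`, and `interpolate_between` are hypothetical), the sketch below interpolates between two latent vectors, assumed here to be flat lists of floats standing in for the encoded latents of two neighbouring slices.

```python
import math


def lerp(z0, z1, t):
    """Linear interpolation between two latent vectors at fraction t in [0, 1]."""
    return [(1.0 - t) * a + t * b for a, b in zip(z0, z1)]


def slerp(z0, z1, t, eps=1e-8):
    """Spherical interpolation: moves along the arc between z0 and z1."""
    norm0 = math.sqrt(sum(a * a for a in z0))
    norm1 = math.sqrt(sum(b * b for b in z1))
    cos_omega = sum(a * b for a, b in zip(z0, z1)) / (norm0 * norm1 + eps)
    # Clamp to the valid acos domain to guard against rounding error.
    omega = math.acos(max(-1.0, min(1.0, cos_omega)))
    sin_omega = math.sin(omega)
    if sin_omega < eps:
        # Vectors are (nearly) parallel: fall back to linear interpolation.
        return lerp(z0, z1, t)
    w0 = math.sin((1.0 - t) * omega) / sin_omega
    w1 = math.sin(t * omega) / sin_omega
    return [w0 * a + w1 * b for a, b in zip(z0, z1)]


def interpolate_between(z0, z1, num_intermediate, mode="slerp"):
    """Return num_intermediate evenly spaced latents strictly between z0 and z1."""
    fn = slerp if mode == "slerp" else lerp
    steps = num_intermediate + 1
    return [fn(z0, z1, k / steps) for k in range(1, steps)]


if __name__ == "__main__":
    z_a, z_b = [1.0, 0.0], [0.0, 1.0]  # toy 2D latents, for illustration only
    for z in interpolate_between(z_a, z_b, 3, mode="slerp"):
        print([round(v, 4) for v in z])
```

Both schemes fix the path between slices in advance; according to the abstract, CaLID instead learns the interpolation from data with a diffusion model.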
Related papers
- Non-Intrusive Parametrized-Background Data-Weak Reconstruction of Cardiac Displacement Fields from Sparse MRI-like Observations [0.0]
We apply the non-intrusive Parametrized-Background Data-Weak (PBDW) approach to 3D cardiac displacement reconstruction from limited MRI-like observations. Our implementation requires only solution snapshots -- no governing equations, assembly routines, or solver access. We demonstrate the effectiveness of the method through validation on a 3D left ventricular model with simulated scar tissue.
arXiv Detail & Related papers (2025-09-18T11:10:24Z) - TRACE: Temporally Reliable Anatomically-Conditioned 3D CT Generation with Enhanced Efficiency [40.82927972746919]
TRACE is a framework that generates 3D medical images with temporal alignment. Overlapping frame pairs are linked into a flexible-length sequence, which is reconstructed into a temporally and anatomically aligned 3D volume.
arXiv Detail & Related papers (2025-07-01T14:35:39Z) - X$^{2}$-Gaussian: 4D Radiative Gaussian Splatting for Continuous-time Tomographic Reconstruction [64.2059940799033]
Current methods discretize temporal resolution into fixed phases with respiratory gating devices. X$^{2}$-Gaussian, a novel framework, enables continuous-time 4DCT reconstruction by integrating dynamic radiative splatting with self-supervised respiratory motion learning.
arXiv Detail & Related papers (2025-03-27T17:59:57Z) - Learning to Align and Refine: A Foundation-to-Diffusion Framework for Occlusion-Robust Two-Hand Reconstruction [50.952228546326516]
Two-hand reconstruction from monocular images faces persistent challenges due to complex and dynamic hand postures. Existing approaches struggle with such alignment issues, often resulting in misalignment and penetration artifacts. We propose a dual-stage Foundation-to-Diffusion framework that precisely aligns 2D prior guidance from vision foundation models.
arXiv Detail & Related papers (2025-03-22T14:42:27Z) - Motion-enhancement to Echocardiography Segmentation via Inserting a Temporal Attention Module: An Efficient, Adaptable, and Scalable Approach [4.923733944174007]
We present a novel, computation-efficient alternative where a temporal attention module extracts feature interactions multiple times. The module can be seamlessly integrated into a wide range of existing CNN- or Transformer-based networks. Our results confirm TAM's robustness, scalability, and generalizability across diverse datasets and backbones.
arXiv Detail & Related papers (2025-01-24T21:35:24Z) - Re-Visible Dual-Domain Self-Supervised Deep Unfolding Network for MRI Reconstruction [48.30341580103962]
We propose a novel re-visible dual-domain self-supervised deep unfolding network to address these issues. We design a deep unfolding network based on Chambolle and Pock Proximal Point Algorithm (DUN-CP-PPA) to achieve end-to-end reconstruction. Experiments conducted on the fastMRI and IXI datasets demonstrate that our method significantly outperforms state-of-the-art approaches in terms of reconstruction performance.
arXiv Detail & Related papers (2025-01-07T12:29:32Z) - ALOcc: Adaptive Lifting-based 3D Semantic Occupancy and Cost Volume-based Flow Prediction [89.89610257714006]
Existing methods prioritize higher accuracy to cater to the demands of these tasks.
We introduce a series of targeted improvements for 3D semantic occupancy prediction and flow estimation.
Our purely convolutional architecture framework, named ALOcc, achieves an optimal tradeoff between speed and accuracy.
arXiv Detail & Related papers (2024-11-12T11:32:56Z) - Explicit Differentiable Slicing and Global Deformation for Cardiac Mesh Reconstruction [8.730291904586656]
Mesh reconstruction of the cardiac anatomy from medical images is useful for shape and motion measurements and biophysics simulations.
Traditional voxel-based approaches rely on pre- and post-processing that compromises image fidelity.
We propose a novel explicit differentiable voxelization and slicing (DVS) algorithm that allows gradient backpropagation to a mesh from its slices.
arXiv Detail & Related papers (2024-09-03T17:19:31Z) - DPER: Diffusion Prior Driven Neural Representation for Limited Angle and Sparse View CT Reconstruction [45.00528216648563]
Diffusion Prior Driven Neural Representation (DPER) is an unsupervised framework designed to address the exceptionally ill-posed CT reconstruction inverse problems.
DPER adopts the Half Quadratic Splitting (HQS) algorithm to decompose the inverse problem into data fidelity and distribution prior sub-problems.
We conduct comprehensive experiments to evaluate the performance of DPER on LACT and ultra-SVCT reconstruction with two public datasets.
arXiv Detail & Related papers (2024-04-27T12:55:13Z) - Fully Differentiable Correlation-driven 2D/3D Registration for X-ray to CT Image Fusion [3.868072865207522]
Image-based rigid 2D/3D registration is a critical technique for fluoroscopic guided surgical interventions.
We propose a novel fully differentiable correlation-driven network using a dual-branch CNN-transformer encoder.
A correlation-driven loss is proposed for low-frequency feature and high-frequency feature decomposition based on embedded information.
arXiv Detail & Related papers (2024-02-04T14:12:51Z) - Interpretable 2D Vision Models for 3D Medical Images [47.75089895500738]
This study proposes a simple approach of adapting 2D networks with an intermediate feature representation for processing 3D images.
We show on all 3D MedMNIST datasets as benchmark and two real-world datasets consisting of several hundred high-resolution CT or MRI scans that our approach performs on par with existing methods.
arXiv Detail & Related papers (2023-07-13T08:27:09Z) - Real-time landmark detection for precise endoscopic submucosal dissection via shape-aware relation network [51.44506007844284]
We propose a shape-aware relation network for accurate and real-time landmark detection in endoscopic submucosal dissection surgery.
We first devise an algorithm to automatically generate relation keypoint heatmaps, which intuitively represent the prior knowledge of spatial relations among landmarks.
We then develop two complementary regularization schemes to progressively incorporate the prior knowledge into the training process.
arXiv Detail & Related papers (2021-11-08T07:57:30Z) - Three-Dimensional Embedded Attentive RNN (3D-EAR) Segmentor for Left Ventricle Delineation from Myocardial Velocity Mapping [1.8653386811342048]
We propose a novel fully automated framework incorporating a 3D-UNet backbone architecture with Embedded multichannel Attention mechanism and LSTM based Recurrent neural networks (RNN) for the MVM-CMR datasets.
By comparing the baseline model of 3D-UNet and ablation studies with and without embedded attentive LSTM modules and various loss functions, we can demonstrate that the proposed model has outperformed the state-of-the-art baseline models with significant improvement.
arXiv Detail & Related papers (2021-04-26T11:04:43Z) - Weakly-supervised Learning For Catheter Segmentation in 3D Frustum Ultrasound [74.22397862400177]
We propose a novel Frustum ultrasound based catheter segmentation method.
The proposed method achieved the state-of-the-art performance with an efficiency of 0.25 second per volume.
arXiv Detail & Related papers (2020-10-19T13:56:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.