Exploiting Completeness Perception with Diffusion Transformer for Unified 3D MRI Synthesis
- URL: http://arxiv.org/abs/2602.18400v1
- Date: Fri, 20 Feb 2026 18:05:39 GMT
- Title: Exploiting Completeness Perception with Diffusion Transformer for Unified 3D MRI Synthesis
- Authors: Junkai Liu, Nay Aung, Theodoros N. Arvanitis, Joao A. C. Lima, Steffen E. Petersen, Daniel C. Alexander, Le Zhang
- Abstract summary: We propose CoPeDiT, a latent diffusion model equipped with completeness perception for unified synthesis of 3D MRIs. CoPeDiT significantly outperforms state-of-the-art methods, achieving superior robustness, generalizability, and flexibility.
- Score: 9.857855424798732
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Missing data problems, such as missing modalities in multi-modal brain MRI and missing slices in cardiac MRI, pose significant challenges in clinical practice. Existing methods rely on external guidance to supply a detailed description of the missing state, instructing generative models to synthesize the missing MRIs. However, manual indicators are not always available or reliable in real-world scenarios due to the unpredictable nature of clinical environments. Moreover, these explicit masks are not informative enough to provide guidance for improving semantic consistency. In this work, we argue that generative models should infer and recognize missing states in a self-perceptive manner, enabling them to better capture subtle anatomical and pathological variations. Towards this goal, we propose CoPeDiT, a general-purpose latent diffusion model equipped with completeness perception for unified synthesis of 3D MRIs. Specifically, we incorporate dedicated pretext tasks into our tokenizer, CoPeVAE, empowering it to learn completeness-aware discriminative prompts, and design MDiT3D, a specialized diffusion transformer architecture for 3D MRI synthesis that effectively uses the learned prompts as guidance to enhance semantic consistency in 3D space. Comprehensive evaluations on three large-scale MRI datasets demonstrate that CoPeDiT significantly outperforms state-of-the-art methods, achieving superior robustness, generalizability, and flexibility. The code is available at https://github.com/JK-Liu7/CoPeDiT.
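Since the abstract describes the mechanism only at a high level, the following is a minimal sketch of the general idea: a small set of learned prompt tokens summarizes how complete the input latent is, and a transformer denoiser block attends to those prompts via cross-attention. The class names (CompletenessPrompt, PromptConditionedBlock), dimensions, and head counts here are illustrative assumptions, not the authors' CoPeDiT/CoPeVAE/MDiT3D implementation; see the linked repository for the actual code.

```python
# Illustrative sketch only (assumptions, not the authors' code): latent tokens from a
# 3D volume are pooled into "completeness prompts", and a transformer denoiser block
# conditions on those prompts through cross-attention.
import torch
import torch.nn as nn


class CompletenessPrompt(nn.Module):
    """Pools latent tokens into a few prompt tokens summarizing completeness (hypothetical design)."""
    def __init__(self, dim, n_prompts=4, n_heads=4):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(n_prompts, dim))
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)

    def forward(self, latent_tokens):                      # latent_tokens: (B, N, dim)
        q = self.queries.unsqueeze(0).expand(latent_tokens.size(0), -1, -1)
        prompts, _ = self.attn(q, latent_tokens, latent_tokens)
        return prompts                                     # (B, n_prompts, dim)


class PromptConditionedBlock(nn.Module):
    """One transformer block whose cross-attention injects the completeness prompts."""
    def __init__(self, dim, n_heads=4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.norm3 = nn.LayerNorm(dim)
        self.self_attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x, prompts):                         # x: noisy latent tokens (B, N, dim)
        h = self.norm1(x)
        x = x + self.self_attn(h, h, h)[0]                 # spatial self-attention over 3D patches
        x = x + self.cross_attn(self.norm2(x), prompts, prompts)[0]  # condition on prompts
        return x + self.mlp(self.norm3(x))


if __name__ == "__main__":
    B, N, dim = 2, 512, 256                                # e.g. 8x8x8 latent patches per volume
    tokens = torch.randn(B, N, dim)
    prompts = CompletenessPrompt(dim)(tokens)
    out = PromptConditionedBlock(dim)(tokens, prompts)
    print(out.shape)                                       # torch.Size([2, 512, 256])
```

In this reading, the inferred prompts play the role that an explicit missing-modality or missing-slice mask would otherwise play, but they are derived from the data itself rather than supplied externally.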
Related papers
- MIRAGE: Knowledge Graph-Guided Cross-Cohort MRI Synthesis for Alzheimer's Disease Prediction [15.543131466384658]
We introduce MIRAGE, a novel framework that reframes the missing-MRI problem as an anatomy-guided cross-modal latent distillation task. We employ a frozen pre-trained 3D U-Net decoder strictly as an auxiliary regularization engine. Experiments demonstrate that our framework successfully bridges the missing-modality gap, improving the AD classification rate by 13%.
arXiv Detail & Related papers (2026-03-02T22:17:37Z) - Multimodal Visual Surrogate Compression for Alzheimer's Disease Classification [69.87877580725768]
Multimodal Visual Surrogate Compression (MVSC) learns to compress and adapt large 3D sMRI volumes into compact 2D features. MVSC has two key components: a Volume Context module that captures global cross-slice context under textual guidance, and an Adaptive Slice Fusion module that aggregates slice-level information in a text-enhanced, patch-wise manner.
arXiv Detail & Related papers (2026-01-29T13:05:46Z) - Moving Beyond Diffusion: Hierarchy-to-Hierarchy Autoregression for fMRI-to-Image Reconstruction [65.67001243986981]
We propose MindHier, a coarse-to-fine fMRI-to-image reconstruction framework built on scale-wise autoregressive modeling. MindHier achieves superior semantic fidelity, 4.67x faster inference, and more deterministic results than the diffusion-based baselines.
arXiv Detail & Related papers (2025-10-25T15:40:07Z) - M3Ret: Unleashing Zero-shot Multimodal Medical Image Retrieval via Self-Supervision [24.846428105192405]
We train M3Ret, a unified visual encoder, without any modality-specific customization. It successfully learns transferable representations using both generative (MAE) and contrastive (SimDINO) self-supervised learning (SSL) paradigms. Our approach sets a new state-of-the-art in zero-shot image-to-image retrieval across all individual modalities, surpassing strong baselines such as DINOv3 and the text-supervised BMC-CLIP.
arXiv Detail & Related papers (2025-09-01T10:59:39Z) - ZECO: ZeroFusion Guided 3D MRI Conditional Generation [11.645873358288648]
ZECO is a ZeroFusion-guided 3D MRI conditional generation framework. It extracts, compresses, and generates high-fidelity MRI images with corresponding 3D segmentation masks. ZECO outperforms state-of-the-art models in both quantitative and qualitative evaluations on Brain MRI datasets.
arXiv Detail & Related papers (2025-03-24T00:04:52Z) - Unified 3D MRI Representations via Sequence-Invariant Contrastive Learning [0.15749416770494706]
Self-supervised deep learning has accelerated 2D natural image analysis but remains difficult to translate into 3D MRI. We present a sequence-invariant self-supervised framework leveraging quantitative MRI (qMRI). Experiments on healthy brain segmentation (IXI), stroke lesion segmentation (ARC), and MRI denoising show significant gains over baseline SSL approaches.
arXiv Detail & Related papers (2025-01-21T11:27:54Z) - ContextMRI: Enhancing Compressed Sensing MRI through Metadata Conditioning [51.26601171361753]
We propose ContextMRI, a text-conditioned diffusion model for MRI that integrates granular metadata into the reconstruction process. We show that increasing the fidelity of metadata, ranging from slice location and contrast to patient age, sex, and pathology, systematically boosts reconstruction performance.
arXiv Detail & Related papers (2025-01-08T05:15:43Z) - MRGen: Segmentation Data Engine for Underrepresented MRI Modalities [59.61465292965639]
Training medical image segmentation models for rare yet clinically important imaging modalities is challenging due to the scarcity of annotated data. This paper investigates leveraging generative models to synthesize data for training segmentation models for underrepresented modalities. We present MRGen, a data engine for controllable medical image synthesis conditioned on text prompts and segmentation masks.
arXiv Detail & Related papers (2024-12-04T16:34:22Z) - Towards Synergistic Deep Learning Models for Volumetric Cirrhotic Liver Segmentation in MRIs [1.5228650878164722]
Liver cirrhosis, a leading cause of global mortality, requires precise segmentation of ROIs for effective disease monitoring and treatment planning.
Existing segmentation models often fail to capture complex feature interactions and generalize across diverse datasets.
We propose a novel synergistic theory that leverages complementary latent spaces for enhanced feature interaction modeling.
arXiv Detail & Related papers (2024-08-08T14:41:32Z) - NeuroPictor: Refining fMRI-to-Image Reconstruction via Multi-individual Pretraining and Multi-level Modulation [55.51412454263856]
This paper proposes to directly modulate the generation process of diffusion models using fMRI signals.
By training with about 67,000 fMRI-image pairs from various individuals, our model enjoys superior fMRI-to-image decoding capacity.
arXiv Detail & Related papers (2024-03-27T02:42:52Z) - SDR-Former: A Siamese Dual-Resolution Transformer for Liver Lesion Classification Using 3D Multi-Phase Imaging [59.78761085714715]
This study proposes a novel Siamese Dual-Resolution Transformer (SDR-Former) framework for liver lesion classification.
The proposed framework has been validated through comprehensive experiments on two clinical datasets.
To support the scientific community, we are releasing our extensive multi-phase MR dataset for liver lesion analysis to the public.
arXiv Detail & Related papers (2024-02-27T06:32:56Z) - Multiscale Metamorphic VAE for 3D Brain MRI Synthesis [5.060516201839319]
Generative modeling of 3D brain MRIs presents difficulties in achieving high visual fidelity while ensuring sufficient coverage of the data distribution.
In this work, we propose to address this challenge with composable, multiscale morphological transformations in a variational autoencoder framework.
We show substantial performance improvements in FID while retaining comparable, or superior, reconstruction quality compared to prior work based on VAEs and generative adversarial networks (GANs).
arXiv Detail & Related papers (2023-01-09T09:15:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.