Any-to-All MRI Synthesis: A Unified Foundation Model for Nasopharyngeal Carcinoma and Its Downstream Applications
- URL: http://arxiv.org/abs/2602.08822v1
- Date: Mon, 09 Feb 2026 15:56:53 GMT
- Title: Any-to-All MRI Synthesis: A Unified Foundation Model for Nasopharyngeal Carcinoma and Its Downstream Applications
- Authors: Yao Pu, Yiming Shi, Zhenxi Zhang, Peixin Yu, Yitao Zhuang, Xiang Wang, Hongzhao Chen, Jing Cai, Ge Ren,
- Abstract summary: Traditional MRI synthesis methods are modality-specific, limited in anatomical adaptability, and lack clinical interpretability, failing to meet NPC's RT needs. Here, we developed a unified foundation model integrating contrastive visual representation learning and vision-language alignment (VLA) to enable any-to-all MRI synthesis. The model uses a contrastive encoder for modality-invariant representations and a CLIP-based text-informed decoder for semantically consistent synthesis, supporting any-to-all MRI synthesis via one unified foundation model.
- Score: 11.15455619035484
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Magnetic resonance imaging (MRI) is essential for nasopharyngeal carcinoma (NPC) radiotherapy (RT), but practical constraints, such as patient discomfort, long scan times, and high costs, often lead to incomplete modalities in clinical practice, compromising RT planning accuracy. Traditional MRI synthesis methods are modality-specific, limited in anatomical adaptability, and lack clinical interpretability, failing to meet NPC's RT needs. Here, we developed a unified foundation model integrating contrastive visual representation learning and vision-language alignment (VLA) to enable any-to-all MRI synthesis. The model uses a contrastive encoder for modality-invariant representations and a CLIP-based text-informed decoder for semantically consistent synthesis, supporting any-to-all MRI synthesis via one unified foundation model. Trained on 40,825 images from 13 institutions, it achieves consistently high performance (average SSIM 0.90, PSNR 27) across 26 internal/external validation sites (15,748 images), with superior synthesis fidelity and robustness to noise and domain shifts. Meanwhile, its unified representation enhances downstream RT-relevant tasks (e.g., segmentation). This work advances digital medicine solutions for NPC care by leveraging foundation models to bridge technical synthesis and clinical utility.
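The architecture the abstract describes (a shared contrastive encoder producing modality-invariant latents, plus a text-conditioned decoder that synthesizes the requested contrast) can be sketched in simplified form as below. This is an illustrative assumption of how such a pipeline might be wired, not the authors' released code: all class names, shapes, and the stand-in for a CLIP text embedding are hypothetical.

```python
# Hypothetical sketch of any-to-all MRI synthesis: a shared contrastive
# encoder maps any input modality to a modality-invariant latent, and a
# decoder conditioned on a target-modality text embedding (CLIP-style)
# generates the requested contrast. Names and shapes are illustrative.
import torch
import torch.nn as nn

class SharedEncoder(nn.Module):
    def __init__(self, latent_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, latent_dim),
        )

    def forward(self, x):
        # Unit-normalized latents, as is typical for contrastive objectives.
        return nn.functional.normalize(self.net(x), dim=-1)

class TextConditionedDecoder(nn.Module):
    def __init__(self, latent_dim=256, text_dim=512, out_hw=64):
        super().__init__()
        self.out_hw = out_hw
        self.fuse = nn.Linear(latent_dim + text_dim, out_hw * out_hw)

    def forward(self, z, text_emb):
        # Fuse image latent with the target-modality text embedding.
        h = self.fuse(torch.cat([z, text_emb], dim=-1))
        return h.view(-1, 1, self.out_hw, self.out_hw)

def contrastive_loss(z_a, z_b, temperature=0.07):
    # InfoNCE-style loss pulling together latents of the same anatomy
    # observed under two different modalities.
    logits = z_a @ z_b.t() / temperature
    targets = torch.arange(z_a.size(0))
    return nn.functional.cross_entropy(logits, targets)

enc, dec = SharedEncoder(), TextConditionedDecoder()
t1 = torch.randn(4, 1, 64, 64)   # e.g. T1-weighted slices
t2 = torch.randn(4, 1, 64, 64)   # same anatomy, T2-weighted
text = torch.randn(4, 512)       # stand-in for a CLIP embedding of "T2-weighted"
loss = contrastive_loss(enc(t1), enc(t2))
synth = dec(enc(t1), text)       # "any" input -> requested target modality
print(loss.item(), synth.shape)
```

Because the encoder is shared across modalities and only the text condition changes, one model covers every source/target pair, which is the "any-to-all" property the abstract claims.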
Related papers
- Exploiting Completeness Perception with Diffusion Transformer for Unified 3D MRI Synthesis [9.857855424798732]
We propose CoPeDiT, a latent diffusion model equipped with completeness perception for unified synthesis of 3D MRIs.
CoPeDiT significantly outperforms state-of-the-art methods, achieving superior robustness, generalizability, and flexibility.
arXiv Detail & Related papers (2026-02-20T18:05:39Z)
- Synthetic Vasculature and Pathology Enhance Vision-Language Model Reasoning [39.96133625333846]
We introduce Synthetic Vasculature Reasoning (SVR), a framework that controllably synthesizes images and corresponding text.
Based on this, we curate OCTA-100K-SVR, an OCTA image-reasoning dataset with 100,000 pairs.
Our experiments show that a general-purpose VLM trained on the dataset achieves a zero-shot balanced classification accuracy of 89.67% on real OCTA images.
arXiv Detail & Related papers (2025-12-11T19:19:39Z)
- MedVSR: Medical Video Super-Resolution with Cross State-Space Propagation [63.38824041721275]
Low-resolution (LR) medical videos present unique challenges for video super-resolution (VSR) models.
We propose MedVSR, a tailored framework for medical VSR.
We show that MedVSR significantly outperforms existing VSR models in reconstruction performance and efficiency.
arXiv Detail & Related papers (2025-09-25T14:56:59Z)
- Towards a general-purpose foundation model for fMRI analysis [58.06455456423138]
We introduce NeuroSTORM, a framework that learns from 4D fMRI volumes and enables efficient knowledge transfer across diverse applications.
NeuroSTORM is pre-trained on 28.65 million fMRI frames (>9,000 hours) from over 50,000 subjects across multiple centers and ages 5 to 100.
It outperforms existing methods across five tasks: age/gender prediction, phenotype prediction, disease diagnosis, fMRI-to-image retrieval, and task-based fMRI.
arXiv Detail & Related papers (2025-06-11T23:51:01Z)
- ViCTr: Vital Consistency Transfer for Pathology Aware Image Synthesis [0.715632820500919]
Existing methods struggle to maintain anatomical fidelity while accurately modeling pathological features.
ViCTr is a novel two-stage framework that combines a rectified flow trajectory with a Tweedie-corrected diffusion process to achieve high-fidelity, pathology-aware image synthesis.
To our knowledge, ViCTr is the first method to provide fine-grained, pathology-aware MRI synthesis with graded severity control.
arXiv Detail & Related papers (2025-05-08T05:44:16Z)
- Interactive Gadolinium-Free MRI Synthesis: A Transformer with Localization Prompt Learning [6.716077690014641]
We propose a novel Transformer with Localization Prompts framework for synthesizing CE-MRI from non-contrast MR images.
Our architecture introduces three key innovations: a hierarchical backbone that uses an efficient Transformer to process multi-scale features; a multi-stage fusion system that hierarchically integrates complementary information via spatial attention operations and cross-attention mechanisms.
The framework enables interactive clinical integration by allowing radiologists to input diagnostic prompts during inference, synergizing artificial intelligence with medical expertise.
arXiv Detail & Related papers (2025-03-03T07:44:28Z)
- A Unified Model for Compressed Sensing MRI Across Undersampling Patterns [69.19631302047569]
We propose a unified MRI reconstruction model robust to various measurement undersampling patterns and image resolutions.
Our model improves SSIM by 11% and PSNR by 4 dB over a state-of-the-art CNN (End-to-End VarNet), with 600$\times$ faster inference than diffusion methods.
arXiv Detail & Related papers (2024-10-05T20:03:57Z)
- Towards General Text-guided Image Synthesis for Customized Multimodal Brain MRI Generation [51.28453192441364]
Multimodal brain magnetic resonance (MR) imaging is indispensable in neuroscience and neurology.
Current MR image synthesis approaches are typically trained on independent datasets for specific tasks.
We present TUMSyn, a Text-guided Universal MR image Synthesis model, which can flexibly generate brain MR images.
arXiv Detail & Related papers (2024-09-25T11:14:47Z)
- Leveraging Multimodal CycleGAN for the Generation of Anatomically Accurate Synthetic CT Scans from MRIs [1.779948689352186]
We analyse the capabilities of different configurations of Deep Learning models to generate synthetic CT scans from MRI.
Several CycleGAN models were trained unsupervised to generate CT scans from different MRI modalities with and without contrast agents.
The results show how, depending on the input modalities, the models can have very different performances.
arXiv Detail & Related papers (2024-07-15T16:38:59Z)
- CMRxRecon2024: A Multi-Modality, Multi-View K-Space Dataset Boosting Universal Machine Learning for Accelerated Cardiac MRI [40.11088079783521]
The CMRxRecon2024 dataset is the largest and most protocol-diverse publicly available cardiac k-space dataset.
It is acquired from 330 healthy volunteers, covering commonly used modalities, anatomical views, and acquisition trajectories in clinical cardiac MRI.
arXiv Detail & Related papers (2024-06-27T09:50:20Z)
- Model-Guided Multi-Contrast Deep Unfolding Network for MRI Super-resolution Reconstruction [68.80715727288514]
In this paper, we propose a novel Model-Guided interpretable Deep Unfolding Network (MGDUN) for medical image SR reconstruction.
We show how to unfold an iterative MGDUN algorithm into a novel model-guided deep unfolding network by taking the MRI observation matrix into account.
arXiv Detail & Related papers (2022-09-15T03:58:30Z)
- ResViT: Residual vision transformers for multi-modal medical image synthesis [0.0]
We propose a novel generative adversarial approach for medical image synthesis, ResViT, to combine local precision of convolution operators with contextual sensitivity of vision transformers.
Our results indicate the superiority of ResViT against competing methods in terms of qualitative observations and quantitative metrics.
arXiv Detail & Related papers (2021-06-30T12:57:37Z)
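The hybrid design ResViT describes, combining the local precision of convolutions with the contextual sensitivity of transformers, can be illustrated with a minimal residual block. This is a hedged sketch under assumptions: the block structure, channel counts, and fusion-by-summation below are illustrative choices, not the paper's actual architecture.

```python
# Toy hybrid block in the spirit of conv+transformer synthesis models:
# a conv branch captures local detail, a transformer encoder layer
# captures global context, and both are merged through a residual path.
# All dimensions and the fusion scheme are illustrative assumptions.
import torch
import torch.nn as nn

class ResidualConvTransformerBlock(nn.Module):
    def __init__(self, channels=32, nhead=4):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        # One transformer encoder layer over flattened spatial tokens.
        self.attn = nn.TransformerEncoderLayer(
            d_model=channels, nhead=nhead, batch_first=True)

    def forward(self, x):
        b, c, h, w = x.shape
        local = self.conv(x)                                  # local branch
        tokens = x.flatten(2).transpose(1, 2)                 # (B, H*W, C)
        global_ctx = self.attn(tokens).transpose(1, 2).reshape(b, c, h, w)
        return x + local + global_ctx                         # residual fusion

block = ResidualConvTransformerBlock()
y = block(torch.randn(2, 32, 16, 16))
print(y.shape)  # torch.Size([2, 32, 16, 16])
```

The residual sum keeps the input feature map on a direct path, so the conv and attention branches only need to learn corrections, a common stabilizing choice in such hybrid generators.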
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.