Related papers: OmniMRI: A Unified Vision--Language Foundation Model for Generalist MRI Interpretation

OmniMRI: A Unified Vision--Language Foundation Model for Generalist MRI Interpretation

URL: http://arxiv.org/abs/2508.17524v1
Date: Sun, 24 Aug 2025 21:11:28 GMT
Title: OmniMRI: A Unified Vision--Language Foundation Model for Generalist MRI Interpretation
Authors: Xingxin He, Aurora Rofena, Ruimin Feng, Haozhe Liao, Zhaoye Zhou, Albert Jang, Fang Liu,
Abstract summary: We introduce OmniMRI, a unified vision-language foundation model designed to generalize across the entire MRI workflow.<n> OmniMRI is trained on a large-scale, heterogeneous corpus curated from 60 public datasets.<n>Results demonstrate OmniMRI's ability to perform diverse tasks within a single architecture.
Score: 5.3427577036717
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Magnetic Resonance Imaging (MRI) is indispensable in clinical practice but remains constrained by fragmented, multi-stage workflows encompassing acquisition, reconstruction, segmentation, detection, diagnosis, and reporting. While deep learning has achieved progress in individual tasks, existing approaches are often anatomy- or application-specific and lack generalizability across diverse clinical settings. Moreover, current pipelines rarely integrate imaging data with complementary language information that radiologists rely on in routine practice. Here, we introduce OmniMRI, a unified vision-language foundation model designed to generalize across the entire MRI workflow. OmniMRI is trained on a large-scale, heterogeneous corpus curated from 60 public datasets, over 220,000 MRI volumes and 19 million MRI slices, incorporating image-only data, paired vision-text data, and instruction-response data. Its multi-stage training paradigm, comprising self-supervised vision pretraining, vision-language alignment, multimodal pretraining, and multi-task instruction tuning, progressively equips the model with transferable visual representations, cross-modal reasoning, and robust instruction-following capabilities. Qualitative results demonstrate OmniMRI's ability to perform diverse tasks within a single architecture, including MRI reconstruction, anatomical and pathological segmentation, abnormality detection, diagnostic suggestion, and radiology report generation. These findings highlight OmniMRI's potential to consolidate fragmented pipelines into a scalable, generalist framework, paving the way toward foundation models that unify imaging and clinical language for comprehensive, end-to-end MRI interpretation.

Related papers

brat: Aligned Multi-View Embeddings for Brain MRI Analysis [36.795218160666266]
brat is a multi-view representation learning framework for brain magnetic resonance imaging (MRI) trained on MRIs paired with clinical reports.<n>Brain MRIs present unique challenges due to the presence of numerous, highly varied, and often subtle abnormalities that are localized to a few slices within a 3D volume.
arXiv Detail & Related papers (2025-12-21T10:37:31Z)
Decipher-MR: A Vision-Language Foundation Model for 3D MRI Representations [12.805804608410739]
We present Decipher-MR, a 3D MRI-specific vision-language foundation model trained on a large-scale dataset.<n>Decipher-MR integrates self-supervised vision learning with report-guided text supervision to build robust, generalizable representations.<n>Our results establish Decipher-MR as a scalable and versatile foundation for MRI-based AI, facilitating efficient development across clinical and research domains.
arXiv Detail & Related papers (2025-09-25T14:43:33Z)
M3Ret: Unleashing Zero-shot Multimodal Medical Image Retrieval via Self-Supervision [24.846428105192405]
We train M3Ret, a unified visual encoder, without any modality-specific customization.<n>It successfully learns transferable representations using both generative (MAE) and contrastive (SimDINO) self-supervised learning (SSL) paradigms.<n>Our approach sets a new state-of-the-art in zero-shot image-to-image retrieval across all individual modalities, surpassing strong baselines such as DINOv3 and the text-supervised BMC-CLIP.
arXiv Detail & Related papers (2025-09-01T10:59:39Z)
Large-scale Multi-sequence Pretraining for Generalizable MRI Analysis in Versatile Clinical Applications [15.846703688846086]
In this study, we present PRISM, a foundation model PRe-trained with large-scale multI-Sequence MRI.<n>We propose a novel pretraining paradigm that disentangles anatomically invariant features from sequence-specific variations in MRI.<n>PRISM consistently outperformed both non-pretrained models and existing foundation models.
arXiv Detail & Related papers (2025-08-10T03:31:46Z)
Multimodal Causal-Driven Representation Learning for Generalizable Medical Image Segmentation [56.52520416420957]
We propose Multimodal Causal-Driven Representation Learning (MCDRL) to tackle domain generalization in medical image segmentation.<n>MCDRL consistently outperforms competing methods, yielding superior segmentation accuracy and exhibiting robust generalizability.
arXiv Detail & Related papers (2025-08-07T03:41:41Z)
ContextMRI: Enhancing Compressed Sensing MRI through Metadata Conditioning [51.26601171361753]
We propose ContextMRI, a text-conditioned diffusion model for MRI that integrates granular metadata into the reconstruction process.<n>We show that increasing the fidelity of metadata, ranging from slice location and contrast to patient age, sex, and pathology, systematically boosts reconstruction performance.
arXiv Detail & Related papers (2025-01-08T05:15:43Z)
MRGen: Segmentation Data Engine for Underrepresented MRI Modalities [59.61465292965639]
Training medical image segmentation models for rare yet clinically important imaging modalities is challenging due to the scarcity of annotated data.<n>This paper investigates leveraging generative models to synthesize data, for training segmentation models for underrepresented modalities.<n>We present MRGen, a data engine for controllable medical image synthesis conditioned on text prompts and segmentation masks.
arXiv Detail & Related papers (2024-12-04T16:34:22Z)
Towards General Text-guided Image Synthesis for Customized Multimodal Brain MRI Generation [51.28453192441364]
Multimodal brain magnetic resonance (MR) imaging is indispensable in neuroscience and neurology. Current MR image synthesis approaches are typically trained on independent datasets for specific tasks. We present TUMSyn, a Text-guided Universal MR image Synthesis model, which can flexibly generate brain MR images.
arXiv Detail & Related papers (2024-09-25T11:14:47Z)
MindFormer: Semantic Alignment of Multi-Subject fMRI for Brain Decoding [50.55024115943266]
We introduce a novel semantic alignment method of multi-subject fMRI signals using so-called MindFormer. This model is specifically designed to generate fMRI-conditioned feature vectors that can be used for conditioning Stable Diffusion model for fMRI- to-image generation or large language model (LLM) for fMRI-to-text generation. Our experimental results demonstrate that MindFormer generates semantically consistent images and text across different subjects.
arXiv Detail & Related papers (2024-05-28T00:36:25Z)
SegmentAnyBone: A Universal Model that Segments Any Bone at Any Location on MRI [13.912230325828943]
We propose a versatile, publicly available deep-learning model for bone segmentation in MRI across multiple standard MRI locations. The proposed model can operate in two modes: fully automated segmentation and prompt-based segmentation. Our contributions include (1) collecting and annotating a new MRI dataset across various MRI protocols, encompassing over 300 annotated volumes and 8485 annotated slices across diverse anatomic regions.
arXiv Detail & Related papers (2024-01-23T18:59:25Z)
fMRI-PTE: A Large-scale fMRI Pretrained Transformer Encoder for Multi-Subject Brain Activity Decoding [54.17776744076334]
We propose fMRI-PTE, an innovative auto-encoder approach for fMRI pre-training. Our approach involves transforming fMRI signals into unified 2D representations, ensuring consistency in dimensions and preserving brain activity patterns. Our contributions encompass introducing fMRI-PTE, innovative data transformation, efficient training, a novel learning strategy, and the universal applicability of our approach.
arXiv Detail & Related papers (2023-11-01T07:24:22Z)
Explainable unsupervised multi-modal image registration using deep networks [2.197364252030876]
MRI image registration aims to geometrically 'pair' diagnoses from different modalities, time points and slices. In this work, we show that our DL model becomes fully explainable, setting the framework to generalise our approach on further medical imaging data.
arXiv Detail & Related papers (2023-08-03T19:13:48Z)
BrainCLIP: Bridging Brain and Visual-Linguistic Representation Via CLIP for Generic Natural Visual Stimulus Decoding [51.911473457195555]
BrainCLIP is a task-agnostic fMRI-based brain decoding model. It bridges the modality gap between brain activity, image, and text. BrainCLIP can reconstruct visual stimuli with high semantic fidelity.
arXiv Detail & Related papers (2023-02-25T03:28:54Z)
DIGEST: Deeply supervIsed knowledGE tranSfer neTwork learning for brain tumor segmentation with incomplete multi-modal MRI scans [16.93394669748461]
Brain tumor segmentation based on multi-modal magnetic resonance imaging (MRI) plays a pivotal role in assisting brain cancer diagnosis, treatment, and postoperative evaluations. Despite the achieved inspiring performance by existing automatic segmentation methods, multi-modal MRI data are still unavailable in real-world clinical applications. We propose a Deeply supervIsed knowledGE tranSfer neTwork (DIGEST), which achieves accurate brain tumor segmentation under different modality-missing scenarios.
arXiv Detail & Related papers (2022-11-15T09:01:14Z)

This list is automatically generated from the titles and abstracts of the papers in this site.