brat: Aligned Multi-View Embeddings for Brain MRI Analysis
- URL: http://arxiv.org/abs/2512.18679v1
- Date: Sun, 21 Dec 2025 10:37:31 GMT
- Title: brat: Aligned Multi-View Embeddings for Brain MRI Analysis
- Authors: Maxime Kayser, Maksim Gridnev, Wanting Wang, Max Bain, Aneesh Rangnekar, Avijit Chatterjee, Aleksandr Petrov, Harini Veeraraghavan, Nathaniel C. Swinburne
- Abstract summary: brat is a multi-view representation learning framework for brain magnetic resonance imaging (MRI) trained on MRIs paired with clinical reports. Brain MRIs present unique challenges due to the presence of numerous, highly varied, and often subtle abnormalities that are localized to a few slices within a 3D volume.
- Score: 36.795218160666266
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present brat (brain report alignment transformer), a multi-view representation learning framework for brain magnetic resonance imaging (MRI) trained on MRIs paired with clinical reports. Brain MRIs present unique challenges due to the presence of numerous, highly varied, and often subtle abnormalities that are localized to a few slices within a 3D volume. To address these challenges, we introduce a brain MRI dataset $10\times$ larger than existing ones, containing approximately 80,000 3D scans with corresponding radiology reports, and propose a multi-view pre-training approach inspired by advances in document retrieval. We develop an implicit query-feature matching mechanism and adopt concepts from quality-diversity to obtain multi-view embeddings of MRIs that are aligned with the clinical features given by report sentences. We evaluate our approach across multiple vision-language and vision tasks, demonstrating substantial performance improvements. The brat foundation models are publicly released.
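The abstract describes an implicit query-feature matching mechanism between report sentences and multi-view scan embeddings, inspired by multi-vector document retrieval. The paper does not spell out the training objective; the following is a minimal sketch of one plausible formulation, assuming a late-interaction-style loss in which each report sentence is matched to its best-aligned view embedding and a symmetric InfoNCE loss is applied over the batch. All tensor shapes, the temperature value, and the function name are hypothetical, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def multiview_report_alignment_loss(view_emb, sent_emb, temperature=0.07):
    """Hedged sketch of a late-interaction alignment loss.

    view_emb: (B, V, D) -- V view embeddings per 3D MRI (assumed shapes)
    sent_emb: (B, S, D) -- S report-sentence embeddings per study
    Each sentence is scored against every view; its best-matching view
    (max over views) contributes to the study-level similarity, loosely
    mirroring query-document late interaction from retrieval.
    """
    view_emb = F.normalize(view_emb, dim=-1)
    sent_emb = F.normalize(sent_emb, dim=-1)
    # Similarity of every sentence of study i to every view of study j: (B, B, S, V)
    sim = torch.einsum("isd,jvd->ijsv", sent_emb, view_emb)
    # Best-matching view per sentence, averaged over sentences: (B, B)
    study_sim = sim.max(dim=-1).values.mean(dim=-1) / temperature
    targets = torch.arange(study_sim.size(0), device=study_sim.device)
    # Symmetric InfoNCE over report->scan and scan->report directions.
    return 0.5 * (F.cross_entropy(study_sim, targets) +
                  F.cross_entropy(study_sim.t(), targets))
```

This sketch omits the quality-diversity component the abstract mentions for obtaining varied view embeddings; the actual brat objective may differ in both structure and details.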
Related papers
- Decipher-MR: A Vision-Language Foundation Model for 3D MRI Representations [12.805804608410739]
We present Decipher-MR, a 3D MRI-specific vision-language foundation model trained on a large-scale dataset. Decipher-MR integrates self-supervised vision learning with report-guided text supervision to build robust, generalizable representations. Our results establish Decipher-MR as a scalable and versatile foundation for MRI-based AI, facilitating efficient development across clinical and research domains.
arXiv Detail & Related papers (2025-09-25T14:43:33Z) - Integrating Anatomical Priors into a Causal Diffusion Model [14.471851828800055]
3D brain MRI studies often examine subtle morphometric differences that are hard to detect visually. Counterfactual models struggle to produce plausible MRIs due to the lack of explicit inductive biases to preserve fine-grained anatomical details. We propose to explicitly integrate voxel-level anatomical constraints as priors into a generative diffusion framework.
arXiv Detail & Related papers (2025-09-10T23:22:05Z) - M3Ret: Unleashing Zero-shot Multimodal Medical Image Retrieval via Self-Supervision [24.846428105192405]
We train M3Ret, a unified visual encoder, without any modality-specific customization. It successfully learns transferable representations using both generative (MAE) and contrastive (SimDINO) self-supervised learning (SSL) paradigms. Our approach sets a new state-of-the-art in zero-shot image-to-image retrieval across all individual modalities, surpassing strong baselines such as DINOv3 and the text-supervised BMC-CLIP.
arXiv Detail & Related papers (2025-09-01T10:59:39Z) - OmniMRI: A Unified Vision-Language Foundation Model for Generalist MRI Interpretation [5.3427577036717]
We introduce OmniMRI, a unified vision-language foundation model designed to generalize across the entire MRI workflow. OmniMRI is trained on a large-scale, heterogeneous corpus curated from 60 public datasets. Results demonstrate OmniMRI's ability to perform diverse tasks within a single architecture.
arXiv Detail & Related papers (2025-08-24T21:11:28Z) - Multi-modal Vision Pre-training for Medical Image Analysis [11.569448567735435]
Self-supervised learning has greatly facilitated medical image analysis by reducing the training data requirement for real-world applications. We conduct a novel multi-modal image pre-training with three proxy tasks to facilitate the learning of cross-modality representations and correlations. Compared to state-of-the-art pre-training methods, our method improves the Dice score by 0.28%-14.47% across six segmentation benchmarks and yields a consistent accuracy boost of 0.65%-18.07% on four individual image classification tasks.
arXiv Detail & Related papers (2024-10-14T15:12:16Z) - Towards General Text-guided Image Synthesis for Customized Multimodal Brain MRI Generation [51.28453192441364]
Multimodal brain magnetic resonance (MR) imaging is indispensable in neuroscience and neurology.
Current MR image synthesis approaches are typically trained on independent datasets for specific tasks.
We present TUMSyn, a Text-guided Universal MR image Synthesis model, which can flexibly generate brain MR images.
arXiv Detail & Related papers (2024-09-25T11:14:47Z) - MindFormer: Semantic Alignment of Multi-Subject fMRI for Brain Decoding [50.55024115943266]
We introduce MindFormer, a novel semantic alignment method for multi-subject fMRI signals.
This model is specifically designed to generate fMRI-conditioned feature vectors that can be used to condition a Stable Diffusion model for fMRI-to-image generation or a large language model (LLM) for fMRI-to-text generation.
Our experimental results demonstrate that MindFormer generates semantically consistent images and text across different subjects.
arXiv Detail & Related papers (2024-05-28T00:36:25Z) - Brain3D: Generating 3D Objects from fMRI [78.46936519561298]
We design a novel 3D object representation learning method, Brain3D, that takes the fMRI data of a subject as input. We show that our model captures the distinct functionalities of each region of the human visual system. Preliminary evaluations indicate that Brain3D can successfully identify the disordered brain regions in simulated scenarios.
arXiv Detail & Related papers (2024-05-24T06:06:11Z) - fMRI-PTE: A Large-scale fMRI Pretrained Transformer Encoder for Multi-Subject Brain Activity Decoding [54.17776744076334]
We propose fMRI-PTE, an innovative auto-encoder approach for fMRI pre-training.
Our approach involves transforming fMRI signals into unified 2D representations, ensuring consistency in dimensions and preserving brain activity patterns.
Our contributions encompass introducing fMRI-PTE, innovative data transformation, efficient training, a novel learning strategy, and the universal applicability of our approach.
arXiv Detail & Related papers (2023-11-01T07:24:22Z) - Video4MRI: An Empirical Study on Brain Magnetic Resonance Image Analytics with CNN-based Video Classification Frameworks [60.42012344842292]
3D CNN-based models dominate the field of magnetic resonance image (MRI) analytics.
In this paper, four datasets for Alzheimer's and Parkinson's disease recognition are used in the experiments.
In terms of efficiency, the video framework outperforms 3D-CNN models by 5%-11% with 50%-66% fewer trainable parameters.
arXiv Detail & Related papers (2023-02-24T15:26:31Z) - DIGEST: Deeply supervIsed knowledGE tranSfer neTwork learning for brain tumor segmentation with incomplete multi-modal MRI scans [16.93394669748461]
Brain tumor segmentation based on multi-modal magnetic resonance imaging (MRI) plays a pivotal role in assisting brain cancer diagnosis, treatment, and postoperative evaluations.
Despite the inspiring performance achieved by existing automatic segmentation methods, complete multi-modal MRI data are often unavailable in real-world clinical applications.
We propose a Deeply supervIsed knowledGE tranSfer neTwork (DIGEST), which achieves accurate brain tumor segmentation under different modality-missing scenarios.
arXiv Detail & Related papers (2022-11-15T09:01:14Z) - MEDUSA: Multi-scale Encoder-Decoder Self-Attention Deep Neural Network Architecture for Medical Image Analysis [71.2022403915147]
We introduce MEDUSA, a multi-scale encoder-decoder self-attention mechanism tailored for medical image analysis.
We obtain state-of-the-art performance on challenging medical image analysis benchmarks including COVIDx, RSNA RICORD, and RSNA Pneumonia Challenge.
arXiv Detail & Related papers (2021-10-12T15:05:15Z)