Lite-Mind: Towards Efficient and Robust Brain Representation Network
- URL: http://arxiv.org/abs/2312.03781v3
- Date: Fri, 19 Apr 2024 05:45:25 GMT
- Title: Lite-Mind: Towards Efficient and Robust Brain Representation Network
- Authors: Zixuan Gong, Qi Zhang, Guangyin Bao, Lei Zhu, Yu Zhang, Ke Liu, Liang Hu, Duoqian Miao
- Abstract summary: Lite-Mind is a lightweight, efficient, and robust brain representation learning paradigm based on the Discrete Fourier Transform (DFT)
We show Lite-Mind achieves an impressive 94.6% fMRI-to-image retrieval accuracy on the NSD dataset for Subject 1, with 98.7% fewer parameters than MindEye.
Lite-Mind also transfers to smaller fMRI datasets and establishes a new state-of-the-art for zero-shot classification on the GOD dataset.
- Score: 23.13231031281597
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The limited data availability and the low signal-to-noise ratio of fMRI signals lead to the challenging task of fMRI-to-image retrieval. State-of-the-art MindEye remarkably improves fMRI-to-image retrieval performance by leveraging a large model, i.e., a 996M MLP Backbone per subject, to align fMRI embeddings to the final hidden layer of CLIP's Vision Transformer (ViT). However, significant individual variations exist among subjects, even under identical experimental setups, mandating the training of large subject-specific models. The substantial parameters pose significant challenges in deploying fMRI decoding on practical devices. To this end, we propose Lite-Mind, a lightweight, efficient, and robust brain representation learning paradigm based on the Discrete Fourier Transform (DFT), which efficiently aligns fMRI voxels to fine-grained information of CLIP. We elaborately design a DFT backbone with Spectrum Compression and Frequency Projector modules to learn informative and robust voxel embeddings. Our experiments demonstrate that Lite-Mind achieves an impressive 94.6% fMRI-to-image retrieval accuracy on the NSD dataset for Subject 1, with 98.7% fewer parameters than MindEye. Lite-Mind also transfers to smaller fMRI datasets and establishes a new state-of-the-art for zero-shot classification on the GOD dataset.
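The abstract names the mechanism (a DFT backbone with Spectrum Compression and Frequency Projector modules that maps voxels into CLIP's embedding space for retrieval) without giving code. The PyTorch sketch below illustrates the general idea only: the class name, layer sizes, default voxel count, and the simple low-pass form of spectrum compression are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LiteMindSketch(nn.Module):
    """Hypothetical DFT backbone: rFFT -> spectrum compression -> frequency projector."""

    def __init__(self, n_voxels: int = 15724, n_kept_freqs: int = 1024, clip_dim: int = 768):
        super().__init__()
        self.n_kept_freqs = n_kept_freqs
        # "Spectrum Compression" here simply keeps the lowest-frequency bins
        # (an assumption; the paper's module may be learned rather than fixed).
        # "Frequency Projector" maps the compressed spectrum to CLIP's embedding size.
        self.projector = nn.Sequential(
            nn.Linear(2 * n_kept_freqs, clip_dim),  # real + imaginary parts concatenated
            nn.GELU(),
            nn.Linear(clip_dim, clip_dim),
        )

    def forward(self, voxels: torch.Tensor) -> torch.Tensor:
        # voxels: (batch, n_voxels) flattened fMRI betas for one subject
        spectrum = torch.fft.rfft(voxels, dim=-1)        # discrete Fourier transform
        spectrum = spectrum[:, : self.n_kept_freqs]      # crude low-pass spectrum compression
        feats = torch.cat([spectrum.real, spectrum.imag], dim=-1)
        return F.normalize(self.projector(feats), dim=-1)  # unit norm for cosine retrieval


def retrieval_top1(fmri_emb: torch.Tensor, image_emb: torch.Tensor) -> float:
    """Top-1 fMRI-to-image retrieval accuracy over an index-paired candidate pool."""
    sims = fmri_emb @ F.normalize(image_emb, dim=-1).T   # cosine similarities
    return (sims.argmax(dim=-1) == torch.arange(len(fmri_emb))).float().mean().item()
```

In this reading, the reported retrieval accuracy is a cosine nearest-neighbor lookup of each fMRI embedding against the CLIP image embeddings of the candidate pool; the frequency-domain projector replaces MindEye's large per-subject MLP backbone.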
Related papers
- MindFormer: A Transformer Architecture for Multi-Subject Brain Decoding via fMRI [50.55024115943266]
We introduce a new Transformer architecture called MindFormer to generate fMRI-conditioned feature vectors.
MindFormer incorporates two key innovations: 1) a novel training strategy based on the IP-Adapter to extract semantically meaningful features from fMRI signals, and 2) a subject-specific token and linear layer that effectively capture individual differences in fMRI signals.
arXiv Detail & Related papers (2024-05-28T00:36:25Z) - NeuroPictor: Refining fMRI-to-Image Reconstruction via Multi-individual Pretraining and Multi-level Modulation [55.51412454263856]
This paper proposes to directly modulate the generation process of diffusion models using fMRI signals.
By training with about 67,000 fMRI-image pairs from various individuals, our model enjoys superior fMRI-to-image decoding capacity.
arXiv Detail & Related papers (2024-03-27T02:42:52Z) - Transferring Ultrahigh-Field Representations for Intensity-Guided Brain Segmentation of Low-Field Magnetic Resonance Imaging [51.92395928517429]
The use of 7T MRI is limited by its high cost and lower accessibility compared to low-field (LF) MRI.
This study proposes a deep-learning framework that fuses the input LF magnetic resonance feature representations with the inferred 7T-like feature representations for brain image segmentation tasks.
arXiv Detail & Related papers (2024-02-13T12:21:06Z) - fMRI-PTE: A Large-scale fMRI Pretrained Transformer Encoder for Multi-Subject Brain Activity Decoding [54.17776744076334]
We propose fMRI-PTE, an innovative auto-encoder approach for fMRI pre-training.
Our approach involves transforming fMRI signals into unified 2D representations, ensuring consistency in dimensions and preserving brain activity patterns.
Our contributions encompass introducing fMRI-PTE, innovative data transformation, efficient training, a novel learning strategy, and the universal applicability of our approach.
arXiv Detail & Related papers (2023-11-01T07:24:22Z) - Learning Sequential Information in Task-based fMRI for Synthetic Data Augmentation [10.629487323161323]
We propose an approach for generating synthetic fMRI sequences that can be used to create augmented training datasets in downstream learning.
The synthetic images are evaluated from multiple perspectives including visualizations and an autism spectrum disorder (ASD) classification task.
arXiv Detail & Related papers (2023-08-29T18:36:21Z) - AMIGO: Sparse Multi-Modal Graph Transformer with Shared-Context Processing for Representation Learning of Giga-pixel Images [53.29794593104923]
We present a novel concept of shared-context processing for whole slide histopathology images.
AMIGO uses the cellular graph within the tissue to provide a single representation for a patient.
We show that our model is strongly robust to missing information to an extent that it can achieve the same performance with as low as 20% of the data.
arXiv Detail & Related papers (2023-03-01T23:37:45Z) - BrainCLIP: Bridging Brain and Visual-Linguistic Representation Via CLIP for Generic Natural Visual Stimulus Decoding [51.911473457195555]
BrainCLIP is a task-agnostic fMRI-based brain decoding model.
It bridges the modality gap between brain activity, image, and text.
BrainCLIP can reconstruct visual stimuli with high semantic fidelity.
arXiv Detail & Related papers (2023-02-25T03:28:54Z) - MouseGAN++: Unsupervised Disentanglement and Contrastive Representation for Multiple MRI Modalities Synthesis and Structural Segmentation of Mouse Brain [4.733517098000804]
Multimodal mouse brain MRI data are often lacking, making automatic segmentation of fine mouse brain structures a very challenging task.
We propose a novel disentangled and contrastive GAN-based framework, named MouseGAN++, to synthesize multiple MR modalities from single ones in a structure-preserving manner.
Using the subsequently learned modality-invariant information as well as the modality-translated images, MouseGAN++ can segment fine brain structures with average Dice coefficients of 90.0% (T2w) and 87.9% (T1w).
arXiv Detail & Related papers (2022-12-04T14:19:49Z) - Interpretability Aware Model Training to Improve Robustness against Out-of-Distribution Magnetic Resonance Images in Alzheimer's Disease Classification [8.050897403457995]
We propose an interpretability aware adversarial training regime to improve robustness against out-of-distribution samples originating from different MRI hardware.
We present preliminary results showing promising performance on out-of-distribution samples.
arXiv Detail & Related papers (2021-11-15T04:42:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.