Lite-Mind: Towards Efficient and Robust Brain Representation Network
- URL: http://arxiv.org/abs/2312.03781v4
- Date: Thu, 1 Aug 2024 07:29:47 GMT
- Title: Lite-Mind: Towards Efficient and Robust Brain Representation Network
- Authors: Zixuan Gong, Qi Zhang, Guangyin Bao, Lei Zhu, Ke Liu, Liang Hu, Duoqian Miao, Yu Zhang
- Abstract summary: Lite-Mind is a lightweight, efficient, and robust brain representation learning paradigm based on the Discrete Fourier Transform (DFT).
We show Lite-Mind achieves an impressive 94.6% fMRI-to-image retrieval accuracy on the NSD dataset for Subject 1, with 98.7% fewer parameters than MindEye.
Lite-Mind also transfers to smaller fMRI datasets and establishes a new state of the art for zero-shot classification on the GOD dataset.
- Score: 23.132310312815967
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The limited data availability and the low signal-to-noise ratio of fMRI signals make fMRI-to-image retrieval a challenging task. The state-of-the-art MindEye remarkably improves fMRI-to-image retrieval performance by leveraging a large model, i.e., a 996M MLP backbone per subject, to align fMRI embeddings to the final hidden layer of CLIP's Vision Transformer (ViT). However, significant individual variation exists among subjects, even under identical experimental setups, mandating the training of large subject-specific models. The substantial parameter count poses significant challenges to deploying fMRI decoding on practical devices. To this end, we propose Lite-Mind, a lightweight, efficient, and robust brain representation learning paradigm based on the Discrete Fourier Transform (DFT), which efficiently aligns fMRI voxels to fine-grained information in CLIP. We elaborately design a DFT backbone with Spectrum Compression and Frequency Projector modules to learn informative and robust voxel embeddings. Our experiments demonstrate that Lite-Mind achieves an impressive 94.6% fMRI-to-image retrieval accuracy on the NSD dataset for Subject 1, with 98.7% fewer parameters than MindEye. Lite-Mind also transfers to smaller fMRI datasets and establishes a new state of the art for zero-shot classification on the GOD dataset.
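To make the pipeline concrete, below is a minimal PyTorch sketch of a DFT-based voxel encoder in the spirit of Lite-Mind. The module names (Spectrum Compression, Frequency Projector) follow the abstract, but the voxel count, layer sizes, number of retained frequency bins, and the contrastive retrieval loss are illustrative assumptions, not the paper's exact architecture.

```python
# Minimal sketch of a DFT-based voxel encoder in the spirit of Lite-Mind.
# Dimensions, layer sizes, and the retrieval loss are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LiteMindSketch(nn.Module):
    def __init__(self, n_voxels=15724, n_freq=1024, clip_dim=768):
        super().__init__()
        # Spectrum Compression: keep only the lowest n_freq frequency bins
        # of the voxel spectrum, discarding high-frequency noise.
        self.n_freq = n_freq
        # Frequency Projector: map the compressed spectrum (real and
        # imaginary parts) into the CLIP embedding space.
        self.projector = nn.Sequential(
            nn.Linear(2 * n_freq, 2048),
            nn.GELU(),
            nn.Linear(2048, clip_dim),
        )

    def forward(self, voxels):                      # voxels: (B, n_voxels)
        spectrum = torch.fft.rfft(voxels, dim=-1)   # complex (B, n_voxels//2+1)
        spectrum = spectrum[:, : self.n_freq]       # spectrum compression
        feats = torch.cat([spectrum.real, spectrum.imag], dim=-1)
        return F.normalize(self.projector(feats), dim=-1)

def retrieval_loss(fmri_emb, clip_emb, temperature=0.07):
    """Symmetric InfoNCE between fMRI and CLIP image embeddings."""
    logits = fmri_emb @ clip_emb.t() / temperature
    targets = torch.arange(len(logits), device=logits.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```

The appeal of the frequency-domain view is that a handful of low-frequency bins summarize tens of thousands of noisy voxels, which is what allows the projector to stay orders of magnitude smaller than a 996M MLP backbone.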
Related papers
- Large-Scale Data-Free Knowledge Distillation for ImageNet via Multi-Resolution Data Generation [53.95204595640208]
Data-Free Knowledge Distillation (DFKD) is an advanced technique that enables knowledge transfer from a teacher model to a student model without relying on original training data.
Previous approaches have generated synthetic images at high resolutions without leveraging information from real images.
The proposed method, MUSE, generates images at lower resolutions while using Class Activation Maps (CAMs) to ensure that the generated images retain critical, class-specific features.
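As a rough illustration of the general idea, the sketch below optimizes low-resolution synthetic images against a teacher network while a CAM term rewards class-specific activation; the teacher choice, resolution, loss weighting, and update loop are assumptions, not MUSE's actual objective.

```python
# Hedged sketch of CAM-guided, data-free image synthesis. The teacher model,
# resolution, and loss weights are illustrative assumptions.
import torch
import torch.nn.functional as F
from torchvision.models import resnet18

teacher = resnet18(num_classes=10).eval()   # stand-in teacher to distill from
for p in teacher.parameters():
    p.requires_grad_(False)

# Capture the last conv features with a forward hook (needed for CAMs).
feat_cache = {}
teacher.layer4.register_forward_hook(
    lambda m, i, o: feat_cache.__setitem__('feats', o))

labels = torch.randint(0, 10, (8,))
x = torch.randn(8, 3, 64, 64, requires_grad=True)   # low-resolution synthesis
opt = torch.optim.Adam([x], lr=0.05)

for step in range(200):
    opt.zero_grad()
    logits = teacher(x)
    # CAM for each target class: conv features weighted by classifier weights.
    cam = torch.einsum('bc,bchw->bhw',
                       teacher.fc.weight[labels], feat_cache['feats'])
    # Encourage confident teacher predictions and strong class-specific
    # activation, so synthetic images keep class-discriminative regions.
    loss = F.cross_entropy(logits, labels) - 0.1 * cam.mean()
    loss.backward()
    opt.step()
```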
arXiv Detail & Related papers (2024-11-26T02:23:31Z)
- Residual Vision Transformer (ResViT) Based Self-Supervised Learning Model for Brain Tumor Classification [0.08192907805418585]
Self-supervised learning models provide data-efficient and remarkable solutions to limited dataset problems.
This paper introduces a generative SSL model for brain tumor classification in two stages.
The proposed model attains the highest accuracy, achieving 90.56% on the BraTs dataset with T1 sequence, 98.53% on the Figshare, and 98.47% on the Kaggle brain tumor datasets.
arXiv Detail & Related papers (2024-11-19T21:42:57Z)
- LLM4Brain: Training a Large Language Model for Brain Video Understanding [9.294352205183726]
We introduce an LLM-based approach for reconstructing visual-semantic information from fMRI signals elicited by video stimuli.
We employ fine-tuning techniques on an fMRI encoder equipped with adaptors to transform brain responses into latent representations aligned with the video stimuli.
In particular, we integrate self-supervised domain adaptation methods to enhance the alignment between visual-semantic information and brain responses.
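A minimal sketch of the adaptor idea, assuming a frozen pretrained fMRI encoder and a cosine alignment loss; the encoder stand-in, adaptor shape, and loss are assumptions rather than LLM4Brain's actual components.

```python
# Hedged sketch of adaptor-based fMRI-to-video alignment. The encoder
# stand-in, adaptor design, and alignment loss are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Adaptor(nn.Module):
    """Small trainable module inserted after a (frozen) fMRI encoder."""
    def __init__(self, dim=1024):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.GELU(),
                                 nn.Linear(dim, dim))

    def forward(self, h):
        return h + self.net(h)   # residual keeps pretrained features intact

fmri_encoder = nn.Linear(8000, 1024)   # stand-in for a pretrained encoder

adaptor = Adaptor(1024)

def alignment_loss(fmri, video_feats):
    """Align adapted fMRI latents with video-stimulus features."""
    z = adaptor(fmri_encoder(fmri))
    return 1 - F.cosine_similarity(z, video_feats, dim=-1).mean()
```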
arXiv Detail & Related papers (2024-09-26T15:57:08Z)
- MindFormer: Semantic Alignment of Multi-Subject fMRI for Brain Decoding [50.55024115943266]
We introduce MindFormer, a novel method for semantic alignment of multi-subject fMRI signals.
The model is specifically designed to generate fMRI-conditioned feature vectors that can be used to condition a Stable Diffusion model for fMRI-to-image generation or a large language model (LLM) for fMRI-to-text generation.
Our experimental results demonstrate that MindFormer generates semantically consistent images and text across different subjects.
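One way such conditioning vectors could be produced is sketched below: a transformer maps voxels to a token sequence shaped like CLIP text embeddings, with a learned subject embedding handling multi-subject data. The token count, dimensions, and subject handling are assumptions, not MindFormer's published design.

```python
# Hedged sketch of producing fMRI-conditioned feature tokens for a diffusion
# model. Token count, dimensions, and subject handling are assumptions.
import torch
import torch.nn as nn

class MindFormerSketch(nn.Module):
    def __init__(self, n_voxels=8000, n_subjects=8, n_tokens=77, dim=768):
        super().__init__()
        self.proj = nn.Linear(n_voxels, n_tokens * dim)
        # A learned subject embedding lets one model serve multiple subjects.
        self.subject_emb = nn.Embedding(n_subjects, dim)
        enc = nn.TransformerEncoderLayer(d_model=dim, nhead=8,
                                         batch_first=True)
        self.transformer = nn.TransformerEncoder(enc, num_layers=4)
        self.n_tokens, self.dim = n_tokens, dim

    def forward(self, voxels, subject_id):
        tokens = self.proj(voxels).view(-1, self.n_tokens, self.dim)
        tokens = tokens + self.subject_emb(subject_id)[:, None, :]
        # Output shape (B, 77, 768) matches the cross-attention context that
        # Stable Diffusion 1.x expects in place of CLIP text embeddings.
        return self.transformer(tokens)
```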
arXiv Detail & Related papers (2024-05-28T00:36:25Z) - NeuroPictor: Refining fMRI-to-Image Reconstruction via Multi-individual Pretraining and Multi-level Modulation [55.51412454263856]
This paper proposes to directly modulate the generation process of diffusion models using fMRI signals.
By training with about 67,000 fMRI-image pairs from various individuals, our model enjoys superior fMRI-to-image decoding capacity.
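The multi-level modulation in the title suggests something like FiLM-style scale-and-shift applied to intermediate diffusion features; the sketch below illustrates that form under assumed dimensions and is not the paper's exact mechanism.

```python
# Hedged sketch of multi-level modulation of a diffusion UNet by fMRI
# embeddings. The FiLM-style form and channel counts are assumptions.
import torch
import torch.nn as nn

class FMRIModulator(nn.Module):
    """Predicts per-level scale/shift parameters from an fMRI embedding and
    applies them to intermediate UNet feature maps."""
    def __init__(self, fmri_dim=1024, channels=(320, 640, 1280)):
        super().__init__()
        self.heads = nn.ModuleList(
            nn.Linear(fmri_dim, 2 * c) for c in channels)

    def modulate(self, feat, fmri_emb, level):
        scale, shift = self.heads[level](fmri_emb).chunk(2, dim=-1)
        # Broadcast (B, C) parameters over spatial dims of (B, C, H, W).
        return feat * (1 + scale[..., None, None]) + shift[..., None, None]
```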
arXiv Detail & Related papers (2024-03-27T02:42:52Z)
- Transferring Ultrahigh-Field Representations for Intensity-Guided Brain Segmentation of Low-Field Magnetic Resonance Imaging [51.92395928517429]
The use of 7T MRI is limited by its high cost and lower accessibility compared to low-field (LF) MRI.
This study proposes a deep-learning framework that fuses the input LF magnetic resonance feature representations with the inferred 7T-like feature representations for brain image segmentation tasks.
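A minimal sketch of what such feature fusion could look like, assuming a gated sum of the two feature streams; the gating form and channel count are assumptions rather than the study's actual fusion module.

```python
# Hedged sketch of fusing low-field (LF) features with inferred 7T-like
# features for segmentation; the gated-sum form is an assumption.
import torch
import torch.nn as nn

class FeatureFusion(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, 1), nn.Sigmoid())

    def forward(self, lf_feat, hf_feat):
        # lf_feat: features from the LF input; hf_feat: inferred 7T-like
        # features. A learned gate decides how much of each to keep.
        g = self.gate(torch.cat([lf_feat, hf_feat], dim=1))
        return g * hf_feat + (1 - g) * lf_feat
```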
arXiv Detail & Related papers (2024-02-13T12:21:06Z)
- fMRI-PTE: A Large-scale fMRI Pretrained Transformer Encoder for Multi-Subject Brain Activity Decoding [54.17776744076334]
We propose fMRI-PTE, an innovative auto-encoder approach for fMRI pre-training.
Our approach involves transforming fMRI signals into unified 2D representations, ensuring consistency in dimensions and preserving brain activity patterns.
Our contributions encompass introducing fMRI-PTE, innovative data transformation, efficient training, a novel learning strategy, and the universal applicability of our approach.
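The core data transformation can be illustrated as padding variable-length voxel vectors into a fixed 2D grid and pretraining an autoencoder on the result; the grid size and the convolutional autoencoder below are stand-in assumptions, not fMRI-PTE's transformer architecture.

```python
# Hedged sketch of mapping 1D voxel vectors to a unified 2D representation
# and pretraining an autoencoder on it. Grid size and the simple conv AE
# are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

def to_2d(voxels, side=128):
    """Pad/truncate a (B, n_voxels) signal into a (B, 1, side, side) grid,
    giving every subject the same input dimensions."""
    flat = F.pad(voxels, (0, max(0, side * side - voxels.shape[1])))
    return flat[:, : side * side].view(-1, 1, side, side)

autoencoder = nn.Sequential(                 # simple conv AE as a stand-in
    nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.GELU(),
    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.GELU(),
    nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.GELU(),
    nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1),
)

voxels = torch.randn(4, 15000)               # fMRI signals from any subject
img = to_2d(voxels)
recon_loss = F.mse_loss(autoencoder(img), img)   # pretraining objective
```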
arXiv Detail & Related papers (2023-11-01T07:24:22Z)
- Learning Sequential Information in Task-based fMRI for Synthetic Data Augmentation [10.629487323161323]
We propose an approach for generating synthetic fMRI sequences that can be used to create augmented training datasets in downstream learning.
The synthetic images are evaluated from multiple perspectives including visualizations and an autism spectrum disorder (ASD) classification task.
arXiv Detail & Related papers (2023-08-29T18:36:21Z)
- BrainCLIP: Bridging Brain and Visual-Linguistic Representation Via CLIP for Generic Natural Visual Stimulus Decoding [51.911473457195555]
BrainCLIP is a task-agnostic fMRI-based brain decoding model.
It bridges the modality gap between brain activity, image, and text.
BrainCLIP can reconstruct visual stimuli with high semantic fidelity.
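Bridging the three modalities is naturally expressed as a CLIP-style contrastive objective pulling fMRI embeddings toward both image and text embeddings; the brain encoder below and the equal loss weighting are illustrative assumptions.

```python
# Hedged sketch of aligning brain, image, and text embeddings with a
# CLIP-style contrastive objective. The brain encoder and loss weighting
# are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

brain_encoder = nn.Sequential(nn.Linear(8000, 2048), nn.GELU(),
                              nn.Linear(2048, 512))  # stand-in encoder

def info_nce(a, b, temperature=0.07):
    a, b = F.normalize(a, dim=-1), F.normalize(b, dim=-1)
    logits = a @ b.t() / temperature
    targets = torch.arange(len(logits), device=logits.device)
    return F.cross_entropy(logits, targets)

def brainclip_loss(fmri, img_emb, txt_emb):
    """Pull fMRI embeddings toward both CLIP image and text embeddings,
    bridging the modality gap among all three."""
    z = brain_encoder(fmri)
    return info_nce(z, img_emb) + info_nce(z, txt_emb)
```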
arXiv Detail & Related papers (2023-02-25T03:28:54Z)
- MouseGAN++: Unsupervised Disentanglement and Contrastive Representation for Multiple MRI Modalities Synthesis and Structural Segmentation of Mouse Brain [4.733517098000804]
Multimodal mouse brain MRI data is often lacking, making automatic segmentation of fine mouse brain structures a very challenging task.
We propose a novel disentangled and contrastive GAN-based framework, named MouseGAN++, to synthesize multiple MR modalities from single ones in a structure-preserving manner.
Using the subsequently learned modality-invariant information as well as the modality-translated images, MouseGAN++ can segment fine brain structures with average Dice coefficients of 90.0% (T2w) and 87.9% (T1w).
arXiv Detail & Related papers (2022-12-04T14:19:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.