Optimizing fMRI Data Acquisition for Decoding Natural Speech with Limited Participants
- URL: http://arxiv.org/abs/2505.21304v1
- Date: Tue, 27 May 2025 15:06:04 GMT
- Title: Optimizing fMRI Data Acquisition for Decoding Natural Speech with Limited Participants
- Authors: Louis Jalouzot, Alexis Thual, Yair Lakretz, Christophe Pallier, Bertrand Thirion
- Abstract summary: We investigate optimal strategies for decoding perceived natural speech from fMRI data acquired from a limited number of participants. We first demonstrate the effectiveness of training deep neural networks to predict text representations from fMRI activity. We observe that multi-subject training does not improve decoding accuracy compared to a single-subject approach.
- Score: 38.5686683941366
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We investigate optimal strategies for decoding perceived natural speech from fMRI data acquired from a limited number of participants. Leveraging Lebel et al. (2023)'s dataset of 8 participants, we first demonstrate the effectiveness of training deep neural networks to predict LLM-derived text representations from fMRI activity. Then, in this data regime, we observe that multi-subject training does not improve decoding accuracy compared to a single-subject approach. Furthermore, training on similar or different stimuli across subjects has a negligible effect on decoding accuracy. Finally, we find that our decoders better model syntactic than semantic features, and that stories containing sentences with complex syntax or rich semantic content are more challenging to decode. While our results demonstrate the benefits of having extensive data per participant (deep phenotyping), they suggest that leveraging multi-subject data for natural speech decoding likely requires deeper phenotyping or a substantially larger cohort.
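The decoding setup in the abstract, predicting LLM-derived text representations from fMRI activity, can be illustrated with a minimal sketch. The paper trains deep neural networks; here a closed-form ridge regression on fully synthetic data stands in for that pipeline, so every shape, name, and number below is a hypothetical placeholder rather than anything from the Lebel et al. dataset:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes: 500 fMRI samples with 100 voxels, decoded into
# 64-dimensional LLM-derived text embeddings (synthetic stand-ins).
n_samples, n_voxels, n_dims = 500, 100, 64
X = rng.standard_normal((n_samples, n_voxels))       # simulated fMRI activity
W_true = 0.1 * rng.standard_normal((n_voxels, n_dims))
Y = X @ W_true + 0.01 * rng.standard_normal((n_samples, n_dims))

# Closed-form ridge regression: W = (X'X + alpha*I)^-1 X'Y
alpha = 1.0
W_hat = np.linalg.solve(X.T @ X + alpha * np.eye(n_voxels), X.T @ Y)
Y_pred = X @ W_hat

# Mean per-dimension correlation between true and predicted embeddings
corr = float(np.mean([np.corrcoef(Y[:, d], Y_pred[:, d])[0, 1]
                      for d in range(n_dims)]))
print(round(corr, 3))
```

With many samples per participant the linear map is recovered almost exactly on this toy problem, which loosely mirrors why extensive per-participant data (deep phenotyping) helps decoding.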
Related papers
- Decoding individual words from non-invasive brain recordings across 723 participants [9.9068852821927]
We introduce a novel deep learning pipeline to decode individual words from non-invasive electro-encephalography (EEG) and magneto-encephalography (MEG) signals. We train and evaluate our approach on an unprecedentedly large number of participants exposed to five million words either written or spoken in English, French or Dutch.
arXiv Detail & Related papers (2024-12-11T15:53:49Z)
- A multimodal LLM for the non-invasive decoding of spoken text from brain recordings [0.4187344935012482]
We propose an end-to-end multimodal LLM for decoding spoken text from fMRI signals.
The proposed architecture is built on an encoder derived from a specific transformer, incorporating an augmented embedding layer and an attention mechanism better adjusted than that present in the state of the art.
A benchmark is performed on a corpus consisting of a set of human-human and human-robot interactions where fMRI and conversational signals are recorded synchronously.
arXiv Detail & Related papers (2024-09-29T14:03:39Z)
- LLM4Brain: Training a Large Language Model for Brain Video Understanding [9.294352205183726]
We introduce an LLM-based approach for reconstructing visual-semantic information from fMRI signals elicited by video stimuli.
We employ fine-tuning techniques on an fMRI encoder equipped with adaptors to transform brain responses into latent representations aligned with the video stimuli.
In particular, we integrate self-supervised domain adaptation methods to enhance the alignment between visual-semantic information and brain responses.
arXiv Detail & Related papers (2024-09-26T15:57:08Z)
- Across-subject ensemble-learning alleviates the need for large samples for fMRI decoding [37.41192511246204]
Within-subject decoding avoids between-subject correspondence problems but requires large sample sizes to make accurate predictions.
Here, we investigate an ensemble approach to decoding that combines the classifiers trained on data from other subjects to decode cognitive states in a new subject.
We find that it outperforms the conventional decoding approach by up to 20% in accuracy, especially for datasets with limited per-subject data.
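The across-subject ensemble idea above can be sketched in a few lines: train one simple classifier per source subject, then average the classifiers' class scores on a new subject. The nearest-centroid classifiers and the synthetic "cognitive state" data below are hypothetical simplifications for illustration, not the paper's actual method or data:

```python
import numpy as np

# Hypothetical setup: 4 source subjects, 2 cognitive states, a shared
# class structure plus subject-specific noise (all toy placeholders).
n_subjects, n_feat, n_per_class = 4, 20, 30
class_means = np.random.default_rng(42).standard_normal((2, n_feat))

def make_subject(seed):
    r = np.random.default_rng(seed)
    X = np.vstack([class_means[c] + 0.8 * r.standard_normal((n_per_class, n_feat))
                   for c in (0, 1)])
    y = np.repeat([0, 1], n_per_class)
    return X, y

# One nearest-centroid classifier per source subject
centroids = []
for s in range(n_subjects):
    X, y = make_subject(100 + s)
    centroids.append(np.stack([X[y == c].mean(axis=0) for c in (0, 1)]))

# Ensemble: average each classifier's class scores on the new subject
X_new, y_new = make_subject(999)
scores = np.zeros((len(X_new), 2))
for cent in centroids:
    # negative distance to a class centroid serves as that class's score
    scores -= np.linalg.norm(X_new[:, None, :] - cent[None, :, :], axis=2)
acc = float((scores.argmax(axis=1) == y_new).mean())
print(acc)
```

Averaging scores rather than hard votes lets confident source classifiers contribute more, one common design choice in across-subject ensembles.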
arXiv Detail & Related papers (2024-07-09T08:22:44Z)
- CLIP-MUSED: CLIP-Guided Multi-Subject Visual Neural Information Semantic Decoding [14.484475792279671]
We propose a CLIP-guided Multi-sUbject visual neural information SEmantic Decoding (CLIP-MUSED) method.
Our method consists of a Transformer-based feature extractor to effectively model global neural representations.
It also incorporates learnable subject-specific tokens that facilitate the aggregation of multi-subject data.
arXiv Detail & Related papers (2024-02-14T07:41:48Z)
- An Iterative Optimizing Framework for Radiology Report Summarization with ChatGPT [80.33783969507458]
The 'Impression' section of a radiology report is a critical basis for communication between radiologists and other physicians.
Recent studies have achieved promising results in automatic impression generation using large-scale medical text data.
However, these models often require substantial amounts of medical text data and generalize poorly.
arXiv Detail & Related papers (2023-04-17T17:13:42Z)
- Decoding speech perception from non-invasive brain recordings [48.46819575538446]
We introduce a model trained with contrastive learning to decode self-supervised representations of perceived speech from non-invasive recordings.
Our model can identify, from 3 seconds of MEG signals, the corresponding speech segment with up to 41% accuracy out of more than 1,000 distinct possibilities.
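The segment-identification task above is essentially retrieval in a shared embedding space. A hedged sketch, assuming the embeddings have already been aligned by contrastive training (simulated here as noisy copies, with all sizes and noise levels as hypothetical placeholders), scores 1,000 candidate segments by cosine similarity:

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical aligned embeddings: 1,000 candidate speech segments;
# brain-derived queries are simulated as noisy copies of the true
# segment embeddings (standing in for contrastively trained encoders).
n_segments, dim = 1000, 128
speech = rng.standard_normal((n_segments, dim))
speech /= np.linalg.norm(speech, axis=1, keepdims=True)
brain = speech + 0.1 * rng.standard_normal((n_segments, dim))
brain /= np.linalg.norm(brain, axis=1, keepdims=True)

# Cosine similarity (dot products of unit vectors), then top-1 retrieval
sim = brain @ speech.T
top1 = float((sim.argmax(axis=1) == np.arange(n_segments)).mean())
print(top1)
```

Real MEG decoding is far noisier than this simulation, which is why the reported accuracy is 41% rather than near-perfect; the sketch only shows the retrieval mechanics.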
arXiv Detail & Related papers (2022-08-25T10:01:43Z)
- Deep Neural Convolutive Matrix Factorization for Articulatory Representation Decomposition [48.56414496900755]
This work uses a neural implementation of convolutive sparse matrix factorization to decompose the articulatory data into interpretable gestures and gestural scores.
Phoneme recognition experiments were additionally performed to show that gestural scores indeed code phonological information successfully.
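A hedged sketch of the factorization idea: the paper uses a neural convolutive sparse factorization, but plain NMF with Lee-Seung multiplicative updates already shows how a nonnegative data matrix splits into "gesture"-like components and "gestural score"-like activations (all data and dimensions below are toy placeholders, not articulatory measurements):

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy nonnegative data matrix V (channels x time frames)
V = rng.random((12, 100))
k = 4  # number of components ("gestures")
W = rng.random((12, k)) + 1e-3   # gesture-like spatial patterns
H = rng.random((k, 100)) + 1e-3  # gestural-score-like activations

err0 = float(np.linalg.norm(V - W @ H))
# Multiplicative updates keep W and H nonnegative and monotonically
# reduce the Frobenius reconstruction error ||V - WH||.
for _ in range(200):
    H *= (W.T @ V) / (W.T @ W @ H + 1e-9)
    W *= (V @ H.T) / (W @ H @ H.T + 1e-9)
err1 = float(np.linalg.norm(V - W @ H))
print(err1 < err0)
```

The convolutive variant additionally lets each gesture span several time frames; this plain NMF version drops that for brevity.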
arXiv Detail & Related papers (2022-04-01T14:25:19Z)
- Deep Recurrent Encoder: A scalable end-to-end network to model brain signals [122.1055193683784]
We propose an end-to-end deep learning architecture trained to predict the brain responses of multiple subjects at once.
We successfully test this approach on a large cohort of magnetoencephalography (MEG) recordings acquired during a one-hour reading task.
arXiv Detail & Related papers (2021-03-03T11:39:17Z)
- Deep Representational Similarity Learning for analyzing neural signatures in task-based fMRI dataset [81.02949933048332]
This paper develops Deep Representational Similarity Learning (DRSL), a deep extension of Representational Similarity Analysis (RSA).
DRSL is appropriate for analyzing similarities between various cognitive tasks in fMRI datasets with a large number of subjects.
arXiv Detail & Related papers (2020-09-28T18:30:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.