Brain-tuned Speech Models Better Reflect Speech Processing Stages in the Brain
- URL: http://arxiv.org/abs/2506.03832v1
- Date: Wed, 04 Jun 2025 10:59:11 GMT
- Title: Brain-tuned Speech Models Better Reflect Speech Processing Stages in the Brain
- Authors: Omer Moussa, Mariya Toneva
- Abstract summary: Self-supervised speech models excel in speech tasks but do not reflect the hierarchy of human speech processing. Recent work showed that brain-tuning (fine-tuning models using human brain recordings) improves speech models' semantic understanding. We find that late layers of brain-tuned models substantially improve over pretrained models in their alignment with semantic language regions.
- Score: 4.652236080354487
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Pretrained self-supervised speech models excel in speech tasks but do not reflect the hierarchy of human speech processing, as they encode rich semantics in middle layers and poor semantics in late layers. Recent work showed that brain-tuning (fine-tuning models using human brain recordings) improves speech models' semantic understanding. Here, we examine how well brain-tuned models further reflect the brain's intermediate stages of speech processing. We find that late layers of brain-tuned models substantially improve over pretrained models in their alignment with semantic language regions. Further layer-wise probing reveals that early layers remain dedicated to low-level acoustic features, while late layers become the best at complex high-level tasks. These findings show that brain-tuned models not only perform better but also exhibit well-defined hierarchical processing, progressing from acoustic to semantic representations, which makes them better model organisms for human speech processing.
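The layer-wise alignment analysis described in the abstract is typically implemented as a voxel-wise encoding model: activations from each layer are regressed onto fMRI responses, and held-out prediction correlation serves as the alignment score. Below is a minimal sketch under that reading; the ridge-regression setup and the names (`layer_activations`, `fmri_responses`) are illustrative assumptions, not the paper's exact pipeline.

```python
# Minimal sketch of layer-wise brain alignment via voxel-wise ridge regression.
# Assumption: activations and fMRI responses are already time-aligned
# (one row per fMRI volume); this is not the authors' exact pipeline.
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split


def layer_alignment(layer_activations: np.ndarray,
                    fmri_responses: np.ndarray) -> float:
    """Mean held-out Pearson correlation across voxels for one layer."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        layer_activations, fmri_responses, test_size=0.2, shuffle=False)
    model = RidgeCV(alphas=np.logspace(-2, 6, 9)).fit(X_tr, y_tr)
    pred = model.predict(X_te)
    # Pearson r per voxel, then averaged over voxels
    r = [np.corrcoef(pred[:, v], y_te[:, v])[0, 1]
         for v in range(y_te.shape[1])]
    return float(np.nanmean(r))


# Scoring every layer this way traces the acoustic-to-semantic hierarchy, e.g.:
# scores = [layer_alignment(acts[l], fmri) for l in range(num_layers)]
```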
Related papers
- Do Large Language Models Think Like the Brain? Sentence-Level Evidence from fMRI and Hierarchical Embeddings [28.210559128941593]
This study investigates how hierarchical representations in large language models align with dynamic neural responses during human sentence comprehension. Results show that improvements in model performance drive the evolution of representational architectures toward brain-like hierarchies.
arXiv Detail & Related papers (2025-05-28T16:40:06Z)
- BrainWavLM: Fine-tuning Speech Representations with Brain Responses to Language [43.53912137735093]
Speech encoding models use auditory representations to predict how the human brain responds to spoken language stimuli. In this work, we use low-rank adaptation (LoRA) to fine-tune a WavLM-based encoding model end-to-end on a brain encoding objective. We show that fine-tuning across all of cortex improves average encoding performance, with greater stability than fine-tuning without LoRA.
arXiv Detail & Related papers (2025-02-13T00:37:27Z)
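The BrainWavLM recipe above (LoRA adapters on a WavLM encoder, trained end-to-end against fMRI responses) can be approximated with standard libraries. The sketch below is a plausible reconstruction, not the authors' released code: the checkpoint, target modules, rank, mean-pooling, and linear voxel head are all assumptions.

```python
# Hedged sketch: LoRA fine-tuning of WavLM on a brain encoding objective.
# Adapter config, pooling, and the linear voxel head are assumptions,
# not the BrainWavLM authors' implementation.
import torch
from transformers import WavLMModel
from peft import LoraConfig, get_peft_model

base = WavLMModel.from_pretrained("microsoft/wavlm-base-plus")
lora = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05,
                  target_modules=["q_proj", "v_proj"])  # attention projections
model = get_peft_model(base, lora)  # only adapter weights remain trainable

num_voxels = 10_000  # hypothetical cortical voxel count
head = torch.nn.Linear(base.config.hidden_size, num_voxels)
opt = torch.optim.AdamW(list(model.parameters()) + list(head.parameters()),
                        lr=1e-4)


def training_step(waveform: torch.Tensor, fmri: torch.Tensor) -> torch.Tensor:
    """One encoding-objective step: predict voxel responses from raw audio."""
    feats = model(waveform).last_hidden_state.mean(dim=1)  # pool over time
    loss = torch.nn.functional.mse_loss(head(feats), fmri)
    loss.backward()
    opt.step()
    opt.zero_grad()
    return loss
```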
- Improving Semantic Understanding in Speech Language Models via Brain-tuning [19.732593005537606]
Speech language models align with human brain responses to natural language to an impressive degree. However, current models rely heavily on low-level speech features, indicating they lack brain-relevant semantics. We address this limitation by inducing brain-relevant bias directly into the models via fine-tuning with fMRI recordings.
arXiv Detail & Related papers (2024-10-11T20:06:21Z)
- Neural Language Models are not Born Equal to Fit Brain Data, but Training Helps [75.84770193489639]
We examine the impact of test loss, training corpus and model architecture on the prediction of functional Magnetic Resonance Imaging timecourses of participants listening to an audiobook.
We find that untrained versions of each model already explain a significant amount of signal in the brain by capturing similarity in brain responses across identical words.
We suggest good practices for future studies aiming at explaining the human language system using neural language models.
arXiv Detail & Related papers (2022-07-07T15:37:17Z)
- Toward a realistic model of speech processing in the brain with self-supervised learning [67.7130239674153]
Self-supervised algorithms trained on the raw waveform constitute a promising candidate for a realistic model of speech processing in the brain.
We show that Wav2Vec 2.0 learns brain-like representations with as little as 600 hours of unlabelled speech.
arXiv Detail & Related papers (2022-06-03T17:01:46Z)
- Self-supervised models of audio effectively explain human cortical responses to speech [71.57870452667369]
We capitalize on the progress of self-supervised speech representation learning to create new state-of-the-art models of the human auditory system.
These results show that self-supervised models effectively capture the hierarchy of information relevant to different stages of speech processing in human cortex.
arXiv Detail & Related papers (2022-05-27T22:04:02Z)
- Self-Supervised Learning for speech recognition with Intermediate layer supervision [52.93758711230248]
We propose Intermediate Layer Supervision for Self-Supervised Learning (ILS-SSL).
ILS-SSL forces the model to concentrate on content information as much as possible by adding an additional SSL loss on the intermediate layers.
Experiments on LibriSpeech test-other set show that our method outperforms HuBERT significantly.
arXiv Detail & Related papers (2021-12-16T10:45:05Z)
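The ILS-SSL idea above amounts to applying the usual masked-prediction SSL loss to selected intermediate layers in addition to the final one. The HuBERT-style schematic below is a sketch under that reading; the supervised layer indices and the shared prediction head are assumptions.

```python
# Schematic of intermediate-layer supervision: the same SSL loss is applied
# to selected intermediate layer outputs as well as the final layer.
# The layer choice (4, 8, last) and the shared head are assumptions.
import torch


def ils_ssl_loss(hidden_states: list[torch.Tensor],
                 targets: torch.Tensor,
                 head: torch.nn.Module,
                 supervised_layers=(4, 8, -1)) -> torch.Tensor:
    """Sum the masked-prediction loss over several layer outputs.

    hidden_states: per-layer outputs, each of shape (batch, time, dim)
    targets: discrete pseudo-labels for masked frames, shape (batch, time)
    head: projection from hidden dim to the pseudo-label vocabulary
    """
    total = torch.zeros((), device=targets.device)
    for layer in supervised_layers:
        logits = head(hidden_states[layer])        # (batch, time, vocab)
        total = total + torch.nn.functional.cross_entropy(
            logits.transpose(1, 2), targets)       # CE over the vocab dim
    return total
```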
- Model-based analysis of brain activity reveals the hierarchy of language in 305 subjects [82.81964713263483]
A popular approach to decomposing the neural bases of language is to correlate, across individuals, the brain responses to different stimuli.
Here, we show that a model-based approach can reach equivalent results within subjects exposed to natural stimuli.
arXiv Detail & Related papers (2021-10-12T15:30:21Z)
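The across-individuals correlation approach contrasted in the entry above is the classic inter-subject correlation (ISC) analysis. Below is a minimal leave-one-subject-out sketch; the array layout (subjects × time × voxels) is an assumption for illustration.

```python
# Minimal leave-one-subject-out inter-subject correlation (ISC).
# Assumed layout: responses[s, t, v] = subject s, timepoint t, voxel v.
import numpy as np


def isc(responses: np.ndarray) -> np.ndarray:
    """Per-voxel ISC: correlate each subject with the mean of the others."""
    n_subj, _, n_vox = responses.shape
    scores = np.empty((n_subj, n_vox))
    for s in range(n_subj):
        others = responses[np.arange(n_subj) != s].mean(axis=0)
        for v in range(n_vox):
            scores[s, v] = np.corrcoef(responses[s, :, v], others[:, v])[0, 1]
    return scores.mean(axis=0)  # average over left-out subjects
```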
- Inductive biases, pretraining and fine-tuning jointly account for brain responses to speech [6.87854783185243]
We compare five types of deep neural networks to human brain responses elicited by spoken sentences.
The differences in brain-similarity across networks revealed three main results.
arXiv Detail & Related papers (2021-02-25T19:11:55Z)