BrainWavLM: Fine-tuning Speech Representations with Brain Responses to Language
- URL: http://arxiv.org/abs/2502.08866v1
- Date: Thu, 13 Feb 2025 00:37:27 GMT
- Title: BrainWavLM: Fine-tuning Speech Representations with Brain Responses to Language
- Authors: Nishitha Vattikonda, Aditya R. Vaidya, Richard J. Antonello, Alexander G. Huth,
- Abstract summary: Speech encoding models use auditory representations to predict how the human brain responds to spoken language stimuli.
In this work, we use low-rank adaptation (LoRA) to fine-tune a WavLM-based encoding model end-to-end on a brain encoding objective.
We show that fine-tuning across all of cortex improves average encoding performance with greater stability than without LoRA.
- Score: 43.53912137735093
- License:
- Abstract: Speech encoding models use auditory representations to predict how the human brain responds to spoken language stimuli. Most performant encoding models linearly map the hidden states of artificial neural networks to brain data, but this linear restriction may limit their effectiveness. In this work, we use low-rank adaptation (LoRA) to fine-tune a WavLM-based encoding model end-to-end on a brain encoding objective, producing a model we name BrainWavLM. We show that fine-tuning across all of cortex improves average encoding performance with greater stability than without LoRA. This improvement comes at the expense of low-level regions like auditory cortex (AC), but selectively fine-tuning on these areas improves performance in AC, while largely retaining gains made in the rest of cortex. Fine-tuned models generalized across subjects, indicating that they learned robust brain-like representations of the speech stimuli. Finally, by training linear probes, we showed that the brain data strengthened semantic representations in the speech model without any explicit annotations. Our results demonstrate that brain fine-tuning produces best-in-class speech encoding models, and that non-linear methods have the potential to bridge the gap between artificial and biological representations of semantics.
Related papers
- Improving semantic understanding in speech language models via brain-tuning [19.732593005537606]
Speech language models align with human brain responses to natural language to an impressive degree.
Current models rely heavily on low-level speech features, indicating they lack brain-relevant semantics.
We address this limitation by inducing brain-relevant bias directly into the models via fine-tuning with fMRI recordings.
arXiv Detail & Related papers (2024-10-11T20:06:21Z) - Language Reconstruction with Brain Predictive Coding from fMRI Data [28.217967547268216]
Theory of predictive coding suggests that human brain naturally engages in continuously predicting future word representations.
textscPredFT achieves current state-of-the-art decoding performance with a maximum BLEU-1 score of $27.8%$.
arXiv Detail & Related papers (2024-05-19T16:06:02Z) - Du-IN: Discrete units-guided mask modeling for decoding speech from Intracranial Neural signals [5.283718601431859]
Invasive brain-computer interfaces with Electrocorticography (ECoG) have shown promise for high-performance speech decoding in medical applications.
We developed the Du-IN model, which extracts contextual embeddings based on region-level tokens through discrete codex-guided mask modeling.
Our model achieves state-of-the-art performance on the 61-word classification task, surpassing all baselines.
arXiv Detail & Related papers (2024-05-19T06:00:36Z) - SpeechAlign: Aligning Speech Generation to Human Preferences [51.684183257809075]
We introduce SpeechAlign, an iterative self-improvement strategy that aligns speech language models to human preferences.
We show that SpeechAlign can bridge the distribution gap and facilitate continuous self-improvement of the speech language model.
arXiv Detail & Related papers (2024-04-08T15:21:17Z) - Scaling laws for language encoding models in fMRI [47.498241053872924]
We tested whether larger open-source models are better at predicting brain responses recorded using fMRI.
Similar logarithmic behavior was observed when scaling the size of the fMRI training set.
These results suggest that increasing scale in both models and data will yield incredibly effective models of language processing in the brain.
arXiv Detail & Related papers (2023-05-19T17:53:03Z) - BrainBERT: Self-supervised representation learning for intracranial
recordings [18.52962864519609]
We create a reusable Transformer, BrainBERT, for intracranial recordings bringing modern representation learning approaches to neuroscience.
Much like in NLP and speech recognition, this Transformer enables classifying complex concepts, with higher accuracy and with much less data.
In the future, far more concepts will be decodable from neural recordings by using representation learning, potentially unlocking the brain like language models unlocked language.
arXiv Detail & Related papers (2023-02-28T07:40:37Z) - Decoding speech perception from non-invasive brain recordings [48.46819575538446]
We introduce a model trained with contrastive-learning to decode self-supervised representations of perceived speech from non-invasive recordings.
Our model can identify, from 3 seconds of MEG signals, the corresponding speech segment with up to 41% accuracy out of more than 1,000 distinct possibilities.
arXiv Detail & Related papers (2022-08-25T10:01:43Z) - Neural Language Models are not Born Equal to Fit Brain Data, but
Training Helps [75.84770193489639]
We examine the impact of test loss, training corpus and model architecture on the prediction of functional Magnetic Resonance Imaging timecourses of participants listening to an audiobook.
We find that untrained versions of each model already explain significant amount of signal in the brain by capturing similarity in brain responses across identical words.
We suggest good practices for future studies aiming at explaining the human language system using neural language models.
arXiv Detail & Related papers (2022-07-07T15:37:17Z) - Self-supervised models of audio effectively explain human cortical
responses to speech [71.57870452667369]
We capitalize on the progress of self-supervised speech representation learning to create new state-of-the-art models of the human auditory system.
We show that these results show that self-supervised models effectively capture the hierarchy of information relevant to different stages of speech processing in human cortex.
arXiv Detail & Related papers (2022-05-27T22:04:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.