Related papers: Improving semantic understanding in speech language models via brain-tuning

URL: http://arxiv.org/abs/2410.09230v2
Date: Tue, 15 Oct 2024 16:39:10 GMT
Title: Improving semantic understanding in speech language models via brain-tuning
Authors: Omer Moussa, Dietrich Klakow, Mariya Toneva
Abstract summary: Speech language models align with human brain responses to natural language to an impressive degree. Current models rely heavily on low-level speech features, indicating they lack brain-relevant semantics. We address this limitation by inducing brain-relevant bias directly into the models via fine-tuning with fMRI recordings.
Score: 19.732593005537606
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Speech language models align with human brain responses to natural language to an impressive degree. However, current models rely heavily on low-level speech features, indicating they lack brain-relevant semantics, which limits their utility as model organisms of semantic processing in the brain. In this work, we address this limitation by inducing brain-relevant bias directly into the models via fine-tuning with fMRI recordings of people listening to natural stories, a process we name brain-tuning. After testing it on 3 different pretrained model families, we show that brain-tuning not only improves overall alignment with new brain recordings in semantic language regions, but also reduces the reliance on low-level speech features for this alignment. Excitingly, we further show that brain-tuning leads to 1) consistent improvements in performance on a range of downstream tasks and 2) a representational space with increased semantic preference. Our results provide converging evidence, for the first time, that incorporating brain signals into the training of language models improves the models' semantic understanding.

Related papers

Brain-tuned Speech Models Better Reflect Speech Processing Stages in the Brain [4.652236080354487]
Self-supervised speech models excel in speech tasks but do not reflect the hierarchy of human speech processing. Recent work showed that brain-tuning models using human brain recordings improves speech models' semantic understanding. We find that late layers of brain-tuned models substantially improve over pretrained models in their alignment with semantic language regions.
arXiv Detail & Related papers (2025-06-04T10:59:11Z)
BrainWavLM: Fine-tuning Speech Representations with Brain Responses to Language [43.53912137735093]
Speech encoding models use auditory representations to predict how the human brain responds to spoken language stimuli. In this work, we use low-rank adaptation (LoRA) to fine-tune a WavLM-based encoding model end-to-end on a brain encoding objective. We show that fine-tuning across all of cortex improves average encoding performance with greater stability than without LoRA.
arXiv Detail & Related papers (2025-02-13T00:37:27Z)
Brain-Like Language Processing via a Shallow Untrained Multihead Attention Network [16.317199232071232]
Large Language Models (LLMs) have been shown to be effective models of the human language system. In this work, we investigate the key architectural components driving the surprising alignment of untrained models.
arXiv Detail & Related papers (2024-06-21T12:54:03Z)
SpeechAlign: Aligning Speech Generation to Human Preferences [51.684183257809075]
We introduce SpeechAlign, an iterative self-improvement strategy that aligns speech language models to human preferences. We show that SpeechAlign can bridge the distribution gap and facilitate continuous self-improvement of the speech language model.
arXiv Detail & Related papers (2024-04-08T15:21:17Z)
Speech language models lack important brain-relevant semantics [6.626540321463248]
Recent work has shown that text-based language models predict both text-evoked and speech-evoked brain activity to an impressive degree. This poses the question of what types of information language models truly predict in the brain.
arXiv Detail & Related papers (2023-11-08T13:11:48Z)
Do self-supervised speech and language models extract similar representations as human brain? [2.390915090736061]
Speech and language models trained through self-supervised learning (SSL) demonstrate strong alignment with brain activity during speech and language perception. We evaluate the brain prediction performance of two representative SSL models, Wav2Vec2.0 and GPT-2.
arXiv Detail & Related papers (2023-10-07T01:39:56Z)
Fine-tuned vs. Prompt-tuned Supervised Representations: Which Better Account for Brain Language Representations? [30.495681024162835]
We compare prompt-tuned and fine-tuned representations in neural decoding. We find that a more brain-consistent tuning method yields representations that better correlate with brain data. This indicates that our brain encodes more fine-grained concept information than shallow syntactic information.
arXiv Detail & Related papers (2023-10-03T07:34:30Z)
Decoding speech perception from non-invasive brain recordings [48.46819575538446]
We introduce a model trained with contrastive learning to decode self-supervised representations of perceived speech from non-invasive recordings. Our model can identify, from 3 seconds of MEG signals, the corresponding speech segment with up to 41% accuracy out of more than 1,000 distinct possibilities.
arXiv Detail & Related papers (2022-08-25T10:01:43Z)
Neural Language Models are not Born Equal to Fit Brain Data, but Training Helps [75.84770193489639]
We examine the impact of test loss, training corpus and model architecture on the prediction of functional Magnetic Resonance Imaging timecourses of participants listening to an audiobook. We find that untrained versions of each model already explain a significant amount of signal in the brain by capturing similarity in brain responses across identical words. We suggest good practices for future studies aiming at explaining the human language system using neural language models.
arXiv Detail & Related papers (2022-07-07T15:37:17Z)
Toward a realistic model of speech processing in the brain with self-supervised learning [67.7130239674153]
Self-supervised algorithms trained on the raw waveform constitute a promising candidate for modeling speech processing in the brain. We show that Wav2Vec 2.0 learns brain-like representations with as little as 600 hours of unlabelled speech.
arXiv Detail & Related papers (2022-06-03T17:01:46Z)
Self-supervised models of audio effectively explain human cortical responses to speech [71.57870452667369]
We capitalize on the progress of self-supervised speech representation learning to create new state-of-the-art models of the human auditory system. These results show that self-supervised models effectively capture the hierarchy of information relevant to different stages of speech processing in human cortex.
arXiv Detail & Related papers (2022-05-27T22:04:02Z)
Model-based analysis of brain activity reveals the hierarchy of language in 305 subjects [82.81964713263483]
A popular approach to decompose the neural bases of language consists in correlating, across individuals, the brain responses to different stimuli. Here, we show that a model-based approach can reach equivalent results within subjects exposed to natural stimuli.
arXiv Detail & Related papers (2021-10-12T15:30:21Z)

This list is automatically generated from the titles and abstracts of the papers on this site.