Scaling laws for language encoding models in fMRI
- URL: http://arxiv.org/abs/2305.11863v4
- Date: Tue, 30 Jan 2024 18:31:45 GMT
- Title: Scaling laws for language encoding models in fMRI
- Authors: Richard Antonello, Aditya Vaidya, and Alexander G. Huth
- Abstract summary: We tested whether larger open-source models are better at predicting brain responses recorded using fMRI.
Similar logarithmic behavior was observed when scaling the size of the fMRI training set.
These results suggest that increasing scale in both models and data will yield incredibly effective models of language processing in the brain.
- Score: 47.498241053872924
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Representations from transformer-based unidirectional language models are
known to be effective at predicting brain responses to natural language.
However, most studies comparing language models to brains have used GPT-2 or
similarly sized language models. Here we tested whether larger open-source
models such as those from the OPT and LLaMA families are better at predicting
brain responses recorded using fMRI. Mirroring scaling results from other
contexts, we found that brain prediction performance scales logarithmically
with model size from 125M to 30B parameter models, with ~15% increased encoding
performance as measured by correlation with a held-out test set across 3
subjects. Similar logarithmic behavior was observed when scaling the size of
the fMRI training set. We also characterized scaling for acoustic encoding
models that use HuBERT, WavLM, and Whisper, and we found comparable
improvements with model size. A noise ceiling analysis of these large,
high-performance encoding models showed that performance is nearing the
theoretical maximum for brain areas such as the precuneus and higher auditory
cortex. These results suggest that increasing scale in both models and data
will yield incredibly effective models of language processing in the brain,
enabling better scientific understanding as well as applications such as
decoding.
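To make the abstract's encoding-model setup concrete, here is a minimal sketch (with made-up data and numbers, not the paper's features or results): it fits a ridge regression from language-model hidden states to fMRI responses, scores it by voxelwise correlation on a held-out set, and fits a logarithmic curve of encoding performance against parameter count.

```python
# Minimal sketch of a ridge-regression encoding model; all data and numbers
# below are synthetic placeholders, not the paper's stimuli or results.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# Hypothetical features: language-model hidden states aligned to fMRI TRs
# (n_TRs x n_features) and voxel responses (n_TRs x n_voxels).
n_train, n_test, n_feat, n_vox = 3000, 300, 1024, 500
X_train = rng.standard_normal((n_train, n_feat))
X_test = rng.standard_normal((n_test, n_feat))
W_true = rng.standard_normal((n_feat, n_vox)) / np.sqrt(n_feat)
Y_train = X_train @ W_true + rng.standard_normal((n_train, n_vox))
Y_test = X_test @ W_true + rng.standard_normal((n_test, n_vox))

# Fit the encoding model and score it by voxelwise correlation on held-out data.
model = Ridge(alpha=1000.0).fit(X_train, Y_train)
pred = model.predict(X_test)

def voxel_corr(a, b):
    a = (a - a.mean(0)) / a.std(0)
    b = (b - b.mean(0)) / b.std(0)
    return (a * b).mean(0)

print("mean held-out correlation:", voxel_corr(pred, Y_test).mean())

# Logarithmic scaling: fit performance ~ a * log(params) + b across model sizes.
params = np.array([125e6, 1.3e9, 6.7e9, 30e9])  # e.g. OPT-family sizes
perf = np.array([0.230, 0.248, 0.258, 0.265])   # made-up correlations
a, b = np.polyfit(np.log(params), perf, 1)
print(f"performance ~ {a:.4f} * ln(params) + {b:.4f}")
```

The final fit illustrates the sense in which performance "scales logarithmically with model size": encoding correlation grows roughly linearly in log(parameters) over the 125M to 30B range.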
Related papers
- BrainWavLM: Fine-tuning Speech Representations with Brain Responses to Language [43.53912137735093]
Speech encoding models use auditory representations to predict how the human brain responds to spoken language stimuli.
In this work, we use low-rank adaptation (LoRA) to fine-tune a WavLM-based encoding model end-to-end on a brain encoding objective (a minimal sketch of such a setup appears after this list).
We show that fine-tuning across all of cortex improves average encoding performance with greater stability than without LoRA.
arXiv Detail & Related papers (2025-02-13T00:37:27Z)
- Interpretable Language Modeling via Induction-head Ngram Models [74.26720927767398]
We propose Induction-head ngram models (Induction-Gram) to bolster modern ngram models with a hand-engineered "induction head".
This induction head uses a custom neural similarity metric to efficiently search the model's input context for potential next-word completions.
Experiments show that this simple method significantly improves next-word prediction over baseline interpretable models.
arXiv Detail & Related papers (2024-10-31T12:33:26Z)
- Brain-Like Language Processing via a Shallow Untrained Multihead Attention Network [16.317199232071232]
Large Language Models (LLMs) have been shown to be effective models of the human language system.
In this work, we investigate the key architectural components driving the surprising alignment of untrained models.
arXiv Detail & Related papers (2024-06-21T12:54:03Z)
- fMRI predictors based on language models of increasing complexity recover brain left lateralization [4.1618731507412505]
We show that the left-right difference in brain correlation follows a scaling law with the number of parameters.
This finding reconciles computational analyses of brain activity using large language models with the classic observation from aphasic patients showing left hemisphere dominance for language.
arXiv Detail & Related papers (2024-05-28T09:24:52Z)
- Applicability of scaling laws to vision encoding models [0.7734726150561089]
We investigated how to build a high-performance vision encoding model to predict brain activity as part of our participation in the Algonauts Project 2023 Challenge.
The challenge provided brain activity recorded by functional MRI (fMRI) while participants viewed images.
Several vision models with parameter sizes ranging from 86M to 4.3B were used to build predictive models.
arXiv Detail & Related papers (2023-08-01T17:31:14Z)
- Neural Language Models are not Born Equal to Fit Brain Data, but Training Helps [75.84770193489639]
We examine the impact of test loss, training corpus and model architecture on the prediction of functional Magnetic Resonance Imaging timecourses of participants listening to an audiobook.
We find that untrained versions of each model already explain a significant amount of signal in the brain by capturing similarity in brain responses across identical words.
We suggest good practices for future studies aiming at explaining the human language system using neural language models.
arXiv Detail & Related papers (2022-07-07T15:37:17Z)
- Self-supervised models of audio effectively explain human cortical responses to speech [71.57870452667369]
We capitalize on the progress of self-supervised speech representation learning to create new state-of-the-art models of the human auditory system.
These results show that self-supervised models effectively capture the hierarchy of information relevant to different stages of speech processing in human cortex.
arXiv Detail & Related papers (2022-05-27T22:04:02Z)
- Scaling Language Models: Methods, Analysis & Insights from Training Gopher [83.98181046650664]
We present an analysis of Transformer-based language model performance across a wide range of model scales.
Gains from scale are largest in areas such as reading comprehension, fact-checking, and the identification of toxic language.
We discuss the application of language models to AI safety and the mitigation of downstream harms.
arXiv Detail & Related papers (2021-12-08T19:41:47Z)
- Multi-timescale Representation Learning in LSTM Language Models [69.98840820213937]
Language models must capture statistical dependencies between words at timescales ranging from very short to very long.
We derived a theory for how the memory gating mechanism in long short-term memory language models can capture power law decay.
Experiments showed that LSTM language models trained on natural English text learn to approximate this theoretical distribution.
arXiv Detail & Related papers (2020-09-27T02:13:38Z)
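As referenced in the BrainWavLM entry above, the sketch below shows one way to fine-tune a WavLM backbone end-to-end on a brain encoding objective with LoRA, using the Hugging Face transformers and peft libraries. The checkpoint, voxel count, pooling, loss, and training step are illustrative assumptions, not that paper's implementation.

```python
# Minimal sketch of LoRA fine-tuning a WavLM-based encoding model on a brain
# encoding objective; names, shapes, and the loss are illustrative assumptions.
import torch
import torch.nn as nn
from transformers import WavLMModel
from peft import LoraConfig, get_peft_model

N_VOXELS = 500  # hypothetical number of cortical voxels to predict

backbone = WavLMModel.from_pretrained("microsoft/wavlm-base")
hidden_size = backbone.config.hidden_size

# Attach low-rank adapters to the attention projections; only these (and the
# readout below) are trained, the pretrained WavLM weights stay frozen.
lora_cfg = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05,
                      target_modules=["q_proj", "v_proj"])
backbone = get_peft_model(backbone, lora_cfg)

head = nn.Linear(hidden_size, N_VOXELS)  # voxelwise linear readout
optim = torch.optim.AdamW(
    [p for p in backbone.parameters() if p.requires_grad] + list(head.parameters()),
    lr=1e-4)

def step(wav, fmri):
    """One end-to-end update: wav is (batch, samples), fmri is (batch, N_VOXELS)."""
    hidden = backbone(input_values=wav).last_hidden_state  # (batch, frames, dim)
    pred = head(hidden.mean(dim=1))  # crude temporal pooling: one prediction per clip
    loss = nn.functional.mse_loss(pred, fmri)
    optim.zero_grad()
    loss.backward()
    optim.step()
    return loss.item()

# Toy usage with random tensors standing in for stimulus audio and fMRI responses.
print("loss:", step(torch.randn(2, 16000 * 4), torch.randn(2, N_VOXELS)))
```

Because only the low-rank adapter matrices and the linear readout are trainable, each update touches a small fraction of WavLM's weights, which is what makes end-to-end fine-tuning on limited fMRI data practical.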
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.