Can training neural language models on a curriculum with developmentally
plausible data improve alignment with human reading behavior?
- URL: http://arxiv.org/abs/2311.18761v1
- Date: Thu, 30 Nov 2023 18:03:58 GMT
- Title: Can training neural language models on a curriculum with developmentally
plausible data improve alignment with human reading behavior?
- Authors: Aryaman Chobey, Oliver Smith, Anzi Wang, Grusha Prasad
- Abstract summary: This paper explores the extent to which the misalignment between empirical and model-predicted behavior can be minimized by training models on more developmentally plausible data.
We trained teacher language models on the BabyLM "strict-small" dataset and used sentence-level surprisal estimates from these teacher models to create a curriculum.
We found tentative evidence that our curriculum made it easier for models to acquire linguistic knowledge from the training data.
- Score: 0.2745342790938508
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The use of neural language models to model human behavior has met with mixed
success. While some work has found that the surprisal estimates from these
models can be used to predict a wide range of human neural and behavioral
responses, other work studying more complex syntactic phenomena has found that
these surprisal estimates generate incorrect behavioral predictions. This paper
explores the extent to which the misalignment between empirical and
model-predicted behavior can be minimized by training models on more
developmentally plausible data, such as in the BabyLM Challenge. We trained
teacher language models on the BabyLM "strict-small" dataset and used
sentence-level surprisal estimates from these teacher models to create a curriculum. We
found tentative evidence that our curriculum made it easier for models to
acquire linguistic knowledge from the training data: on the subset of tasks in
the BabyLM challenge suite evaluating models' grammatical knowledge of English,
models first trained on the BabyLM data curriculum and then on a few randomly
ordered training epochs performed slightly better than models trained on
randomly ordered epochs alone. This improved linguistic knowledge acquisition
did not result in better alignment with human reading behavior, however: models
trained on the BabyLM dataset (with or without a curriculum) generated
predictions that were as misaligned with human behavior as models trained on
larger, less curated datasets. This suggests that training on developmentally
plausible datasets alone is likely insufficient to generate language models
capable of accurately predicting human language processing.
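As a rough illustration of the curriculum described above, the sketch below scores each training sentence by its mean token surprisal under a teacher model and orders sentences from least to most surprising. The use of an off-the-shelf GPT-2 teacher, length-normalized surprisal, and an easy-to-hard ordering are illustrative assumptions, not the paper's exact setup.

```python
# Minimal sketch: order training sentences by teacher-model surprisal.
# Assumptions (not from the paper): GPT-2 as the teacher, length-normalized
# surprisal, and an easy-to-hard (low-to-high surprisal) ordering.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
teacher = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def sentence_surprisal(sentence: str) -> float:
    """Mean token surprisal (in nats) of a sentence under the teacher model."""
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        # With labels=input_ids, the returned loss is the mean negative
        # log-likelihood of each token given its left context.
        loss = teacher(ids, labels=ids).loss
    return loss.item()

def build_curriculum(sentences: list[str]) -> list[str]:
    """Sort training sentences from least to most surprising for the teacher."""
    return sorted(sentences, key=sentence_surprisal)

corpus = ["The dog barked.", "Colorless green ideas sleep furiously."]
print(build_curriculum(corpus))  # least surprising sentences first
```

A student model would then be trained on the reordered data before switching to randomly ordered epochs, as described in the abstract above.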
Related papers
- Is Child-Directed Speech Effective Training Data for Language Models? [34.46268640655943]
We train GPT-2 and RoBERTa models on 29M words of English child-directed speech.
We test whether the global developmental ordering or the local discourse ordering of children's training data supports high performance relative to other datasets.
These findings support the hypothesis that the child's advantage stems not from better data but from a learning algorithm that is substantially more data-efficient than current language modeling techniques.
arXiv Detail & Related papers (2024-08-07T08:18:51Z)
- DevBench: A multimodal developmental benchmark for language learning [0.34129029452670606]
We introduce DevBench, a benchmark for evaluating vision-language models on developmental language tasks paired with human behavioral data.
DevBench provides a way to compare models directly against human language development.
These comparisons highlight ways in which model and human language learning processes diverge.
arXiv Detail & Related papers (2024-06-14T17:49:41Z)
- Generative Pre-training for Speech with Flow Matching [81.59952572752248]
We pre-trained a generative model, named SpeechFlow, on 60k hours of untranscribed speech with Flow Matching and masked conditions.
Experiment results show the pre-trained generative model can be fine-tuned with task-specific data to match or surpass existing expert models on speech enhancement, separation, and synthesis.
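For reference, flow matching trains a network to predict the velocity of a path interpolating between noise and data. The minimal sketch below assumes a linear path and omits the paper's masked conditioning; `model` stands in for any velocity-prediction network.

```python
# Minimal sketch of a flow-matching training step with a linear probability
# path. SpeechFlow's exact objective and masked conditioning may differ.
import torch

def flow_matching_loss(model: torch.nn.Module, x1: torch.Tensor) -> torch.Tensor:
    """x1: clean speech features of shape (batch, time, dim)."""
    x0 = torch.randn_like(x1)                            # noise endpoint of the path
    t = torch.rand(x1.size(0), 1, 1, device=x1.device)   # per-example time in [0, 1]
    xt = (1.0 - t) * x0 + t * x1                         # point on the linear path
    target_velocity = x1 - x0                            # d(xt)/dt for this path
    predicted_velocity = model(xt, t.view(-1))           # network's velocity estimate
    return ((predicted_velocity - target_velocity) ** 2).mean()
```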
arXiv Detail & Related papers (2023-10-25T03:40:50Z)
- Transformer-Based Language Model Surprisal Predicts Human Reading Times Best with About Two Billion Training Tokens [17.80735287413141]
We evaluate surprisal estimates from Transformer-based language model variants on their ability to predict human reading times.
Results show that surprisal estimates from most variants with contemporary model capacities provide the best fit after seeing about two billion training tokens.
Newly-trained smaller model variants reveal a 'tipping point' at convergence, after which the decrease in language model perplexity begins to result in poorer fits to human reading times.
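Fits of this kind are typically quantified by regressing reading times on surprisal alongside baseline predictors and comparing log-likelihoods against a baseline regression without surprisal. The sketch below uses ordinary least squares with hypothetical column names; the paper's own analysis may rely on different regression models and controls.

```python
# Minimal sketch: measure how much surprisal improves reading-time prediction,
# as the log-likelihood gain over a baseline regression. Column names
# (`rt`, `surprisal`, `word_length`, `log_frequency`) are illustrative.
import pandas as pd
import statsmodels.formula.api as smf

def delta_log_likelihood(df: pd.DataFrame) -> float:
    baseline = smf.ols("rt ~ word_length + log_frequency", data=df).fit()
    full = smf.ols("rt ~ word_length + log_frequency + surprisal", data=df).fit()
    # Larger values mean surprisal explains more variance in reading times.
    return full.llf - baseline.llf
```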
arXiv Detail & Related papers (2023-04-22T12:50:49Z)
- Chain of Hindsight Aligns Language Models with Feedback [62.68665658130472]
We propose a novel technique, Chain of Hindsight, that is easy to optimize and can learn from any form of feedback, regardless of its polarity.
We convert all types of feedback into sequences of sentences, which are then used to fine-tune the model.
By doing so, the model is trained to generate outputs based on feedback, while learning to identify and correct negative attributes or errors.
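A minimal sketch of this data construction is shown below; the feedback templates are illustrative assumptions rather than the paper's exact phrasing, and details such as loss masking are omitted.

```python
# Illustrative sketch of chain-of-hindsight data construction: feedback of
# either polarity is verbalized as plain sentences around ranked model outputs.
# The templates here are assumptions, not the paper's exact format.
def chain_of_hindsight_example(prompt: str, good: str, bad: str) -> str:
    return (
        f"{prompt}\n"
        f"A less helpful answer: {bad}\n"
        f"A more helpful answer: {good}"
    )

text = chain_of_hindsight_example(
    prompt="Explain why the sky is blue.",
    good="Sunlight scatters off air molecules, and shorter blue wavelengths scatter most.",
    bad="Because it reflects the ocean.",
)
# `text` would then be used for ordinary causal-LM fine-tuning.
```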
arXiv Detail & Related papers (2023-02-06T10:28:16Z)
- Training Trajectories of Language Models Across Scales [99.38721327771208]
Scaling up language models has led to unprecedented performance gains.
How do language models of different sizes learn during pre-training?
Why do larger language models demonstrate more desirable behaviors?
arXiv Detail & Related papers (2022-12-19T19:16:29Z)
- What do Large Language Models Learn beyond Language? [10.9650651784511]
We find that pretrained models significantly outperform comparable non-pretrained neural models.
Experiments surprisingly reveal that the positive effects of pre-training persist even when pretraining on multi-lingual text or computer code.
Our findings suggest a hitherto unexplored deep connection between pre-training and inductive learning abilities of language models.
arXiv Detail & Related papers (2022-10-21T23:43:13Z)
- Training Data Leakage Analysis in Language Models [6.843491191969066]
We introduce a methodology for identifying user content in the training data that could be leaked under a strong and realistic threat model.
We propose two metrics that quantify user-level data leakage by measuring a model's ability to reproduce unique sentence fragments from the training data.
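One simple way to approximate such a measurement, sketched below with an off-the-shelf GPT-2 model as a stand-in, is to prompt the model with a prefix of a training sentence and check whether greedy decoding reproduces the original continuation; the paper's two metrics are defined more precisely than this approximation.

```python
# Illustrative leakage probe (an approximation, not the paper's exact metrics):
# prompt the model with a training-sentence prefix and check whether greedy
# decoding reproduces the original continuation verbatim.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def reproduces_continuation(sentence: str, prefix_tokens: int = 8) -> bool:
    ids = tokenizer(sentence, return_tensors="pt").input_ids[0]
    if ids.size(0) <= prefix_tokens:
        return False
    prefix = ids[:prefix_tokens].unsqueeze(0)
    with torch.no_grad():
        generated = model.generate(
            prefix, max_new_tokens=ids.size(0) - prefix_tokens, do_sample=False
        )[0]
    return torch.equal(generated[: ids.size(0)], ids)

def leakage_rate(sentences: list[str]) -> float:
    hits = sum(reproduces_continuation(s) for s in sentences)
    return hits / max(len(sentences), 1)
```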
arXiv Detail & Related papers (2021-01-14T00:57:32Z)
- Pre-Training a Language Model Without Human Language [74.11825654535895]
We study how the intrinsic nature of pre-training data contributes to the fine-tuned downstream performance.
We find that models pre-trained on unstructured data beat those trained directly from scratch on downstream tasks.
Surprisingly, we find that pre-training on certain non-human language data yields GLUE performance close to that obtained by pre-training on another non-English language.
arXiv Detail & Related papers (2020-12-22T13:38:06Z)
- Comparison of Interactive Knowledge Base Spelling Correction Models for Low-Resource Languages [81.90356787324481]
Spelling normalization for low-resource languages is a challenging task because the patterns are hard to predict.
This work compares a neural model with character language models trained on varying amounts of target language data.
Our usage scenario is interactive correction with nearly zero amounts of training examples, improving models as more data is collected.
arXiv Detail & Related papers (2020-10-20T17:31:07Z)
- Pretrained Language Model Embryology: The Birth of ALBERT [68.5801642674541]
We investigate the developmental process from a set of randomly initialized parameters to a totipotent language model.
Our results show that, during pretraining, ALBERT learns to reconstruct and predict tokens of different parts of speech (POS) at different rates.
These findings suggest that the knowledge of a pretrained model changes over the course of pretraining, and that more pretraining steps do not necessarily yield a model with more comprehensive knowledge.
arXiv Detail & Related papers (2020-10-06T05:15:39Z)