A Targeted Assessment of Incremental Processing in Neural Language Models and Humans
- URL: http://arxiv.org/abs/2106.03232v2
- Date: Wed, 25 Oct 2023 10:01:19 GMT
- Title: A Targeted Assessment of Incremental Processing in Neural Language Models and Humans
- Authors: Ethan Gotlieb Wilcox, Pranali Vani, Roger P. Levy
- Abstract summary: We present a scaled-up comparison of incremental processing in humans and neural language models.
Data comes from a novel online experimental paradigm called the Interpolated Maze task.
We find that both humans and language models show increased processing difficulty in ungrammatical sentence regions.
- Score: 2.7624021966289605
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a targeted, scaled-up comparison of incremental processing in
humans and neural language models by collecting by-word reaction time data for
sixteen different syntactic test suites across a range of structural phenomena.
Human reaction time data comes from a novel online experimental paradigm called
the Interpolated Maze task. We compare human reaction times to by-word
probabilities for four contemporary language models, with different
architectures and trained on a range of data set sizes. We find that across
many phenomena, both humans and language models show increased processing
difficulty in ungrammatical sentence regions, with human and model 'accuracy'
scores (à la Marvin and Linzen, 2018) about equal. However, although language
model outputs match humans in direction, we show that models systematically
under-predict the difference in magnitude of incremental processing difficulty
between grammatical and ungrammatical sentences. Specifically, when models
encounter syntactic violations they fail to accurately predict the longer
reaction times observed in the human data. These results call into question
whether contemporary language models are approaching human-like performance for
sensitivity to syntactic violations.
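The by-word model predictions referred to in the abstract are conventionally expressed as surprisal, i.e. -log2 p(word | preceding context), and an item-level 'accuracy' check asks whether the critical region of the ungrammatical variant receives higher surprisal than that of the grammatical variant. The following is a minimal illustrative sketch of that pipeline, not the authors' released code: it assumes an off-the-shelf GPT-2 model from Hugging Face Transformers, a single hypothetical subject-verb agreement item, and helper functions (word_surprisals, region_surprisal) introduced purely for illustration.

# Illustrative sketch (assumptions noted above): per-word surprisal from a
# causal LM, plus a Marvin-and-Linzen-style accuracy check on one item.
import math
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")  # model choice is an assumption
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def word_surprisals(sentence):
    """Return (word, surprisal in bits) pairs; subword surprisals are summed per word."""
    words = sentence.split()
    ids = [tokenizer.bos_token_id]  # BOS gives the first word a conditioning context
    spans = []
    for w in words:
        piece = tokenizer.encode(" " + w)
        spans.append((len(ids), len(ids) + len(piece)))
        ids.extend(piece)
    with torch.no_grad():
        logits = model(torch.tensor([ids])).logits
    log_probs = torch.log_softmax(logits, dim=-1)
    result = []
    for w, (start, end) in zip(words, spans):
        # log p(token_t | tokens_<t) is read off the logits at position t - 1
        nats = -sum(log_probs[0, t - 1, ids[t]].item() for t in range(start, end))
        result.append((w, nats / math.log(2)))  # convert nats to bits
    return result

def region_surprisal(sentence, region):
    """Summed surprisal of a contiguous critical region within the sentence."""
    target = region.split()
    per_word = word_surprisals(sentence)
    words = [w for w, _ in per_word]
    for i in range(len(words) - len(target) + 1):
        if words[i:i + len(target)] == target:
            return sum(s for _, s in per_word[i:i + len(target)])
    raise ValueError("critical region not found in sentence")

# Hypothetical minimal pair in the style of a subject-verb agreement test suite.
grammatical   = ("The keys to the cabinet are on the table.", "are")
ungrammatical = ("The keys to the cabinet is on the table.", "is")
hit = region_surprisal(*ungrammatical) > region_surprisal(*grammatical)
print("model scores this item 'correctly':", hit)

Averaging such hits over all items of a test suite would give the suite-level model accuracy that the abstract compares against human reaction-time differences.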
Related papers
- DevBench: A multimodal developmental benchmark for language learning [0.34129029452670606]
We introduce DevBench, a benchmark for evaluating vision-language models against human behavioral data on a range of language tasks.
We show that DevBench provides a benchmark for comparing models to human language development.
These comparisons highlight ways in which model and human language learning processes diverge.
arXiv Detail & Related papers (2024-06-14T17:49:41Z)
- Longer Fixations, More Computation: Gaze-Guided Recurrent Neural Networks [12.57650361978445]
Humans read texts at a varying pace, while machine learning models treat each token in the same way.
In this paper, we convert this intuition into a set of novel models with fixation-guided parallel RNNs or layers.
We find that, interestingly, the fixation durations predicted by the neural networks bear some resemblance to human fixation durations.
arXiv Detail & Related papers (2023-10-31T21:32:11Z)
- Visual Grounding Helps Learn Word Meanings in Low-Data Regimes [47.7950860342515]
Modern neural language models (LMs) are powerful tools for modeling human sentence production and comprehension.
But to achieve these results, LMs must be trained in distinctly un-human-like ways.
Do models trained more naturalistically -- with grounded supervision -- exhibit more humanlike language learning?
We investigate this question in the context of word learning, a key sub-task in language acquisition.
arXiv Detail & Related papers (2023-10-20T03:33:36Z)
- Dependency-based Mixture Language Models [53.152011258252315]
We introduce the Dependency-based Mixture Language Models.
In detail, we first train neural language models with a novel dependency modeling objective.
We then formulate the next-token probability by mixing the previous dependency modeling probability distributions with self-attention.
arXiv Detail & Related papers (2022-03-19T06:28:30Z)
- Multilingual Language Models Predict Human Reading Behavior [8.830621849672108]
We compare the performance of language-specific and multilingual pretrained transformer models to predict reading time measures.
We find that BERT and XLM models successfully predict a range of eye tracking features.
In a series of experiments, we analyze the cross-domain and cross-language abilities of these models and show how they reflect human sentence processing.
arXiv Detail & Related papers (2021-04-12T13:03:49Z)
- Comparison of Interactive Knowledge Base Spelling Correction Models for Low-Resource Languages [81.90356787324481]
Spelling normalization for low resource languages is a challenging task because the patterns are hard to predict.
This work presents a comparison of a neural model and character language models trained with varying amounts of target language data.
Our usage scenario is interactive correction with nearly zero amounts of training examples, improving models as more data is collected.
arXiv Detail & Related papers (2020-10-20T17:31:07Z)
- Multi-timescale Representation Learning in LSTM Language Models [69.98840820213937]
Language models must capture statistical dependencies between words at timescales ranging from very short to very long.
We derived a theory for how the memory gating mechanism in long short-term memory language models can capture power law decay.
Experiments showed that LSTM language models trained on natural English text learn to approximate this theoretical distribution.
arXiv Detail & Related papers (2020-09-27T02:13:38Z)
- Mechanisms for Handling Nested Dependencies in Neural-Network Language Models and Humans [75.15855405318855]
We studied whether a modern artificial neural network trained with "deep learning" methods mimics a central aspect of human sentence processing.
Although the network was solely trained to predict the next word in a large corpus, analysis showed the emergence of specialized units that successfully handled local and long-distance syntactic agreement.
We tested the model's predictions in a behavioral experiment where humans detected violations in number agreement in sentences with systematic variations in the singular/plural status of multiple nouns.
arXiv Detail & Related papers (2020-06-19T12:00:05Z)
- The Sensitivity of Language Models and Humans to Winograd Schema Perturbations [36.47219885590433]
We show that large-scale pretrained language models are sensitive to linguistic perturbations that minimally affect human understanding.
Our results highlight interesting differences between humans and language models.
arXiv Detail & Related papers (2020-05-04T09:44:54Z)
- Limits of Detecting Text Generated by Large-Scale Language Models [65.46403462928319]
Some consider large-scale language models that can generate long and coherent pieces of text to be dangerous, since they may be used in misinformation campaigns.
Here we formulate large-scale language model output detection as a hypothesis testing problem to classify text as genuine or generated.
arXiv Detail & Related papers (2020-02-09T19:53:23Z)