A Psycholinguistic Evaluation of Language Models' Sensitivity to Argument Roles
- URL: http://arxiv.org/abs/2410.16139v1
- Date: Mon, 21 Oct 2024 16:05:58 GMT
- Title: A Psycholinguistic Evaluation of Language Models' Sensitivity to Argument Roles
- Authors: Eun-Kyoung Rosa Lee, Sathvik Nair, Naomi Feldman
- Abstract summary: We evaluate large language models' sensitivity to argument roles by replicating psycholinguistic studies on human argument role processing.
We find that language models are able to distinguish verbs that appear in plausible and implausible contexts, where plausibility is determined through the relation between the verb and its preceding arguments.
However, none of the models capture the selective patterns that human comprehenders exhibit during real-time verb prediction, which indicates that language models' capacity to detect verb plausibility does not arise from the same mechanism that underlies human real-time sentence processing.
- Score: 0.06554326244334868
- Abstract: We present a systematic evaluation of large language models' sensitivity to argument roles, i.e., who did what to whom, by replicating psycholinguistic studies on human argument role processing. In three experiments, we find that language models are able to distinguish verbs that appear in plausible and implausible contexts, where plausibility is determined through the relation between the verb and its preceding arguments. However, none of the models capture the same selective patterns that human comprehenders exhibit during real-time verb prediction. This indicates that language models' capacity to detect verb plausibility does not arise from the same mechanism that underlies human real-time sentence processing.
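To make the evaluation concrete, below is a minimal sketch of how one might compare a verb's surprisal in a plausible versus a role-reversed context with an off-the-shelf language model. GPT-2, the example sentence pair, and the `verb_surprisal` helper are illustrative assumptions, not the paper's actual models or stimuli.

```python
# A minimal sketch: measure the surprisal of a verb given two contexts that
# differ only in which noun fills which argument role. GPT-2 and the example
# pair below are stand-ins, not the paper's actual models or stimuli.
import math
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def verb_surprisal(prefix: str, verb: str) -> float:
    """Surprisal in bits of `verb` as the continuation of `prefix`."""
    prefix_ids = tokenizer(prefix, return_tensors="pt").input_ids
    verb_ids = tokenizer(" " + verb, return_tensors="pt").input_ids  # leading space for GPT-2's BPE
    input_ids = torch.cat([prefix_ids, verb_ids], dim=1)
    with torch.no_grad():
        log_probs = torch.log_softmax(model(input_ids).logits, dim=-1)
    # Sum the log-probabilities of the verb's subword tokens; the logits at
    # position i predict the token at position i + 1.
    total = 0.0
    for i in range(verb_ids.shape[1]):
        pos = prefix_ids.shape[1] + i - 1
        total += log_probs[0, pos, verb_ids[0, i]].item()
    return -total / math.log(2)

# Role reversal: "served" is plausible when "the waitress" is the agent,
# implausible when "the customer" is the agent.
plausible = "The restaurant owner forgot which customer the waitress had"
reversed_roles = "The restaurant owner forgot which waitress the customer had"
for label, prefix in [("plausible", plausible), ("role-reversed", reversed_roles)]:
    print(f"{label}: {verb_surprisal(prefix, 'served'):.2f} bits")
```

A model that is sensitive to argument roles should assign higher surprisal to the verb in the role-reversed context; the paper's point is that even models which pass this kind of check do not show the human-like selectivity observed in real-time verb prediction.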
Related papers
- Perceptions of Linguistic Uncertainty by Language Models and Humans [26.69714008538173]
We investigate how language models map linguistic expressions of uncertainty to numerical responses.
We find that 7 out of 10 models are able to map uncertainty expressions to probabilistic responses in a human-like manner.
However, models' responses are substantially more susceptible than humans' to bias from their prior knowledge.
arXiv Detail & Related papers (2024-07-22T17:26:12Z) - UNcommonsense Reasoning: Abductive Reasoning about Uncommon Situations [62.71847873326847]
We investigate language models' ability to reason about unusual, unexpected, and unlikely situations.
Given a piece of context with an unexpected outcome, this task requires reasoning abductively to generate an explanation.
We release a new English language corpus called UNcommonsense.
arXiv Detail & Related papers (2023-11-14T19:00:55Z) - A fine-grained comparison of pragmatic language understanding in humans
and language models [2.231167375820083]
We compare language models and humans on seven pragmatic phenomena.
We find that the largest models achieve high accuracy and match human error patterns.
We find preliminary evidence that models and humans are sensitive to similar linguistic cues.
arXiv Detail & Related papers (2022-12-13T18:34:59Z) - Transparency Helps Reveal When Language Models Learn Meaning [71.96920839263457]
Our systematic experiments with synthetic data reveal that, with languages where all expressions have context-independent denotations, both autoregressive and masked language models learn to emulate semantic relations between expressions.
Turning to natural language, our experiments with a specific phenomenon -- referential opacity -- add to the growing body of evidence that current language models do not well-represent natural language semantics.
arXiv Detail & Related papers (2022-10-14T02:35:19Z) - Testing the Ability of Language Models to Interpret Figurative Language [69.59943454934799]
Figurative and metaphorical language are commonplace in discourse.
It remains an open question to what extent modern language models can interpret nonliteral phrases.
We introduce Fig-QA, a Winograd-style nonliteral language understanding task.
arXiv Detail & Related papers (2022-04-26T23:42:22Z) - Interpreting Language Models with Contrastive Explanations [99.7035899290924]
Language models must consider various features to predict a token, such as its part of speech, number, tense, or semantics.
Existing explanation methods conflate evidence for all these features into a single explanation, which is less interpretable for human understanding.
We show that contrastive explanations are quantifiably better than non-contrastive explanations in verifying major grammatical phenomena.
arXiv Detail & Related papers (2022-02-21T18:32:24Z) - It's not Rocket Science : Interpreting Figurative Language in Narratives [48.84507467131819]
We study the interpretation of two types of non-compositional figurative language: idioms and similes.
Our experiments show that models based solely on pre-trained language models perform substantially worse than humans on these tasks.
We additionally propose knowledge-enhanced models, adopting human strategies for interpreting figurative language.
arXiv Detail & Related papers (2021-08-31T21:46:35Z) - Uncovering Constraint-Based Behavior in Neural Models via Targeted
Fine-Tuning [9.391375268580806]
We show that competing linguistic processes within a language obscure underlying linguistic knowledge.
While human behavior has been found to be similar across languages, we find cross-linguistic variation in model behavior.
Our results suggest that models need to learn both the linguistic constraints in a language and their relative ranking, with mismatches in either producing non-human-like behavior.
arXiv Detail & Related papers (2021-06-02T14:52:11Z) - Evaluating Models of Robust Word Recognition with Serial Reproduction [8.17947290421835]
We compare several broad-coverage probabilistic generative language models in their ability to capture human linguistic expectations.
We find that those models that make use of abstract representations of preceding linguistic context best predict the changes made by people in the course of serial reproduction.
arXiv Detail & Related papers (2021-01-24T20:16:12Z) - Mechanisms for Handling Nested Dependencies in Neural-Network Language
Models and Humans [75.15855405318855]
We studied whether a modern artificial neural network trained with "deep learning" methods mimics a central aspect of human sentence processing.
Although the network was solely trained to predict the next word in a large corpus, analysis showed the emergence of specialized units that successfully handled local and long-distance syntactic agreement.
We tested the model's predictions in a behavioral experiment where humans detected violations in number agreement in sentences with systematic variations in the singular/plural status of multiple nouns.
arXiv Detail & Related papers (2020-06-19T12:00:05Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.