When the LM misunderstood the human chuckled: Analyzing garden path effects in humans and language models
- URL: http://arxiv.org/abs/2502.09307v1
- Date: Thu, 13 Feb 2025 13:19:33 GMT
- Title: When the LM misunderstood the human chuckled: Analyzing garden path effects in humans and language models
- Authors: Samuel Joseph Amouyal, Aya Meltzer-Asscher, Jonathan Berant
- Abstract summary: Modern Large Language Models (LLMs) have shown human-like abilities in many language tasks. We compare LLMs and humans on a sentence comprehension task using garden-path constructions. Our findings reveal that both LLMs and humans struggle with specific syntactic complexities.
- Score: 41.929897900569905
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Modern Large Language Models (LLMs) have shown human-like abilities in many language tasks, sparking interest in comparing LLMs' and humans' language processing. In this paper, we conduct a detailed comparison of the two on a sentence comprehension task using garden-path constructions, which are notoriously challenging for humans. Based on psycholinguistic research, we formulate hypotheses on why garden-path sentences are hard, and test these hypotheses on human participants and a large suite of LLMs using comprehension questions. Our findings reveal that both LLMs and humans struggle with specific syntactic complexities, with some models showing high correlation with human comprehension. To complement our findings, we test LLM comprehension of garden-path constructions with paraphrasing and text-to-image generation tasks, and find that the results mirror the sentence comprehension question results, further validating our findings on LLM understanding of these constructions.
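As a rough illustration of the comprehension-question setup described above (the paper's actual items, prompts, and models are not reproduced here, so the sentences, questions, and `ask_llm` placeholder below are hypothetical), the following Python sketch pairs classic garden-path sentences with yes/no questions and scores answers against the reading required by the correct parse:

```python
# Hypothetical sketch of a garden-path comprehension probe; not the paper's
# actual items or prompts. Each item pairs a garden-path sentence with a
# yes/no question whose correct answer depends on recovering the right parse.
ITEMS = [
    {
        "sentence": "While the man hunted the deer ran into the woods.",
        "question": "Did the man hunt the deer?",
        "gold": "no",  # correct parse: "the deer" is the subject of "ran", not the object of "hunted"
    },
    {
        "sentence": "The horse raced past the barn fell.",
        "question": "Did the horse fall?",
        "gold": "yes",  # reduced relative: the horse (that was) raced past the barn fell
    },
]

def ask_llm(prompt: str) -> str:
    """Placeholder for a real model call (API or local LLM).
    It returns a fixed answer here so the scoring loop runs end to end."""
    return "yes"

def comprehension_accuracy(items) -> float:
    correct = 0
    for item in items:
        prompt = (
            f"Sentence: {item['sentence']}\n"
            f"Question: {item['question']}\n"
            "Answer with 'yes' or 'no' only."
        )
        answer = ask_llm(prompt).strip().lower()
        correct += int(answer.startswith(item["gold"]))
    return correct / len(items)

print(f"Comprehension accuracy: {comprehension_accuracy(ITEMS):.2f}")
```

A systematic "yes" on the first item would reflect the misparse-driven misinterpretation that makes such constructions notoriously challenging, which is what the comprehension questions are designed to detect.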
Related papers
- Disparities in LLM Reasoning Accuracy and Explanations: A Case Study on African American English [66.97110551643722]
We investigate dialectal disparities in Large Language Models (LLMs) reasoning tasks.
We find that LLMs produce less accurate responses and simpler reasoning chains and explanations for AAE inputs.
These findings highlight systematic differences in how LLMs process and reason about different language varieties.
arXiv Detail & Related papers (2025-03-06T05:15:34Z)
- How Deep is Love in LLMs' Hearts? Exploring Semantic Size in Human-like Cognition [75.11808682808065]
This study investigates whether large language models (LLMs) exhibit similar tendencies in understanding semantic size.
Our findings reveal that multi-modal training is crucial for LLMs to achieve more human-like understanding.
Lastly, we examine whether LLMs are influenced by attention-grabbing headlines with larger semantic sizes in a real-world web shopping scenario.
arXiv Detail & Related papers (2025-03-01T03:35:56Z)
- Child vs. machine language learning: Can the logical structure of human language unleash LLMs? [0.0]
We argue that human language learning proceeds in a manner that is different in nature from current approaches to training LLMs.
We present evidence from German plural formation by LLMs that confirms our hypothesis: even very powerful implementations produce results that miss aspects of the logic inherent to language, aspects that humans handle without difficulty.
arXiv Detail & Related papers (2025-02-24T16:40:46Z)
- Non-literal Understanding of Number Words by Language Models [33.24263583093367]
Humans naturally interpret numbers non-literally, combining context, world knowledge, and speaker intent.
We investigate whether large language models (LLMs) interpret numbers similarly, focusing on hyperbole and pragmatic halo effects.
arXiv Detail & Related papers (2025-02-10T07:03:00Z)
- HLB: Benchmarking LLMs' Humanlikeness in Language Use [2.438748974410787]
We present a comprehensive humanlikeness benchmark (HLB) evaluating 20 large language models (LLMs).
We collected responses from over 2,000 human participants and compared them to outputs from the LLMs in these experiments.
Our results reveal fine-grained differences in how well LLMs replicate human responses across various linguistic levels.
arXiv Detail & Related papers (2024-09-24T09:02:28Z)
- When LLMs Meet Cunning Texts: A Fallacy Understanding Benchmark for Large Language Models [59.84769254832941]
We propose a FaLlacy Understanding Benchmark (FLUB) containing cunning texts that are easy for humans to understand but difficult for models to grasp.
Specifically, the cunning texts in FLUB consist mainly of tricky, humorous, and misleading texts collected from the real internet environment.
Based on FLUB, we investigate the performance of multiple representative and advanced LLMs.
arXiv Detail & Related papers (2024-02-16T22:12:53Z)
- Do Language Models Exhibit the Same Cognitive Biases in Problem Solving as Human Learners? [140.9751389452011]
We study the biases of large language models (LLMs) in relation to those known in children when solving arithmetic word problems.
We generate a novel set of word problems for each of these tests, using a neuro-symbolic approach that enables fine-grained control over the problem features.
arXiv Detail & Related papers (2024-01-31T18:48:20Z)
- How Proficient Are Large Language Models in Formal Languages? An In-Depth Insight for Knowledge Base Question Answering [52.86931192259096]
Knowledge Base Question Answering (KBQA) aims to answer natural language questions based on facts in knowledge bases.
Recent works leverage the capabilities of large language models (LLMs) for logical form generation to improve performance.
arXiv Detail & Related papers (2024-01-11T09:27:50Z)
- Divergences between Language Models and Human Brains [59.100552839650774]
We systematically explore the divergences between human and machine language processing.
We identify two domains that LMs do not capture well: social/emotional intelligence and physical commonsense.
Our results show that fine-tuning LMs on these domains can improve their alignment with human brain responses.
arXiv Detail & Related papers (2023-11-15T19:02:40Z)
- Testing AI on language comprehension tasks reveals insensitivity to underlying meaning [3.335047764053173]
Large Language Models (LLMs) are recruited in applications that span from clinical assistance and legal support to question answering and education.
Yet, reverse-engineering is bound by Moravec's Paradox, according to which easy skills are hard.
We systematically assess 7 state-of-the-art models on a novel benchmark.
arXiv Detail & Related papers (2023-02-23T20:18:52Z)
- Context Limitations Make Neural Language Models More Human-Like [32.488137777336036]
We show discrepancies in context access between modern neural language models (LMs) and humans in incremental sentence processing.
Additional limits on context access were needed for LMs to better simulate human reading behavior (a rough sketch of context-limited surprisal estimation appears after this list).
Our analyses also showed that human-LM gaps in memory access are associated with specific syntactic constructions.
arXiv Detail & Related papers (2022-05-23T17:01:13Z)
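As a minimal sketch of what "context limitation" in the last entry might look like operationally (an assumption for illustration, not the cited paper's actual models or procedure; GPT-2 via Hugging Face transformers stands in for the models used), the snippet below computes per-token surprisal while capping how many preceding tokens the model may condition on:

```python
# Minimal sketch of context-limited surprisal, assuming GPT-2 as a stand-in;
# not the cited paper's actual models or procedure.
from typing import Optional

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def token_surprisals(sentence: str, max_context: Optional[int] = None):
    """Return (token, surprisal in nats) pairs; max_context caps how many
    preceding tokens the model may condition on (None = unlimited)."""
    ids = tokenizer(sentence, return_tensors="pt").input_ids[0]
    results = []
    for i in range(1, len(ids)):
        start = 0 if max_context is None else max(0, i - max_context)
        prefix = ids[start:i].unsqueeze(0)           # truncated context, batch of one
        with torch.no_grad():
            logits = model(prefix).logits[0, -1]     # logits for the next token
        log_probs = torch.log_softmax(logits, dim=-1)
        results.append((tokenizer.decode([int(ids[i])]), -log_probs[ids[i]].item()))
    return results

# Compare unlimited context with a heavily limited one on a garden-path sentence.
sentence = "The horse raced past the barn fell."
full = token_surprisals(sentence)                    # unlimited context
limited = token_surprisals(sentence, max_context=3)  # last 3 tokens only
for (token, s_full), (_, s_lim) in zip(full, limited):
    print(f"{token!r}: full={s_full:.2f} limited={s_lim:.2f}")
```

Comparing the two surprisal profiles against human reading times is one way such context-limitation claims can be evaluated; the specific window sizes and models here are placeholders.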
This list is automatically generated from the titles and abstracts of the papers on this site.