Meaning Beyond Truth Conditions: Evaluating Discourse Level Understanding via Anaphora Accessibility
- URL: http://arxiv.org/abs/2502.14119v1
- Date: Wed, 19 Feb 2025 21:45:26 GMT
- Title: Meaning Beyond Truth Conditions: Evaluating Discourse Level Understanding via Anaphora Accessibility
- Authors: Xiaomeng Zhu, Zhenghao Zhou, Simon Charlow, Robert Frank
- Abstract summary: We present a hierarchy of natural language understanding abilities. We argue for the importance of moving beyond assessments of understanding at the lexical and sentence levels to the discourse level.
- Score: 1.7985432767595741
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a hierarchy of natural language understanding abilities and argue for the importance of moving beyond assessments of understanding at the lexical and sentence levels to the discourse level. We propose the task of anaphora accessibility as a diagnostic for assessing discourse understanding, and to this end, present an evaluation dataset inspired by theoretical research in dynamic semantics. We evaluate human and LLM performance on our dataset and find that LLMs and humans align on some tasks and diverge on others. Such divergence can be explained by LLMs' reliance on specific lexical items during language comprehension, in contrast to human sensitivity to structural abstractions.
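To make the diagnostic concrete, here is a minimal sketch of what an anaphora-accessibility evaluation item might look like. The field names, example sentences, and scoring function are hypothetical illustrations in the spirit of dynamic semantics (an indefinite introduces a discourse referent that a later pronoun can pick up unless an operator such as negation blocks it); they are not the authors' actual dataset format.

```python
# Hypothetical illustration of anaphora-accessibility test items, inspired by
# dynamic semantics: an indefinite introduces a discourse referent that a later
# pronoun can refer back to, unless an operator such as negation blocks access.

ITEMS = [
    {
        "context": "A farmer owns a donkey.",
        "continuation": "He feeds it every morning.",
        "antecedent_accessible": True,   # "it" can pick up "a donkey"
    },
    {
        "context": "No farmer owns a donkey.",
        "continuation": "He feeds it every morning.",
        "antecedent_accessible": False,  # negation blocks the discourse referent
    },
]

def score(predictions: list[bool]) -> float:
    """Accuracy of model judgments (True = the continuation is felicitous)
    against the accessibility labels above."""
    correct = sum(pred == item["antecedent_accessible"]
                  for pred, item in zip(predictions, ITEMS))
    return correct / len(ITEMS)

if __name__ == "__main__":
    # A model that always treats the pronoun as resolvable
    # gets only the accessible case right.
    print(score([True, True]))  # 0.5
```

A lexically driven comprehender would treat both items alike, since the continuations are string-identical; sensitivity to the structural difference introduced by negation is what such items are meant to probe.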
Related papers
- Reasoning Beyond Labels: Measuring LLM Sentiment in Low-Resource, Culturally Nuanced Contexts [10.492471013369782]
We present a framework that treats sentiment as a context-dependent, culturally embedded construct. We evaluate how large language models (LLMs) reason about sentiment in WhatsApp messages from Nairobi youth health groups.
arXiv Detail & Related papers (2025-08-06T08:27:55Z) - How LLMs Comprehend Temporal Meaning in Narratives: A Case Study in Cognitive Evaluation of LLMs [13.822169295436177]
We investigate how large language models (LLMs) process the temporal meaning of linguistic aspect in narratives that were previously used in human studies. Our findings show that LLMs over-rely on prototypicality, produce inconsistent aspectual judgments, and struggle with causal reasoning derived from aspect. These results suggest that LLMs process aspect fundamentally differently from humans and lack robust narrative understanding.
arXiv Detail & Related papers (2025-07-18T18:28:35Z) - Beyond Keywords: Evaluating Large Language Model Classification of Nuanced Ableism [2.0435202333125977]
Large language models (LLMs) are increasingly used in decision-making tasks like résumé screening and content moderation. We evaluate the ability of four LLMs to identify nuanced ableism directed at autistic individuals. Our results reveal that LLMs can identify autism-related language but often miss harmful or offensive connotations.
arXiv Detail & Related papers (2025-05-26T20:01:44Z) - Semantic Mastery: Enhancing LLMs with Advanced Natural Language Understanding [0.0]
The paper discusses state-of-the-art methodologies that advance large language models (LLMs) with more advanced NLU techniques.
We analyze the use of structured knowledge graphs, retrieval-augmented generation (RAG), and fine-tuning strategies that align models with human-level understanding.
arXiv Detail & Related papers (2025-04-01T04:12:04Z) - Learning from Impairment: Leveraging Insights from Clinical Linguistics in Language Modelling Research [1.544681800932596]
This position paper investigates the potential of integrating insights from language impairment research and its clinical treatment to develop human-inspired learning strategies and evaluation frameworks for language models (LMs). We inspect the theoretical underpinnings of some influential linguistically motivated training approaches derived from neurolinguistics and, particularly, aphasiology, aimed at enhancing the recovery and generalization of linguistic skills in aphasia treatment. We highlight how these insights can inform the design of rigorous assessments for LMs, specifically in their handling of complex syntactic phenomena, as well as their implications for developing human-like learning strategies.
arXiv Detail & Related papers (2024-12-20T10:53:21Z) - Large Language Models as Neurolinguistic Subjects: Discrepancy in Performance and Competence for Form and Meaning [49.60849499134362]
This study investigates the linguistic understanding of Large Language Models (LLMs) regarding signifier (form) and signified (meaning).
We introduce a neurolinguistic approach, utilizing a novel method that combines minimal pair and diagnostic probing to analyze activation patterns across model layers.
We found: (1) psycholinguistic and neurolinguistic methods reveal that language performance and competence are distinct; (2) direct probability measurement may not accurately assess linguistic competence; and (3) instruction tuning does little to change competence but improves performance.
arXiv Detail & Related papers (2024-11-12T04:16:44Z) - Categorical Syllogisms Revisited: A Review of the Logical Reasoning Abilities of LLMs for Analyzing Categorical Syllogism [62.571419297164645]
This paper provides a systematic overview of prior works on the logical reasoning ability of large language models for analyzing categorical syllogisms. We first investigate all possible variations of categorical syllogisms from a purely logical perspective. We then examine the underlying configurations (i.e., mood and figure) tested by the existing datasets.
arXiv Detail & Related papers (2024-06-26T21:17:20Z) - A Survey on Lexical Ambiguity Detection and Word Sense Disambiguation [0.0]
This paper explores techniques that focus on understanding and resolving ambiguity in language within the field of natural language processing (NLP).
It outlines diverse approaches ranging from deep learning techniques to leveraging lexical resources and knowledge graphs like WordNet.
The research identifies persistent challenges in the field, such as the scarcity of sense annotated corpora and the complexity of informal clinical texts.
arXiv Detail & Related papers (2024-03-24T12:58:48Z) - Identifying Semantic Induction Heads to Understand In-Context Learning [103.00463655766066]
We investigate whether attention heads encode two types of relationships between tokens present in natural languages.
We find that certain attention heads exhibit a pattern where, when attending to head tokens, they recall tail tokens and increase the output logits of those tail tokens.
arXiv Detail & Related papers (2024-02-20T14:43:39Z) - Can Large Language Models Understand Context? [17.196362853457412]
This paper introduces a context understanding benchmark by adapting existing datasets to suit the evaluation of generative models.
Experimental results indicate that pre-trained dense models struggle with understanding more nuanced contextual features when compared to state-of-the-art fine-tuned models.
As LLM compression holds growing significance in both research and real-world applications, we assess the context understanding of quantized models under in-context-learning settings.
arXiv Detail & Related papers (2024-02-01T18:55:29Z) - Think from Words (TFW): Initiating Human-Like Cognition in Large Language
Models Through Think from Words for Japanese Text-level Classification [0.0]
"Think from Words" (TFW) initiates the comprehension process at the word level and then extends it to encompass the entire text.
"TFW with Extra word-level information" (TFW Extra) augmenting comprehension with additional word-level data.
Our findings shed light on the impact of various word-level information types on LLMs' text comprehension.
arXiv Detail & Related papers (2023-12-06T12:34:46Z) - MAGNIFICo: Evaluating the In-Context Learning Ability of Large Language
Models to Generalize to Novel Interpretations [37.13707912132472]
Humans possess a remarkable ability to assign novel interpretations to linguistic expressions.
Large Language Models (LLMs) have a knowledge cutoff and are costly to finetune repeatedly.
We systematically analyse the ability of LLMs to acquire novel interpretations using in-context learning.
arXiv Detail & Related papers (2023-10-18T00:02:38Z) - Simple Linguistic Inferences of Large Language Models (LLMs): Blind Spots and Blinds [59.71218039095155]
We evaluate language understanding capacities on simple inference tasks that most humans find trivial.
We target (i) grammatically-specified entailments, (ii) premises with evidential adverbs of uncertainty, and (iii) monotonicity entailments.
The models exhibit moderate to low performance on these evaluation sets.
arXiv Detail & Related papers (2023-05-24T06:41:09Z) - Prompting Language Models for Linguistic Structure [73.11488464916668]
We present a structured prompting approach for linguistic structured prediction tasks.
We evaluate this approach on part-of-speech tagging, named entity recognition, and sentence chunking.
We find that while PLMs contain significant prior knowledge of task labels due to task leakage into the pretraining corpus, structured prompting can also retrieve linguistic structure with arbitrary labels.
arXiv Detail & Related papers (2022-11-15T01:13:39Z) - ERICA: Improving Entity and Relation Understanding for Pre-trained
Language Models via Contrastive Learning [97.10875695679499]
We propose a novel contrastive learning framework named ERICA for the pre-training phase to obtain a deeper understanding of the entities and their relations in text.
Experimental results demonstrate that our proposed ERICA framework achieves consistent improvements on several document-level language understanding tasks.
arXiv Detail & Related papers (2020-12-30T03:35:22Z)