Related papers: Is It JUST Semantics? A Case Study of Discourse Particle Understanding in LLMs

URL: http://arxiv.org/abs/2506.04534v1
Date: Thu, 05 Jun 2025 00:59:05 GMT
Title: Is It JUST Semantics? A Case Study of Discourse Particle Understanding in LLMs
Authors: William Sheffield, Kanishka Misra, Valentina Pyatkin, Ashwini Deo, Kyle Mahowald, Junyi Jessy Li
Abstract summary: This work investigates the capacity of LLMs to distinguish the fine-grained senses of English "just". Our findings reveal that while LLMs exhibit some ability to differentiate between broader categories, they struggle to fully capture more subtle nuances.
Score: 47.462635654670386
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Discourse particles are crucial elements that subtly shape the meaning of text. These words, often polyfunctional, give rise to nuanced and often quite disparate semantic/discourse effects, as exemplified by the diverse uses of the particle "just" (e.g., exclusive, temporal, emphatic). This work investigates the capacity of LLMs to distinguish the fine-grained senses of English "just", a well-studied example in formal semantics, using data meticulously created and labeled by expert linguists. Our findings reveal that while LLMs exhibit some ability to differentiate between broader categories, they struggle to fully capture more subtle nuances, highlighting a gap in their understanding of discourse particles.

Related papers

Drivel-ology: Challenging LLMs with Interpreting Nonsense with Depth [21.092167028989632]
Drivelology is a linguistic phenomenon characterised by "nonsense with depth". We construct a benchmark dataset of over 1,200 meticulously curated and diverse examples across English, Mandarin, Spanish, French, Japanese, and Korean. We find that current large language models (LLMs) consistently fail to grasp the layered semantics of Drivelological text.
arXiv Detail & Related papers (2025-09-04T03:58:55Z)
Linguistic Blind Spots of Large Language Models [14.755831733659699]
We study the performance of recent large language models (LLMs) on linguistic annotation tasks. We find that recent LLMs show limited efficacy in addressing linguistic queries and often struggle with linguistically complex inputs. Our results provide insights to inform future advancements in LLM design and development.
arXiv Detail & Related papers (2025-03-25T01:47:13Z)
Multilingual LLMs Struggle to Link Orthography and Semantics in Bilingual Word Processing [19.6191088446367]
This study evaluates how multilingual Large Language Models (LLMs) handle English-Spanish, English-French, and English-German cognates, non-cognates, and interlingual homographs. We find that models opt for different strategies in understanding English and non-English homographs, highlighting the lack of a unified approach to handling cross-lingual ambiguities.
arXiv Detail & Related papers (2025-01-15T20:22:35Z)
Investigating large language models for their competence in extracting grammatically sound sentences from transcribed noisy utterances [1.3597551064547497]
Humans exhibit remarkable cognitive abilities to separate semantically significant content from speech-specific noise. We investigate whether large language models (LLMs) can effectively perform analogical speech comprehension tasks.
arXiv Detail & Related papers (2024-10-07T14:55:20Z)
Traffic Light or Light Traffic? Investigating Phrasal Semantics in Large Language Models [41.233879429714925]
This study critically examines the capacity of API-based large language models to comprehend phrase semantics. We assess the performance of LLMs in executing phrase semantic reasoning tasks guided by natural language instructions. We conduct detailed error analyses to interpret the limitations faced by LLMs in comprehending phrase semantics.
arXiv Detail & Related papers (2024-10-03T08:44:17Z)
Do LLMs Really Adapt to Domains? An Ontology Learning Perspective [2.0755366440393743]
Large Language Models (LLMs) have demonstrated unprecedented prowess across various natural language processing tasks in various application domains. Recent studies show that LLMs can be leveraged to perform lexical semantic tasks, such as Knowledge Base Completion (KBC) or Ontology Learning (OL). This paper investigates the question: Do LLMs really adapt to domains and remain consistent in the extraction of structured knowledge, or do they only learn lexical senses instead of reasoning?
arXiv Detail & Related papers (2024-07-29T13:29:43Z)
The Strong Pull of Prior Knowledge in Large Language Models and Its Impact on Emotion Recognition [74.04775677110179]
In-context Learning (ICL) has emerged as a powerful paradigm for performing natural language tasks with Large Language Models (LLMs). We show that LLMs have strong yet inconsistent priors in emotion recognition that ossify their predictions. Our results suggest that caution is needed when using ICL with larger LLMs for affect-centered tasks outside their pre-training domain.
arXiv Detail & Related papers (2024-03-25T19:07:32Z)
Fantastic Semantics and Where to Find Them: Investigating Which Layers of Generative LLMs Reflect Lexical Semantics [50.982315553104975]
We investigate the bottom-up evolution of lexical semantics for a popular large language model, namely Llama2. Our experiments show that the representations in lower layers encode lexical semantics, while the higher layers, with weaker semantic induction, are responsible for prediction. This is in contrast to models with discriminative objectives, such as masked language modeling, where the higher layers obtain better lexical semantics.
arXiv Detail & Related papers (2024-03-03T13:14:47Z)
Quantifying the redundancy between prosody and text [67.07817268372743]
We use large language models to estimate how much information is redundant between prosody and the words themselves. We find a high degree of redundancy between the information carried by the words and prosodic information across several prosodic features. Still, we observe that prosodic features cannot be fully predicted from text, suggesting that prosody carries information above and beyond the words.
arXiv Detail & Related papers (2023-11-28T21:15:24Z)
Simple Linguistic Inferences of Large Language Models (LLMs): Blind Spots and Blinds [59.71218039095155]
We evaluate language understanding capacities on simple inference tasks that most humans find trivial. We target (i) grammatically-specified entailments, (ii) premises with evidential adverbs of uncertainty, and (iii) monotonicity entailments. The models exhibit moderate to low performance on these evaluation sets.
arXiv Detail & Related papers (2023-05-24T06:41:09Z)
Are Representations Built from the Ground Up? An Empirical Examination of Local Composition in Language Models [91.3755431537592]
Representing compositional and non-compositional phrases is critical for language understanding. We first formulate a problem of predicting the LM-internal representations of longer phrases given those of their constituents. While we would expect the predictive accuracy to correlate with human judgments of semantic compositionality, we find this is largely not the case.
arXiv Detail & Related papers (2022-10-07T14:21:30Z)
Probing Pretrained Language Models for Lexical Semantics [76.73599166020307]
We present a systematic empirical analysis across six typologically diverse languages and five different lexical tasks. Our results indicate patterns and best practices that hold universally, but also point to prominent variations across languages and tasks.
arXiv Detail & Related papers (2020-10-12T14:24:01Z)

This list is automatically generated from the titles and abstracts of the papers on this site.