The quasi-semantic competence of LLMs: a case study on the part-whole relation
- URL: http://arxiv.org/abs/2504.02395v1
- Date: Thu, 03 Apr 2025 08:41:26 GMT
- Title: The quasi-semantic competence of LLMs: a case study on the part-whole relation
- Authors: Mattia Proietti, Alessandro Lenci
- Abstract summary: We investigate LLMs' knowledge of the *part-whole* relation, a.k.a. *meronymy*. We show that LLMs have just a "*quasi*-semantic" competence and still fall short of capturing deep inferential properties.
- Score: 53.37191762146552
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Understanding the extent and depth of the semantic competence of *Large Language Models* (LLMs) is at the center of the current scientific agenda in Artificial Intelligence (AI) and Computational Linguistics (CL). We contribute to this endeavor by investigating their knowledge of the *part-whole* relation, a.k.a. *meronymy*, which plays a crucial role in lexical organization but is significantly understudied. We used data from ConceptNet relations (Speer et al., 2016) and human-generated semantic feature norms (McRae et al., 2005) to explore the abilities of LLMs to deal with *part-whole* relations. We employed several methods based on three levels of analysis: (i) **behavioral** testing via prompting, where we directly queried the models on their knowledge of meronymy; (ii) sentence **probability** scoring, where we tested the models' ability to discriminate between correct (real) and incorrect (asymmetric counterfactual) *part-whole* relations; and (iii) **concept representation** analysis in vector space, where we showed the linear organization of the *part-whole* concept in the embedding and unembedding spaces. These analyses paint a complex picture revealing that the LLMs' knowledge of this relation is only partial: they have just a "*quasi*-semantic" competence and still fall short of capturing deep inferential properties.
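To make level (ii) concrete, here is a minimal sketch of sentence probability scoring, assuming a small causal LM ("gpt2" via Hugging Face transformers) as a stand-in for the models the paper tests; the sentence pair and scoring details are illustrative assumptions, not the paper's actual data or protocol.

```python
# Illustrative sketch of sentence probability scoring for part-whole pairs.
# Assumptions: "gpt2" as the scoring model and an invented sentence pair;
# the paper's models, test items, and scoring details may differ.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def sentence_log_prob(sentence: str) -> float:
    """Total log-probability of the sentence under the LM (higher = more plausible)."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        # Passing labels=input_ids makes the model return the mean
        # next-token cross-entropy over the n-1 predicted positions.
        out = model(**inputs, labels=inputs["input_ids"])
    n = inputs["input_ids"].size(1)
    return -out.loss.item() * (n - 1)

# A real part-whole statement and its asymmetric counterfactual:
real = "A wheel is a part of a car."
fake = "A car is a part of a wheel."

# The model discriminates the pair if the real sentence scores higher.
print(sentence_log_prob(real) > sentence_log_prob(fake))
```

Repeating this comparison over many real/counterfactual pairs yields a discrimination accuracy of the kind the probability-scoring analysis reports.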
Related papers
- QUDsim: Quantifying Discourse Similarities in LLM-Generated Text [70.22275200293964]
We introduce an abstraction based on linguistic theories in Questions Under Discussion (QUD) and question semantics to help quantify differences in discourse progression.
We then use this framework to build **QUDsim**, a similarity metric that can detect discursive parallels between documents.
Using QUDsim, we find that LLMs often reuse discourse structures (more so than humans) across samples, even when content differs.
arXiv Detail & Related papers (2025-04-12T23:46:09Z)
- Semantic Mastery: Enhancing LLMs with Advanced Natural Language Understanding [0.0]
The paper discusses state-of-the-art methodologies for enhancing large language models (LLMs) with more advanced NLU techniques.
We analyze the use of structured knowledge graphs, retrieval-augmented generation (RAG), and fine-tuning strategies that match models with human-level understanding.
arXiv Detail & Related papers (2025-04-01T04:12:04Z)
- Formalizing Complex Mathematical Statements with LLMs: A Study on Mathematical Definitions [8.135142928659546]
We introduce two novel resources for autoformalization, collecting definitions from Wikipedia (Def_Wiki) and arXiv papers (Def_ArXiv).
We evaluate a range of LLMs, analyzing their ability to formalize definitions into Isabelle/HOL.
Our findings reveal that definitions present a greater challenge compared to existing benchmarks, such as miniF2F.
arXiv Detail & Related papers (2025-02-17T17:34:48Z)
- Can Large Language Models Identify Authorship? [16.35265384114857]
Large Language Models (LLMs) have demonstrated an exceptional capacity for reasoning and problem-solving.
This work seeks to address three research questions: (1) Can LLMs perform zero-shot, end-to-end authorship verification effectively?
(2) Are LLMs capable of accurately attributing authorship among multiple candidate authors (e.g., 10 and 20)?
arXiv Detail & Related papers (2024-03-13T03:22:02Z)
- Vocabulary-Defined Semantics: Latent Space Clustering for Improving In-Context Learning [32.178931149612644]
In-context learning enables language models to adapt to downstream data or tasks using a few samples as demonstrations within the prompts.
However, the performance of in-context learning can be unstable depending on the quality, format, or order of demonstrations.
We propose a novel approach, "vocabulary-defined semantics".
arXiv Detail & Related papers (2024-01-29T14:29:48Z)
- Zero-shot Causal Graph Extrapolation from Text via LLMs [50.596179963913045]
We evaluate the ability of large language models (LLMs) to infer causal relations from natural language.
LLMs show competitive performance in a benchmark of pairwise relations without needing (explicit) training samples.
We extend our approach to extrapolating causal graphs through iterated pairwise queries (see the sketch below).
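As an illustration only: a minimal sketch of the iterated pairwise-query idea, assuming a hypothetical `ask_llm` helper in place of a real model call; the paper's actual prompts, models, and answer parsing may differ.

```python
# Sketch of causal graph extrapolation via iterated pairwise LLM queries.
# `ask_llm` is a hypothetical stand-in for a real LLM client.
from itertools import permutations

def ask_llm(prompt: str) -> str:
    """Hypothetical LLM call returning a free-text answer."""
    raise NotImplementedError("plug in an actual model call here")

def extrapolate_causal_graph(text: str, variables: list[str]) -> set[tuple[str, str]]:
    """Query every ordered variable pair and keep the edges answered 'yes'."""
    edges = set()
    for cause, effect in permutations(variables, 2):
        prompt = (
            f"Text: {text}\n"
            f"Based only on the text above, does {cause} causally "
            f"influence {effect}? Answer yes or no."
        )
        if ask_llm(prompt).strip().lower().startswith("yes"):
            edges.add((cause, effect))
    return edges
```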
arXiv Detail & Related papers (2023-12-22T13:14:38Z)
- A Glitch in the Matrix? Locating and Detecting Language Model Grounding with Fakepedia [57.31074448586854]
Large language models (LLMs) have an impressive ability to draw on novel information supplied in their context.
Yet the mechanisms underlying this contextual grounding remain unknown.
We present a novel method to study grounding abilities using Fakepedia.
arXiv Detail & Related papers (2023-12-04T17:35:42Z)
- Are Large Language Models Really Good Logical Reasoners? A Comprehensive Evaluation and Beyond [46.75497042978449]
Large Language Models (LLMs) have emerged as a noteworthy innovation in natural language processing (NLP).
We aim to bridge this gap and provide comprehensive evaluations in this paper.
Considering the comprehensiveness of evaluations, we include 3 early-era representative LLMs and 4 trending LLMs.
arXiv Detail & Related papers (2023-06-16T13:39:35Z)
- Evaluating Language Models for Mathematics through Interactions [116.67206980096513]
We introduce CheckMate, a prototype platform for humans to interact with and evaluate large language models (LLMs).
We conduct a study with CheckMate to evaluate three language models (InstructGPT, ChatGPT, and GPT-4) as assistants in proving undergraduate-level mathematics.
We derive a taxonomy of human behaviours and uncover that despite a generally positive correlation, there are notable instances of divergence between correctness and perceived helpfulness.
arXiv Detail & Related papers (2023-06-02T17:12:25Z)
- Large Language Models are In-Context Semantic Reasoners rather than Symbolic Reasoners [75.85554779782048]
Large Language Models (LLMs) have excited the natural language and machine learning community over recent years.
Despite numerous successful applications, the underlying mechanism of such in-context capabilities remains unclear.
In this work, we hypothesize that the learned *semantics* of language tokens do most of the heavy lifting during the reasoning process.
arXiv Detail & Related papers (2023-05-24T07:33:34Z)
- Analytic Manifold Learning: Unifying and Evaluating Representations for Continuous Control [32.773203015440075]
We address the problem of learning reusable state representations from streaming high-dimensional observations.
This is important for areas like Reinforcement Learning (RL), where training yields non-stationary data distributions.
arXiv Detail & Related papers (2020-06-15T19:33:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.