Related papers: What Makes a Good Query? Measuring the Impact of Human-Confusing Linguistic Features on LLM Performance

What Makes a Good Query? Measuring the Impact of Human-Confusing Linguistic Features on LLM Performance

URL: http://arxiv.org/abs/2602.20300v1
Date: Mon, 23 Feb 2026 19:30:08 GMT
Title: What Makes a Good Query? Measuring the Impact of Human-Confusing Linguistic Features on LLM Performance
Authors: William Watson, Nicole Cho, Sumitra Ganesh, Manuela Veloso,
Abstract summary: We operationalize this insight by constructing a 22-dimension query feature vector covering clause complexity, lexical rarity, and anaphora, negation, answerability, and intention grounding.<n>Using 369,837 real-world queries, we ask: Are there certain types of queries that make hallucination more likely?<n>A large-scale analysis reveals a consistent "risk landscape": certain features such as deep clause nesting and underspecification align with higher hallucination propensity.
Score: 17.12787601890563
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large Language Model (LLM) hallucinations are usually treated as defects of the model or its decoding strategy. Drawing on classical linguistics, we argue that a query's form can also shape a listener's (and model's) response. We operationalize this insight by constructing a 22-dimension query feature vector covering clause complexity, lexical rarity, and anaphora, negation, answerability, and intention grounding, all known to affect human comprehension. Using 369,837 real-world queries, we ask: Are there certain types of queries that make hallucination more likely? A large-scale analysis reveals a consistent "risk landscape": certain features such as deep clause nesting and underspecification align with higher hallucination propensity. In contrast, clear intention grounding and answerability align with lower hallucination rates. Others, including domain specificity, show mixed, dataset- and model-dependent effects. Thus, these findings establish an empirically observable query-feature representation correlated with hallucination risk, paving the way for guided query rewriting and future intervention studies.

Related papers

Review of Hallucination Understanding in Large Language and Vision Models [65.29139004945712]
We present a framework for characterizing both image and text hallucinations across diverse applications.<n>Our investigations reveal that hallucinations often stem from predictable patterns in data distributions and inherited biases.<n>This survey provides a foundation for developing more robust and effective solutions to hallucinations in real-world generative AI systems.
arXiv Detail & Related papers (2025-09-26T09:23:08Z)
How Large Language Models are Designed to Hallucinate [0.42970700836450487]
We argue that hallucination is a structural outcome of the transformer architecture.<n>Our contribution is threefold: (1) a comparative account showing why existing explanations are insufficient; (2) a predictive taxonomy of hallucination linked to existential structures with proposed benchmarks; and (3) design directions toward "truth-constrained" architectures capable of withholding or deferring when disclosure is absent.
arXiv Detail & Related papers (2025-09-19T16:46:27Z)
Beyond Facts: Evaluating Intent Hallucination in Large Language Models [13.315302240710164]
FAITHQA is a novel benchmark for intent hallucination that contains 20,068 problems.<n>We find that intent hallucination is a common issue even for state-of-the-art models.<n>We introduce an automatic LLM generation evaluation metric, CONSTRAINT SCORE, for detecting intent hallucination.
arXiv Detail & Related papers (2025-06-06T21:10:55Z)
Why and How LLMs Hallucinate: Connecting the Dots with Subsequence Associations [82.42811602081692]
This paper introduces a subsequence association framework to systematically trace and understand hallucinations.<n>Key insight is hallucinations that arise when dominant hallucinatory associations outweigh faithful ones.<n>We propose a tracing algorithm that identifies causal subsequences by analyzing hallucination probabilities across randomized input contexts.
arXiv Detail & Related papers (2025-04-17T06:34:45Z)
Trust Me, I'm Wrong: LLMs Hallucinate with Certainty Despite Knowing the Answer [51.7407540261676]
We investigate a distinct type of hallucination, where a model can consistently answer a question correctly, but a seemingly trivial perturbation causes it to produce a hallucinated response with high certainty.<n>This phenomenon is particularly concerning in high-stakes domains such as medicine or law, where model certainty is often used as a proxy for reliability.<n>We show that CHOKE examples are consistent across prompts, occur in different models and datasets, and are fundamentally distinct from other hallucinations.
arXiv Detail & Related papers (2025-02-18T15:46:31Z)
Hallucination Detection: Robustly Discerning Reliable Answers in Large Language Models [70.19081534515371]
Large Language Models (LLMs) have gained widespread adoption in various natural language processing tasks. They generate unfaithful or inconsistent content that deviates from the input source, leading to severe consequences. We propose a robust discriminator named RelD to effectively detect hallucination in LLMs' generated answers.
arXiv Detail & Related papers (2024-07-04T18:47:42Z)
Confabulation: The Surprising Value of Large Language Model Hallucinations [0.7249731529275342]
We argue that measurable semantic characteristics of LLM confabulations mirror a human propensity to utilize increased narrativity as a cognitive resource for sense-making and communication. This finding reveals a tension in our usually dismissive understandings of confabulation.
arXiv Detail & Related papers (2024-06-06T15:32:29Z)
On Large Language Models' Hallucination with Regard to Known Facts [74.96789694959894]
Large language models are successful in answering factoid questions but are also prone to hallucination. We investigate the phenomenon of LLMs possessing correct answer knowledge yet still hallucinating from the perspective of inference dynamics. Our study shed light on understanding the reasons for LLMs' hallucinations on their known facts, and more importantly, on accurately predicting when they are hallucinating.
arXiv Detail & Related papers (2024-03-29T06:48:30Z)
On Hallucination and Predictive Uncertainty in Conditional Language Generation [76.18783678114325]
Higher predictive uncertainty corresponds to a higher chance of hallucination. Epistemic uncertainty is more indicative of hallucination than aleatoric or total uncertainties. It helps to achieve better results of trading performance in standard metric for less hallucination with the proposed beam search variant.
arXiv Detail & Related papers (2021-03-28T00:32:27Z)

This list is automatically generated from the titles and abstracts of the papers in this site.