Related papers: Do LLMs Understand Ambiguity in Text? A Case Study in Open-world Question Answering

Do LLMs Understand Ambiguity in Text? A Case Study in Open-world Question Answering

URL: http://arxiv.org/abs/2411.12395v1
Date: Tue, 19 Nov 2024 10:27:26 GMT
Title: Do LLMs Understand Ambiguity in Text? A Case Study in Open-world Question Answering
Authors: Aryan Keluskar, Amrita Bhattacharjee, Huan Liu,
Abstract summary: Ambiguity in natural language poses significant challenges to Large Language Models (LLMs) used for open-domain question answering. We compare off-the-shelf and few-shot LLM performance, focusing on measuring the impact of explicit disambiguation strategies. We demonstrate how simple, training-free, token-level disambiguation methods may be effectively used to improve LLM performance for ambiguous question answering tasks.
Score: 15.342415325821063
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Ambiguity in natural language poses significant challenges to Large Language Models (LLMs) used for open-domain question answering. LLMs often struggle with the inherent uncertainties of human communication, leading to misinterpretations, miscommunications, hallucinations, and biased responses. This significantly weakens their ability to be used for tasks like fact-checking, question answering, feature extraction, and sentiment analysis. Using open-domain question answering as a test case, we compare off-the-shelf and few-shot LLM performance, focusing on measuring the impact of explicit disambiguation strategies. We demonstrate how simple, training-free, token-level disambiguation methods may be effectively used to improve LLM performance for ambiguous question answering tasks. We empirically show our findings and discuss best practices and broader impacts regarding ambiguity in LLMs.

Related papers

LLM-Specific Utility: A New Perspective for Retrieval-Augmented Generation [110.610512800947]
Retrieval-augmented generation (RAG) enhances large language models (LLMs) by incorporating external knowledge.<n>Existing studies often treat utility as a generic attribute, ignoring the fact that different LLMs may benefit differently from the same passage.
arXiv Detail & Related papers (2025-10-13T12:57:45Z)
It Depends: Resolving Referential Ambiguity in Minimal Contexts with Commonsense Knowledge [3.340255811686752]
We investigate whether Large Language Models can leverage commonsense to resolve referential ambiguity in multi-turn conversations.<n>We test DeepSeek v3, GPT-4o, Qwen3-32B, GPT-4o-mini, and Llama-3.1-8B via LLM-as-Judge and human annotations.
arXiv Detail & Related papers (2025-09-19T15:49:26Z)
Uncovering the Fragility of Trustworthy LLMs through Chinese Textual Ambiguity [16.065963688326242]
We study the trustworthiness of large language models (LLMs) when encountering ambiguous narrative text in Chinese.<n>We created a benchmark dataset by collecting and generating ambiguous sentences with context and their corresponding disambiguated pairs.<n>We discovered significant fragility in LLMs when handling ambiguity, revealing behavior that differs substantially from humans.
arXiv Detail & Related papers (2025-07-30T21:50:19Z)
Disambiguation in Conversational Question Answering in the Era of LLMs and Agents: A Survey [54.90240495777929]
Ambiguity remains a fundamental challenge in Natural Language Processing (NLP)<n>With the advent of Large Language Models (LLMs), addressing ambiguity has become even more critical due to their expanded capabilities and applications.<n>This paper explores the definition, forms, and implications of ambiguity for language driven systems.
arXiv Detail & Related papers (2025-05-18T20:53:41Z)
Disparities in LLM Reasoning Accuracy and Explanations: A Case Study on African American English [66.97110551643722]
We investigate dialectal disparities in Large Language Models (LLMs) reasoning tasks. We find that LLMs produce less accurate responses and simpler reasoning chains and explanations for AAE inputs. These findings highlight systematic differences in how LLMs process and reason about different language varieties.
arXiv Detail & Related papers (2025-03-06T05:15:34Z)
Not All LLM Reasoners Are Created Equal [58.236453890457476]
We study the depth of grade-school math problem-solving capabilities of LLMs. We evaluate their performance on pairs of existing math word problems together.
arXiv Detail & Related papers (2024-10-02T17:01:10Z)
Exploring Language Model Generalization in Low-Resource Extractive QA [57.14068405860034]
We investigate Extractive Question Answering (EQA) with Large Language Models (LLMs) under domain drift. We devise a series of experiments to empirically explain the performance gap.
arXiv Detail & Related papers (2024-09-27T05:06:43Z)
Order Matters in Hallucination: Reasoning Order as Benchmark and Reflexive Prompting for Large-Language-Models [0.0]
Large language models (LLMs) have generated significant attention since their inception, finding applications across various academic and industrial domains. LLMs often suffer from the "hallucination problem", where outputs, though grammatically and logically coherent, lack factual accuracy or are entirely fabricated.
arXiv Detail & Related papers (2024-08-09T14:34:32Z)
Do LLMs Really Adapt to Domains? An Ontology Learning Perspective [2.0755366440393743]
Large Language Models (LLMs) have demonstrated unprecedented prowess across various natural language processing tasks in various application domains. Recent studies show that LLMs can be leveraged to perform lexical semantic tasks, such as Knowledge Base Completion (KBC) or Ontology Learning (OL) This paper investigates the question: Do LLMs really adapt to domains and remain consistent in the extraction of structured knowledge, or do they only learn lexical senses instead of reasoning?
arXiv Detail & Related papers (2024-07-29T13:29:43Z)
Crafting Interpretable Embeddings by Asking LLMs Questions [89.49960984640363]
Large language models (LLMs) have rapidly improved text embeddings for a growing array of natural-language processing tasks. We introduce question-answering embeddings (QA-Emb), embeddings where each feature represents an answer to a yes/no question asked to an LLM. We use QA-Emb to flexibly generate interpretable models for predicting fMRI voxel responses to language stimuli.
arXiv Detail & Related papers (2024-05-26T22:30:29Z)
Aligning Language Models to Explicitly Handle Ambiguity [22.078095273053506]
We propose Alignment with Perceived Ambiguity (APA), a novel pipeline that aligns language models to deal with ambiguous queries. Experimental results on question-answering datasets demonstrate that APA empowers LLMs to explicitly detect and manage ambiguous queries. Our finding proves that APA excels beyond training with gold-standard labels, especially in out-of-distribution scenarios.
arXiv Detail & Related papers (2024-04-18T07:59:53Z)
Rephrase and Respond: Let Large Language Models Ask Better Questions for Themselves [57.974103113675795]
We present a method named Rephrase and Respond' (RaR) which allows Large Language Models to rephrase and expand questions posed by humans. RaR serves as a simple yet effective prompting method for improving performance. We show that RaR is complementary to the popular Chain-of-Thought (CoT) methods, both theoretically and empirically.
arXiv Detail & Related papers (2023-11-07T18:43:34Z)
Knowing What LLMs DO NOT Know: A Simple Yet Effective Self-Detection Method [36.24876571343749]
Large Language Models (LLMs) have shown great potential in Natural Language Processing (NLP) tasks. Recent literature reveals that LLMs generate nonfactual responses intermittently. We propose a novel self-detection method to detect which questions that a LLM does not know that are prone to generate nonfactual results.
arXiv Detail & Related papers (2023-10-27T06:22:14Z)
Investigating the Factual Knowledge Boundary of Large Language Models with Retrieval Augmentation [109.8527403904657]
We show that large language models (LLMs) possess unwavering confidence in their knowledge and cannot handle the conflict between internal and external knowledge well. Retrieval augmentation proves to be an effective approach in enhancing LLMs' awareness of knowledge boundaries. We propose a simple method to dynamically utilize supporting documents with our judgement strategy.
arXiv Detail & Related papers (2023-07-20T16:46:10Z)
Check Your Facts and Try Again: Improving Large Language Models with External Knowledge and Automated Feedback [127.75419038610455]
Large language models (LLMs) are able to generate human-like, fluent responses for many downstream tasks. This paper proposes a LLM-Augmenter system, which augments a black-box LLM with a set of plug-and-play modules.
arXiv Detail & Related papers (2023-02-24T18:48:43Z)
The Goldilocks of Pragmatic Understanding: Fine-Tuning Strategy Matters for Implicature Resolution by LLMs [26.118193748582197]
We evaluate four categories of widely used state-of-the-art models. We find that, despite only evaluating on utterances that require a binary inference, models in three of these categories perform close to random. These results suggest that certain fine-tuning strategies are far better at inducing pragmatic understanding in models.
arXiv Detail & Related papers (2022-10-26T19:04:23Z)

This list is automatically generated from the titles and abstracts of the papers in this site.