Are We Asking the Right Questions? On Ambiguity in Natural Language Queries for Tabular Data Analysis
- URL: http://arxiv.org/abs/2511.04584v2
- Date: Thu, 13 Nov 2025 01:48:55 GMT
- Title: Are We Asking the Right Questions? On Ambiguity in Natural Language Queries for Tabular Data Analysis
- Authors: Daniel Gomm, Cornelius Wolff, Madelon Hulsebos
- Abstract summary: We develop a principled framework based on a shared responsibility of query specification between user and system. Applying the framework to evaluations for question answering and analysis, we analyze the queries in 15 popular datasets.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Natural language interfaces to tabular data must handle ambiguities inherent to queries. Instead of treating ambiguity as a deficiency, we reframe it as a feature of cooperative interaction where users are intentional about the degree to which they specify queries. We develop a principled framework based on a shared responsibility of query specification between user and system, distinguishing unambiguous and ambiguous cooperative queries, which systems can resolve through reasonable inference, from uncooperative queries that cannot be resolved. Applying the framework to evaluations for tabular question answering and analysis, we analyze the queries in 15 popular datasets, and observe an uncontrolled mixing of query types neither adequate for evaluating a system's execution accuracy nor for evaluating interpretation capabilities. This conceptualization around cooperation in resolving queries informs how to design and evaluate natural language interfaces for tabular data analysis, for which we distill concrete directions for future research and broader implications.
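The distinction the abstract draws can be made concrete with a toy example. The following sketch is purely illustrative (the table, column names, and queries are invented here, not taken from the paper): an unambiguous query has one reading, a cooperative ambiguous query leaves a detail unspecified that the system can fill by reasonable inference, and an uncooperative query cannot be resolved at all.

```python
import pandas as pd

# Hypothetical sales table, for illustration only.
sales = pd.DataFrame({
    "region": ["EU", "EU", "US", "US"],
    "product": ["A", "B", "A", "B"],
    "revenue": [120, 80, 200, 50],
})

# Unambiguous query: "total revenue in the EU" has exactly one reading.
eu_total = sales.loc[sales["region"] == "EU", "revenue"].sum()

# Cooperative ambiguous query: "revenue per region" leaves the aggregate
# unspecified, but summing is a reasonable default inference.
per_region = sales.groupby("region")["revenue"].sum()

# Uncooperative query: "the important numbers" names no column, filter, or
# aggregate, so no reasonable inference can resolve it.

print(eu_total)     # 200
print(per_region)
```

Under the paper's framing, a benchmark that mixes these three query types without labeling them cannot tell execution failures apart from interpretation failures.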
Related papers
- Testing for LLM response differences: the case of a composite null consisting of semantically irrelevant query perturbations [10.216191904121178]
Given two input queries, it is natural to ask if their response distributions are the same. A traditional test of equality might indicate that two semantically equivalent queries induce statistically different response distributions. In this paper, we address this misalignment by incorporating a collection of semantically similar queries into the testing procedure.
arXiv Detail & Related papers (2025-09-13T19:44:42Z) - Reasoning-enhanced Query Understanding through Decomposition and Interpretation [87.56450566014625]
ReDI is a Reasoning-enhanced approach for query understanding through Decomposition and Interpretation. We compiled a large-scale dataset of real-world complex queries from a major search engine. Experiments on BRIGHT and BEIR demonstrate that ReDI consistently surpasses strong baselines in both sparse and dense retrieval paradigms.
arXiv Detail & Related papers (2025-09-08T10:58:42Z) - Data-Aware Socratic Query Refinement in Database Systems [12.533468345817528]
We propose Data-Aware Socratic Guidance (DASG), a dialogue-based query enhancement framework. DASG embeds interactive clarification as a first-class operator within database systems to resolve ambiguity in natural language queries. Our algorithm selects the optimal clarifications by combining semantic relevance, catalog-based information gain, and potential cost reduction.
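The abstract describes ranking candidate clarification questions by a weighted combination of three signals. A minimal sketch of that idea follows; the field names, weights, and example questions are assumptions for illustration, not the paper's actual API or scoring function.

```python
from dataclasses import dataclass

@dataclass
class Clarification:
    question: str
    relevance: float       # semantic relevance to the user query, in [0, 1]
    info_gain: float       # information gain estimated from the catalog
    cost_reduction: float  # expected reduction in query execution cost

def score(c: Clarification, w=(0.5, 0.3, 0.2)) -> float:
    # Weighted combination of the three signals; weights are hypothetical.
    return w[0] * c.relevance + w[1] * c.info_gain + w[2] * c.cost_reduction

candidates = [
    Clarification("Which year do you mean?", 0.9, 0.7, 0.4),
    Clarification("Should discounts be included?", 0.6, 0.4, 0.1),
]

# The highest-scoring clarification would be asked first.
best = max(candidates, key=score)
print(best.question)
```

Treating the clarification step as an operator inside the system, rather than a pre-processing stage, is what lets such scores draw on catalog statistics and cost estimates.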
arXiv Detail & Related papers (2025-08-07T06:28:16Z) - CLEAR-KGQA: Clarification-Enhanced Ambiguity Resolution for Knowledge Graph Question Answering [13.624962763072899]
KGQA systems typically assume user queries are unambiguous, an assumption that rarely holds in real-world applications. We propose a novel framework that dynamically handles both entity ambiguity (e.g., distinguishing between entities with similar names) and intent ambiguity (e.g., clarifying different interpretations of user queries) through interactive clarification.
arXiv Detail & Related papers (2025-04-13T17:34:35Z) - Disambiguate First, Parse Later: Generating Interpretations for Ambiguity Resolution in Semantic Parsing [56.82807063333088]
We propose a modular approach that resolves ambiguity using natural language interpretations before mapping these to logical forms. Our approach improves interpretation coverage and generalizes across datasets with different annotation styles, database structures, and ambiguity types.
arXiv Detail & Related papers (2025-02-25T18:42:26Z) - Contextualized Evaluations: Judging Language Model Responses to Underspecified Queries [85.81295563405433]
We present a protocol that synthetically constructs context surrounding an under-specified query and provides it during evaluation. We find that the presence of context can 1) alter conclusions drawn from evaluation, even flipping benchmark rankings between model pairs, 2) nudge evaluators to make fewer judgments based on surface-level criteria, like style, and 3) provide new insights about model behavior across diverse contexts.
arXiv Detail & Related papers (2024-11-11T18:58:38Z) - AMBROSIA: A Benchmark for Parsing Ambiguous Questions into Database Queries [56.82807063333088]
We introduce a new benchmark, AMBROSIA, which we hope will inform and inspire the development of text-to-SQL programs.
Our dataset contains questions showcasing three different types of ambiguity (scope ambiguity, attachment ambiguity, and vagueness).
In each case, the ambiguity persists even when the database context is provided.
This is achieved through a novel approach that involves controlled generation of databases from scratch.
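To make the three ambiguity types above concrete, here is an invented illustration over a hypothetical `employees(name, salary, city, dept)` table; these questions and SQL readings are not drawn from the AMBROSIA dataset itself.

```python
# Each ambiguity type maps a question to its competing SQL readings.
examples = {
    # Scope ambiguity: "highest salary" may scope over one employee
    # or over a department's total payroll.
    "scope": (
        "Which department has the highest salary?",
        ["SELECT dept FROM employees ORDER BY salary DESC LIMIT 1",
         "SELECT dept FROM employees GROUP BY dept "
         "ORDER BY SUM(salary) DESC LIMIT 1"],
    ),
    # Attachment ambiguity: does "in Berlin" attach to both departments
    # or only to marketing?
    "attachment": (
        "List employees in sales or marketing in Berlin.",
        ["SELECT name FROM employees WHERE dept = 'sales' "
         "OR (dept = 'marketing' AND city = 'Berlin')",
         "SELECT name FROM employees WHERE "
         "(dept = 'sales' OR dept = 'marketing') AND city = 'Berlin'"],
    ),
    # Vagueness: no threshold is given, so any cutoff is a guess.
    "vagueness": (
        "Who are the well-paid employees?",
        ["SELECT name FROM employees WHERE salary > 100000"],
    ),
}

for kind, (question, readings) in examples.items():
    print(f"{kind}: {len(readings)} reading(s)")
```

Note that, as the entry above states, these readings remain plausible even with the schema in hand, which is what makes database context alone insufficient for disambiguation.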
arXiv Detail & Related papers (2024-06-27T10:43:04Z) - Semantic Parsing for Conversational Question Answering over Knowledge Graphs [63.939700311269156]
We develop a dataset where user questions are annotated with SPARQL parses and system answers correspond to their execution results.
We present two different semantic parsing approaches and highlight the challenges of the task.
Our dataset and models are released at https://github.com/Edinburgh/SPICE.
arXiv Detail & Related papers (2023-01-28T14:45:11Z) - Searching for Better Database Queries in the Outputs of Semantic Parsers [16.221439565760058]
In this paper, we consider the case when, at the test time, the system has access to an external criterion that evaluates the generated queries.
The criterion can vary from checking that a query executes without errors to verifying the query on a set of tests.
We apply our approach to state-of-the-art semantic parsers and report that it allows us to find many queries passing all the tests on different datasets.
arXiv Detail & Related papers (2022-10-13T17:20:45Z) - Query Focused Multi-Document Summarization with Distant Supervision [88.39032981994535]
Existing work relies heavily on retrieval-style methods for estimating the relevance between queries and text segments.
We propose a coarse-to-fine modeling framework which introduces separate modules for estimating whether segments are relevant to the query.
We demonstrate that our framework outperforms strong comparison systems on standard QFS benchmarks.
arXiv Detail & Related papers (2020-04-06T22:35:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.