Searching for Better Database Queries in the Outputs of Semantic Parsers
- URL: http://arxiv.org/abs/2210.07201v1
- Date: Thu, 13 Oct 2022 17:20:45 GMT
- Title: Searching for Better Database Queries in the Outputs of Semantic Parsers
- Authors: Anton Osokin, Irina Saparina, Ramil Yarullin
- Abstract summary: In this paper, we consider the case when, at the test time, the system has access to an external criterion that evaluates the generated queries.
The criterion can vary from checking that a query executes without errors to verifying the query on a set of tests.
We apply our approach to the state-of-the-art semantic parsers and report that it allows us to find many queries passing all the tests on different datasets.
- Score: 16.221439565760058
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The task of generating a database query from a question in natural language
suffers from ambiguity and insufficiently precise description of the goal. The
problem is amplified when the system needs to generalize to databases unseen at
training. In this paper, we consider the case when, at the test time, the
system has access to an external criterion that evaluates the generated
queries. The criterion can vary from checking that a query executes without
errors to verifying the query on a set of tests. In this setting, we augment
neural autoregressive models with a search algorithm that looks for a query
satisfying the criterion. We apply our approach to the state-of-the-art
semantic parsers and report that it allows us to find many queries passing all
the tests on different datasets.
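The setting described in the abstract can be illustrated with a minimal sketch. It is not the authors' search algorithm: it only filters a fixed, already ranked list of candidate queries, whereas the paper couples the search with an autoregressive model. The table `singer`, the candidate strings, and the helper names `executes_without_error`, `first_accepted`, and `passes_test` are assumptions made up for illustration. The sketch uses Python's standard `sqlite3` module and shows two criteria from the range the abstract mentions: the query executes without errors, and the query passes a test on its result.

```python
import sqlite3
from typing import Callable, Iterable, Optional


def executes_without_error(conn: sqlite3.Connection, query: str) -> bool:
    """Weakest criterion: the query runs on the database without raising."""
    try:
        conn.execute(query).fetchall()
        return True
    except sqlite3.Error:
        return False


def first_accepted(candidates: Iterable[str],
                   criterion: Callable[[str], bool]) -> Optional[str]:
    """Return the first candidate accepted by the external criterion, if any."""
    for query in candidates:
        if criterion(query):
            return query
    return None


# Hypothetical toy database standing in for a database unseen at training.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE singer (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO singer VALUES (?, ?)", [("Ann", 30), ("Bob", 45)])

# Hypothetical parser outputs, ordered from most to least likely.
candidates = [
    "SELECT nme FROM singer WHERE age > 40",   # fails: no such column
    "SELECT name FROM singer WHERE age > 20",  # executes, but fails the test
    "SELECT name FROM singer WHERE age > 40",  # executes and passes the test
]


def passes_test(query: str) -> bool:
    """Stronger criterion: the query executes and returns the expected rows."""
    if not executes_without_error(conn, query):
        return False
    return conn.execute(query).fetchall() == [("Bob",)]


by_execution = first_accepted(candidates, lambda q: executes_without_error(conn, q))
by_test = first_accepted(candidates, passes_test)
print(by_execution)
print(by_test)
```

Under the execution-only criterion the second candidate is accepted even though it returns the wrong rows, while the test-based criterion rejects it and accepts the third. This is one way to see why a stricter criterion can matter in this setting.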
Related papers
- Contextualized Evaluations: Taking the Guesswork Out of Language Model Evaluations [85.81295563405433]
Language model users often issue queries that lack specification, where the context under which a query was issued is not explicit.
We present contextualized evaluations, a protocol that synthetically constructs context surrounding an under-specified query and provides it during evaluation.
We find that the presence of context can 1) alter conclusions drawn from evaluation, even flipping win rates between model pairs, 2) nudge evaluators to make fewer judgments based on surface-level criteria, like style, and 3) provide new insights about model behavior across diverse contexts.
arXiv Detail & Related papers (2024-11-11T18:58:38Z) - DAGE: DAG Query Answering via Relational Combinator with Logical Constraints [24.60431781360608]
We propose a query embedding method for DAG queries called DAGE.
DAGE combines the possibly multiple paths between two nodes into a single path with a trainable operator.
We show that it is possible to implement DAGE on top of existing query embedding methods.
arXiv Detail & Related papers (2024-10-29T15:02:48Z) - AMBROSIA: A Benchmark for Parsing Ambiguous Questions into Database Queries [56.82807063333088]
We introduce a new benchmark, AMBROSIA, which we hope will inform and inspire the development of text-to-SQL parsers.
Our dataset contains questions showcasing three different types of ambiguity (scope ambiguity, attachment ambiguity, and vagueness).
In each case, the ambiguity persists even when the database context is provided.
This is achieved through a novel approach that involves controlled generation of databases from scratch.
arXiv Detail & Related papers (2024-06-27T10:43:04Z) - UQE: A Query Engine for Unstructured Databases [71.49289088592842]
We investigate the potential of Large Language Models to enable unstructured data analytics.
We propose a new Universal Query Engine (UQE) that directly interrogates and draws insights from unstructured data collections.
arXiv Detail & Related papers (2024-06-23T06:58:55Z) - Database-Augmented Query Representation for Information Retrieval [59.57065228857247]
We present a novel retrieval framework called Database-Augmented Query representation (DAQu).
DAQu augments the original query with various (query-related) metadata across multiple tables.
We validate DAQu in diverse retrieval scenarios that can incorporate metadata from the relational database.
arXiv Detail & Related papers (2024-06-23T05:02:21Z) - Testing Database Engines via Query Plan Guidance [6.789710498230718]
We propose the concept of Query Plan Guidance (QPG) for guiding automated testing towards "interesting" test cases.
We applied our method to three mature, widely used, and diverse database systems (SQLite, TiDB, and CockroachDB) and found 53 unique, previously unknown bugs.
arXiv Detail & Related papers (2023-12-29T08:09:47Z) - QUEST: A Retrieval Dataset of Entity-Seeking Queries with Implicit Set Operations [36.70770411188946]
QUEST is a dataset of 3357 natural language queries with implicit set operations.
The dataset challenges models to match multiple constraints mentioned in queries with corresponding evidence in documents.
We analyze several modern retrieval systems, finding that they often struggle on such queries.
arXiv Detail & Related papers (2023-05-19T14:19:32Z) - Improving Text-to-SQL Semantic Parsing with Fine-grained Query Understanding [84.04706075621013]
We present a general-purpose, modular neural semantic parsing framework based on token-level fine-grained query understanding.
Our framework consists of three modules: named entity recognizer (NER), neural entity linker (NEL) and neural semantic parser (NSP).
arXiv Detail & Related papers (2022-09-28T21:00:30Z) - Graph Enhanced BERT for Query Understanding [55.90334539898102]
Query understanding plays a key role in exploring users' search intents and helping users locate the information they most want.
In recent years, pre-trained language models (PLMs) have advanced various natural language processing tasks.
We propose a novel graph-enhanced pre-training framework, GE-BERT, which can leverage both query content and the query graph.
arXiv Detail & Related papers (2022-04-03T16:50:30Z) - SPARQLing Database Queries from Intermediate Question Decompositions [7.475027071883912]
To translate natural language questions into database queries, most approaches rely on a fully annotated training set.
We reduce this burden using intermediate question representations grounded in databases.
Our pipeline consists of two parts: a semantic parser that converts natural language questions into the intermediate representations and a non-trainable transpiler to the SPARQL query language.
arXiv Detail & Related papers (2021-09-13T17:57:12Z) - "What makes my queries slow?": Subgroup Discovery for SQL Workload Analysis [1.3124513975412255]
We introduce an original approach rooted in Subgroup Discovery.
We show how to instantiate and develop this generic data-mining framework.
We also provide a visualization tool for interactive knowledge discovery.
arXiv Detail & Related papers (2021-08-09T09:44:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.