Database Reasoning Over Text
- URL: http://arxiv.org/abs/2106.01074v1
- Date: Wed, 2 Jun 2021 11:09:40 GMT
- Title: Database Reasoning Over Text
- Authors: James Thorne, Majid Yazdani, Marzieh Saeidi, Fabrizio Silvestri, Sebastian Riedel, Alon Halevy
- Abstract summary: We show that state-of-the-art transformer models perform very well for small databases.
We propose a modular architecture to answer database-style queries over multiple spans from text.
Our architecture scales to databases containing thousands of facts whereas contemporary models are limited by how many facts can be encoded.
- Score: 11.074939080454412
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural models have shown impressive performance gains in answering queries
from natural language text. However, existing works are unable to support
database queries, such as "List/Count all female athletes who were born in the 20th
century", which require reasoning over sets of relevant facts with operations
such as join, filtering and aggregation. We show that while state-of-the-art
transformer models perform very well for small databases, they exhibit
limitations in processing noisy data, numerical operations, and queries that
aggregate facts. We propose a modular architecture to answer these
database-style queries over multiple spans of text and to aggregate the
results at scale. We evaluate the architecture using WikiNLDB, a novel dataset for
exploring such queries. Our architecture scales to databases containing
thousands of facts whereas contemporary models are limited by how many facts
can be encoded. In direct comparison on small databases, our approach increases
overall answer accuracy from 85% to 90%. On larger databases, our approach
retains its accuracy whereas transformer baselines could not encode the
context.
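The abstract's example query ("List/Count all female athletes who were born in the 20th century") combines filtering with aggregation over many separate facts. The sketch below is a toy illustration of that query shape, not the authors' architecture: it assumes each fact is a standalone sentence, uses a hypothetical regex-based `extract` function as a stand-in for a learned reader, filters the resulting tuples, and then performs the List and Count aggregations symbolically. Because each fact is mapped independently, the cost grows linearly with the number of facts instead of depending on how many facts fit into one encoder context. All names (`FACTS`, `Athlete`, `extract`, `select`, `born_20th_century_female`) and the example sentences are hypothetical.

```python
import re
from typing import Callable, Iterable, List, NamedTuple, Optional


class Athlete(NamedTuple):
    name: str
    gender: str
    birth_year: int


# Hypothetical toy facts (not taken from WikiNLDB); one fact per sentence.
FACTS = [
    "Maria Lopez is a female athlete born in 1968.",
    "John Park is a male athlete born in 1990.",
    "Ana Silva is a female athlete born in 2003.",
    "Greta Berg is a female athlete born in 1931.",
    "The stadium opened in 1954.",
]

# Stand-in for a neural reader: projects one sentence onto an Athlete tuple.
PATTERN = re.compile(
    r"^(?P<name>.+?) is a (?P<gender>female|male) athlete born in (?P<year>\d{4})\.$"
)


def extract(fact: str) -> Optional[Athlete]:
    """Map a single fact to a tuple, or None if the fact is irrelevant."""
    m = PATTERN.match(fact)
    if m is None:
        return None
    return Athlete(m["name"], m["gender"], int(m["year"]))


def select(facts: Iterable[str], predicate: Callable[[Athlete], bool]) -> List[Athlete]:
    """Process each fact independently, then keep the tuples that satisfy the filter."""
    rows = (extract(f) for f in facts)
    return [r for r in rows if r is not None and predicate(r)]


def born_20th_century_female(r: Athlete) -> bool:
    # Assumption: the 20th century is taken to span the years 1901-2000.
    return r.gender == "female" and 1901 <= r.birth_year <= 2000


rows = select(FACTS, born_20th_century_female)
names = [r.name for r in rows]  # "List" aggregation
count = len(rows)               # "Count" aggregation
print(names)  # ['Maria Lopez', 'Greta Berg']
print(count)  # 2
```

Keeping the aggregation symbolic (a plain `len` over selected tuples) sidesteps asking a model to count, which the abstract identifies as a weakness of transformer baselines on aggregation and numerical queries.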
Related papers
- Spider 2.0: Evaluating Language Models on Real-World Enterprise Text-to-SQL Workflows [64.94146689665628]
Spider 2.0 is an evaluation framework for real-world text-to-SQL problems derived from enterprise-level database use cases.
The databases in Spider 2.0 are sourced from real data applications, often containing over 1,000 columns and stored in local or cloud database systems such as BigQuery and Snowflake.
We show that solving problems in Spider 2.0 frequently requires understanding and searching through database metadata, dialect documentation, and even project-level codebases.
(2024-11-12T12:52:17Z)
- UQE: A Query Engine for Unstructured Databases [71.49289088592842]
We investigate the potential of Large Language Models to enable unstructured data analytics.
We propose a new Universal Query Engine (UQE) that directly interrogates and draws insights from unstructured data collections.
(2024-06-23T06:58:55Z)
- UNITE: A Unified Benchmark for Text-to-SQL Evaluation [72.72040379293718]
We introduce a UNIfied benchmark for Text-to-SQL systems.
It is composed of publicly available text-to-SQL datasets and 29K databases.
Compared to the widely used Spider benchmark, it offers a threefold increase in SQL patterns.
(2023-05-25T17:19:52Z)
- Can LLM Already Serve as A Database Interface? A BIg Bench for Large-Scale Database Grounded Text-to-SQLs [89.68522473384522]
We present BIRD, a big benchmark for large-scale database grounded text-to-SQL tasks.
Our emphasis on database values highlights the new challenges of dirty database contents.
Even the most effective text-to-SQL model, ChatGPT, achieves only 40.08% execution accuracy.
(2023-05-04T19:02:29Z)
- Multimodal Neural Databases [4.321727213494619]
We propose a new framework, which we name Multimodal Neural Databases (MMNDBs).
MMNDBs can answer complex database-like queries involving reasoning over different input modalities, such as text and images, at scale.
We show the potential of these new techniques to process unstructured data coming from different modalities, paving the way for future research.
(2023-05-02T14:27:56Z)
- Improving Text-to-SQL Semantic Parsing with Fine-grained Query Understanding [84.04706075621013]
We present a general-purpose, modular neural semantic parsing framework based on token-level fine-grained query understanding.
Our framework consists of three modules: a named entity recognizer (NER), a neural entity linker (NEL) and a neural semantic parser (NSP).
(2022-09-28T21:00:30Z)
- SPARQLing Database Queries from Intermediate Question Decompositions [7.475027071883912]
To translate natural language questions into database queries, most approaches rely on a fully annotated training set.
We reduce this burden by using intermediate question representations grounded in databases.
Our pipeline consists of two parts: a semantic parser that converts natural language questions into the intermediate representations, and a non-trainable transpiler to the SPARQL query language.
(2021-09-13T17:57:12Z)
- Dual Reader-Parser on Hybrid Textual and Tabular Evidence for Open Domain Question Answering [78.9863753810787]
A large amount of the world's knowledge is stored in structured databases.
Query languages can answer questions that require complex reasoning, as well as offering full explainability.
(2021-08-05T22:04:13Z)
- KaggleDBQA: Realistic Evaluation of Text-to-SQL Parsers [26.15889661083109]
We present KaggleDBQA, a new cross-domain evaluation dataset of real Web databases.
We show that KaggleDBQA presents a challenge to state-of-the-art zero-shot parsers, but that a more realistic evaluation setting and creative use of associated database documentation boost their accuracy by over 13.2%.
(2021-06-22T00:08:03Z)
- Translating synthetic natural language to database queries: a polyglot deep learning framework [0.0]
Polyglotter supports the mapping of natural language searches to database queries.
It does not require the creation of manually annotated data for training.
Our results indicate that our framework performs well on both synthetic and real databases.
(2021-04-14T17:43:51Z)
- Neural Databases [23.273308740532254]
We describe a database system with no pre-defined schema, in which updates and queries are given in natural language.
We experimentally validate the accuracy of NeuralDB and its components, showing we can answer queries over thousands of sentences with very high accuracy.
(2020-10-14T11:31:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.