A Benchmark to Understand the Role of Knowledge Graphs on Large Language
Model's Accuracy for Question Answering on Enterprise SQL Databases
- URL: http://arxiv.org/abs/2311.07509v1
- Date: Mon, 13 Nov 2023 17:54:50 GMT
- Title: A Benchmark to Understand the Role of Knowledge Graphs on Large Language
Model's Accuracy for Question Answering on Enterprise SQL Databases
- Authors: Juan Sequeda, Dean Allemang, Bryon Jacob
- Abstract summary: This study aims to evaluate the accuracy of LLM-powered question answering systems in the context of enterprise questions.
We also explore the role of knowledge graphs in improving accuracy.
- Score: 1.0786522863027366
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Enterprise applications of Large Language Models (LLMs) hold promise for
question answering on enterprise SQL databases. However, the extent to which
LLMs can accurately respond to enterprise questions in such databases remains
unclear, given the absence of suitable Text-to-SQL benchmarks tailored to
enterprise settings. Additionally, the potential of Knowledge Graphs (KGs) to
enhance LLM-based question answering by providing business context is not well
understood. This study aims to evaluate the accuracy of LLM-powered question
answering systems in the context of enterprise questions and SQL databases,
while also exploring the role of knowledge graphs in improving accuracy. To
achieve this, we introduce a benchmark comprising an enterprise SQL schema in
the insurance domain, a range of enterprise queries encompassing reporting to
metrics, and a contextual layer incorporating an ontology and mappings that
define a knowledge graph. Our primary finding reveals that question answering
using GPT-4, with zero-shot prompts directly on SQL databases, achieves an
accuracy of 16%. Notably, this accuracy increases to 54% when questions are
posed over a Knowledge Graph representation of the enterprise SQL database.
Therefore, investing in Knowledge Graph provides higher accuracy for LLM
powered question answering systems.
Related papers
- Towards Evaluating Large Language Models for Graph Query Generation [49.49881799107061]
Large Language Models (LLMs) are revolutionizing the landscape of Generative Artificial Intelligence (GenAI)
This paper presents a comparative study addressing the challenge of generating queries a powerful language for interacting with graph databases using open-access LLMs.
Our empirical analysis of query generation accuracy reveals that Claude Sonnet 3.5 outperforms its counterparts in this specific domain.
arXiv Detail & Related papers (2024-11-13T09:11:56Z) - E-SQL: Direct Schema Linking via Question Enrichment in Text-to-SQL [1.187832944550453]
We introduce E-Seek, a novel pipeline specifically designed to address these challenges through direct schema linking and candidate predicate augmentation.
E-Seek enhances the natural language query by incorporating relevant database items (i.e., tables, columns, and values) and conditions directly into the question andsql construction plan, bridging the gap between the query and the database structure.
Comprehensive evaluations illustrate that E-Seek achieves competitive performance, particularly excelling in complex queries with a 66.29% execution accuracy on the test set.
arXiv Detail & Related papers (2024-09-25T09:02:48Z) - BEAVER: An Enterprise Benchmark for Text-to-SQL [6.3900786001871195]
Existing text-to-generated benchmarks have largely been constructed using publicly available tables from the web.
In this paper, we apply off-the-shelf LLMs to a benchmark containing enterprise data warehouse data.
As we will show, the reasons for poor performance are largely due to three characteristics.
arXiv Detail & Related papers (2024-09-03T16:37:45Z) - Hybrid Querying Over Relational Databases and Large Language Models [8.926173054003547]
We present the first cross-domain benchmark, SWAN, containing 120 beyond-Database questions over four real-world databases.
We present two solutions: one based on schema expansion and the other based on user defined functions.
Our evaluation demonstrates that using GPT-4 Turbo with few-shot prompts, one can achieves up to 40.0% in execution accuracy and 48.2% in data factuality.
arXiv Detail & Related papers (2024-08-01T19:29:18Z) - Relational Database Augmented Large Language Model [59.38841050766026]
Large language models (LLMs) excel in many natural language processing (NLP) tasks.
They can only incorporate new knowledge through training or supervised fine-tuning processes.
This precise, up-to-date, and private information is typically stored in relational databases.
arXiv Detail & Related papers (2024-07-21T06:19:10Z) - RB-SQL: A Retrieval-based LLM Framework for Text-to-SQL [48.516004807486745]
Large language models (LLMs) with in-context learning have significantly improved the performance of text-to- task.
We propose RB-, a novel retrieval-based framework for in-context prompt engineering.
Experiment results demonstrate that our model achieves better performance than several competitive baselines on public datasets BIRD and Spider.
arXiv Detail & Related papers (2024-07-11T08:19:58Z) - Automating Pharmacovigilance Evidence Generation: Using Large Language Models to Produce Context-Aware SQL [0.0]
We utilize OpenAI's GPT-4 model within a retrieval-augmented generation (RAG) framework.
Business context document is enriched with a business context document, to transform NLQs into Structured Query Language queries.
Performance achieved a maximum of 85% when high complexity queries are excluded.
arXiv Detail & Related papers (2024-06-15T17:07:31Z) - Knowledge-to-SQL: Enhancing SQL Generation with Data Expert LLM [15.888784472807775]
Existing methods rely on the comprehensive capability of large language models (LLMs) to generate queries.
We propose the Knowledge-to- Data Expert framework, which employs tailored knowledge for all text-to- models.
arXiv Detail & Related papers (2024-02-18T09:10:04Z) - Text-to-SQL Empowered by Large Language Models: A Benchmark Evaluation [76.76046657162306]
Large language models (LLMs) have emerged as a new paradigm for Text-to- task.
Large language models (LLMs) have emerged as a new paradigm for Text-to- task.
arXiv Detail & Related papers (2023-08-29T14:59:54Z) - Knowledge-Augmented Language Model Prompting for Zero-Shot Knowledge
Graph Question Answering [7.888547093390469]
Large Language Models (LLMs) are capable of performing zero-shot closed-book question answering tasks.
We propose to augment the knowledge directly in the input of LLMs.
Our framework, Knowledge-Augmented language model PromptING (KAPING), requires no model training, thus completely zero-shot.
arXiv Detail & Related papers (2023-06-07T04:15:21Z) - SQL-PaLM: Improved Large Language Model Adaptation for Text-to-SQL (extended) [53.95151604061761]
This paper introduces the framework for enhancing Text-to- filtering using large language models (LLMs)
With few-shot prompting, we explore the effectiveness of consistency decoding with execution-based error analyses.
With instruction fine-tuning, we delve deep in understanding the critical paradigms that influence the performance of tuned LLMs.
arXiv Detail & Related papers (2023-05-26T21:39:05Z) - Dual Reader-Parser on Hybrid Textual and Tabular Evidence for Open
Domain Question Answering [78.9863753810787]
A large amount of world's knowledge is stored in structured databases.
query languages can answer questions that require complex reasoning, as well as offering full explainability.
arXiv Detail & Related papers (2021-08-05T22:04:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.