Baihe: SysML Framework for AI-driven Databases
- URL: http://arxiv.org/abs/2112.14460v1
- Date: Wed, 29 Dec 2021 09:00:07 GMT
- Title: Baihe: SysML Framework for AI-driven Databases
- Authors: Andreas Pfadler, Rong Zhu, Wei Chen, Botong Huang, Tianjing Zeng,
Bolin Ding, Jingren Zhou
- Abstract summary: Using Baihe, an existing relational database system may be retrofitted to use learned components for query optimization or other common tasks.
Baihe's high-level architecture is based on the following requirements: separation from the core system, minimal third-party dependencies, robustness, stability, and fault tolerance.
- Score: 33.47034563589278
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present Baihe, a SysML Framework for AI-driven Databases. Using Baihe, an
existing relational database system may be retrofitted to use learned
components for query optimization or other common tasks, such as learned
structures for indexing. To ensure the practicality and real-world applicability
of Baihe, its high-level architecture is based on the following requirements:
separation from the core system, minimal third-party dependencies, robustness,
stability and fault tolerance, as well as configurability. Based on this
high-level architecture, we then describe a concrete implementation of
Baihe for PostgreSQL and present example use cases for learned query
optimizers. To serve both practitioners and researchers in the DB and
AI4DB communities, Baihe for PostgreSQL will be released under an open source
license.
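To make the architecture requirements above more concrete, here is a minimal, hypothetical Python sketch of the out-of-process idea: a learned cardinality estimator lives outside the database core, and the caller always falls back to the native optimizer's estimate when the learned component misbehaves. The class and function names (LearnedCardinalityEstimator, estimate_or_fallback) and the feature encoding are illustrative assumptions, not Baihe's actual API.

```python
# Hypothetical sketch (not Baihe's actual API): a learned cardinality
# estimator kept outside the core database process, with a fallback to
# the native optimizer's estimate for robustness and fault tolerance.
import math


class LearnedCardinalityEstimator:
    """Toy log-linear model mapping simple query features to a row-count estimate."""

    def __init__(self, weights, bias):
        self.weights = weights  # one weight per feature
        self.bias = bias

    def estimate(self, features):
        if len(features) != len(self.weights):
            raise ValueError("unexpected feature vector length")
        log_card = self.bias + sum(w * f for w, f in zip(self.weights, features))
        return max(1.0, math.exp(log_card))


def estimate_or_fallback(model, features, native_estimate):
    """Prefer the learned estimate, but never fail the query: on any error,
    fall back to the estimate the native optimizer would have used."""
    try:
        return model.estimate(features)
    except Exception:
        return native_estimate


if __name__ == "__main__":
    # Features could encode, e.g., predicate count and log table size (illustrative only).
    model = LearnedCardinalityEstimator(weights=[0.8, 1.2], bias=0.5)
    features = [2.0, math.log(10_000)]
    print("learned estimate:", estimate_or_fallback(model, features, native_estimate=5_000))
    print("fallback path  :", estimate_or_fallback(model, [1.0], native_estimate=5_000))
```

Keeping the learned path strictly optional in this way is one way a retrofit could satisfy the robustness and fault-tolerance requirements without touching the core system.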
Related papers
- LLMIdxAdvis: Resource-Efficient Index Advisor Utilizing Large Language Model [24.579793425796193]
We propose a resource-efficient index advisor that uses large language models (LLMs) without extensive fine-tuning.
LLMIdxAdvis frames index recommendation as a sequence-to-sequence task, taking the target workload, storage constraint, and corresponding database environment as input, as sketched below.
Experiments on 3 OLAP and 2 real-world benchmarks reveal that LLMIdxAdvis delivers competitive index recommendation with reduced runtime.
arXiv Detail & Related papers (2025-03-10T22:01:24Z)
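As a rough illustration of the sequence-to-sequence framing described above, the sketch below serializes a workload, a storage budget, and a schema summary into a single prompt and asks a (stubbed) LLM for CREATE INDEX statements. The prompt wording, the build_index_prompt helper, and the call_llm stub are assumptions for illustration, not LLMIdxAdvis's actual interface.

```python
# Hypothetical sketch of framing index recommendation as a sequence-to-sequence
# task: serialize workload + storage budget + schema into one prompt and ask an
# LLM for CREATE INDEX statements. The prompt format and call_llm stub are
# illustrative; they are not LLMIdxAdvis's actual interface.
from typing import List


def build_index_prompt(workload: List[str], storage_budget_mb: int, schema: str) -> str:
    queries = "\n".join(f"- {q}" for q in workload)
    return (
        "You are an index advisor.\n"
        f"Database schema:\n{schema}\n"
        f"Target workload:\n{queries}\n"
        f"Total index storage budget: {storage_budget_mb} MB.\n"
        "Recommend CREATE INDEX statements, one per line."
    )


def call_llm(prompt: str) -> str:
    # Stand-in for a real LLM call; a deployment would send `prompt` to a model endpoint.
    return "CREATE INDEX idx_orders_customer_id ON orders(customer_id);"


if __name__ == "__main__":
    prompt = build_index_prompt(
        workload=["SELECT * FROM orders WHERE customer_id = 42"],
        storage_budget_mb=512,
        schema="orders(id, customer_id, total)",
    )
    print(call_llm(prompt))
```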
- DB-Explore: Automated Database Exploration and Instruction Synthesis for Text-to-SQL [18.915121803834698]
We propose DB-Explore, a novel framework for database understanding using large language models (LLMs).
Our framework enables comprehensive database understanding through diverse sampling strategies and automated instruction generation.
Our open-source implementation, based on the Qwen2.5-coder-7B model, outperforms multiple GPT-4-driven text-to-SQL systems in comparative evaluations.
arXiv Detail & Related papers (2025-03-06T20:46:43Z)
- Hybrid Querying Over Relational Databases and Large Language Models [8.926173054003547]
We present the first cross-domain benchmark, SWAN, containing 120 beyond-Database questions over four real-world databases.
We present two solutions: one based on schema expansion and the other based on user-defined functions (illustrated in the sketch below).
Our evaluation demonstrates that using GPT-4 Turbo with few-shot prompts, one can achieve up to 40.0% in execution accuracy and 48.2% in data factuality.
arXiv Detail & Related papers (2024-08-01T19:29:18Z)
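The user-defined-function route mentioned above can be illustrated with Python's standard sqlite3 module: register a scalar SQL function that stands in for an LLM call, so a single query can mix stored columns with model-generated answers. The llm_answer function and its canned responses are hypothetical and are not the paper's implementation.

```python
# Hypothetical illustration of the UDF-based approach to hybrid querying:
# a scalar SQL function that would forward its argument to an LLM. Here the
# model call is a canned stub so the example stays self-contained.
import sqlite3


def llm_answer(question: str) -> str:
    # Stand-in for an LLM call; a real system would query a model here.
    canned = {"What is the capital of France?": "Paris"}
    return canned.get(question, "unknown")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE countries (name TEXT)")
conn.execute("INSERT INTO countries VALUES ('France'), ('Atlantis')")

# Register the Python function so it is callable from SQL.
conn.create_function("llm_answer", 1, llm_answer)

rows = conn.execute(
    "SELECT name, llm_answer('What is the capital of ' || name || '?') FROM countries"
).fetchall()
print(rows)  # [('France', 'Paris'), ('Atlantis', 'unknown')]
conn.close()
```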
- Relational Database Augmented Large Language Model [59.38841050766026]
Large language models (LLMs) excel in many natural language processing (NLP) tasks.
However, they can only incorporate new knowledge through training or supervised fine-tuning, whereas precise, up-to-date, and private information is typically stored in relational databases.
arXiv Detail & Related papers (2024-07-21T06:19:10Z)
- RB-SQL: A Retrieval-based LLM Framework for Text-to-SQL [48.516004807486745]
Large language models (LLMs) with in-context learning have significantly improved the performance of the text-to-SQL task.
We propose RB-SQL, a novel retrieval-based framework for in-context prompt engineering.
Experiment results demonstrate that our model achieves better performance than several competitive baselines on public datasets BIRD and Spider.
arXiv Detail & Related papers (2024-07-11T08:19:58Z)
- CHESS: Contextual Harnessing for Efficient SQL Synthesis [1.9506402593665235]
We introduce CHESS, a framework for efficient and scalable text-to-SQL queries.
It comprises four specialized agents, each targeting one of the core challenges of text-to-SQL synthesis.
Our framework offers features that adapt to various deployment constraints.
arXiv Detail & Related papers (2024-05-27T01:54:16Z)
- ERBench: An Entity-Relationship based Automatically Verifiable Hallucination Benchmark for Large Language Models [46.07900122810749]
Large language models (LLMs) have achieved unprecedented performance in various applications, yet evaluating them is still challenging.
We contend that utilizing existing relational databases is a promising approach for constructing benchmarks.
We propose ERBench, which uses integrity constraints to convert any database into an LLM benchmark, as illustrated in the sketch below.
arXiv Detail & Related papers (2024-03-08T12:42:36Z)
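A toy version of the idea sketched above: follow a foreign-key (integrity) constraint between two tables and emit question/answer pairs whose ground truth comes straight from the join, so an LLM's replies can be verified automatically. The schema, question template, and generate_qa helper are made-up examples, not ERBench's actual pipeline.

```python
# Hypothetical sketch: use a foreign-key integrity constraint to turn database
# records into automatically verifiable question/answer pairs for an LLM.
# The schema, question template, and helper names are illustrative only.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE directors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE movies (
        title TEXT,
        director_id INTEGER REFERENCES directors(id)  -- the integrity constraint we exploit
    );
    INSERT INTO directors VALUES (1, 'Bong Joon-ho'), (2, 'Greta Gerwig');
    INSERT INTO movies VALUES ('Parasite', 1), ('Lady Bird', 2);
    """
)


def generate_qa(connection):
    """Follow the movies -> directors foreign key to build verifiable QA pairs."""
    rows = connection.execute(
        "SELECT m.title, d.name FROM movies m JOIN directors d ON m.director_id = d.id"
    )
    return [(f"Who directed the movie {title}?", name) for title, name in rows]


for question, gold_answer in generate_qa(conn):
    # A benchmark harness would send `question` to an LLM and compare its
    # reply against `gold_answer` (e.g., by substring match).
    print(question, "->", gold_answer)
conn.close()
```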
- SQL-PaLM: Improved Large Language Model Adaptation for Text-to-SQL (extended) [53.95151604061761]
This paper introduces the SQL-PaLM framework for enhancing Text-to-SQL using large language models (LLMs).
With few-shot prompting, we explore the effectiveness of consistency decoding with execution-based error analyses.
With instruction fine-tuning, we delve into the critical paradigms that influence the performance of tuned LLMs.
arXiv Detail & Related papers (2023-05-26T21:39:05Z)
- UNITE: A Unified Benchmark for Text-to-SQL Evaluation [72.72040379293718]
We introduce a UNIfied benchmark for Text-to-SQL systems.
It is composed of publicly available text-to-SQL datasets and 29K databases.
Compared to the widely used Spider benchmark, we introduce a threefold increase in SQL patterns.
arXiv Detail & Related papers (2023-05-25T17:19:52Z)
- A Scalable Space-efficient In-database Interpretability Framework for Embedding-based Semantic SQL Queries [3.0938904602244346]
We introduce a new co-occurrence based interpretability approach to capture relationships between relational entities.
Our approach provides both query-agnostic (global) and query-specific (local) interpretabilities.
arXiv Detail & Related papers (2023-02-23T17:18:40Z)
- Proton: Probing Schema Linking Information from Pre-trained Language Models for Text-to-SQL Parsing [66.55478402233399]
We propose a framework to elicit relational structures via a probing procedure based on the Poincaré distance metric.
Compared with commonly-used rule-based methods for schema linking, we found that probing relations can robustly capture semantic correspondences.
Our framework sets new state-of-the-art performance on three benchmarks.
arXiv Detail & Related papers (2022-06-28T14:05:25Z)
- Automated Database Indexing using Model-free Reinforcement Learning [19.64574177805823]
We develop an architecture for automatically indexing a database, using reinforcement learning to select indexes that optimize query performance throughout the database's lifetime (a minimal sketch of this idea follows below).
In our experimental evaluation, our architecture shows superior performance compared to related work on reinforcement learning and genetic algorithms, maintaining near-optimal index configurations and efficiently scaling to large databases.
arXiv Detail & Related papers (2020-07-25T14:36:55Z)
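The sketch referenced above is a deliberately tiny epsilon-greedy learning loop over single-index actions with a synthetic cost model; the candidate indexes, reward definition, and hyperparameters are invented for illustration and do not reproduce the paper's architecture.

```python
# Hypothetical, deliberately tiny sketch of reinforcement-learning-driven index
# selection: an epsilon-greedy agent learns which candidate index yields the
# largest reduction in a synthetic workload cost. The cost model and reward are
# made up for illustration; this is not the paper's architecture.
import random

CANDIDATE_INDEXES = ["idx_orders_customer", "idx_orders_date", "idx_items_price"]

# Synthetic "true" benefit of each index (seconds saved per workload run).
TRUE_BENEFIT = {"idx_orders_customer": 3.0, "idx_orders_date": 1.0, "idx_items_price": 0.2}


def run_workload_with(index: str) -> float:
    """Toy cost model: base cost minus the index's benefit, plus noise."""
    return 10.0 - TRUE_BENEFIT[index] + random.gauss(0, 0.3)


def learn_best_index(episodes: int = 500, epsilon: float = 0.1, alpha: float = 0.1) -> str:
    q = {idx: 0.0 for idx in CANDIDATE_INDEXES}  # learned value estimates
    for _ in range(episodes):
        # Epsilon-greedy action selection over candidate indexes.
        if random.random() < epsilon:
            action = random.choice(CANDIDATE_INDEXES)
        else:
            action = max(q, key=q.get)
        reward = 10.0 - run_workload_with(action)  # cost reduction as reward
        q[action] += alpha * (reward - q[action])  # incremental value update
    return max(q, key=q.get)


if __name__ == "__main__":
    random.seed(0)
    print("chosen index:", learn_best_index())  # expected: idx_orders_customer
```

A production agent would observe real query latencies and account for index interactions, but the feedback loop (act, observe reward, update value estimates) is the same.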
This list is automatically generated from the titles and abstracts of the papers on this site.