BayesCard: Revitilizing Bayesian Frameworks for Cardinality Estimation
- URL: http://arxiv.org/abs/2012.14743v2
- Date: Tue, 2 Feb 2021 10:54:33 GMT
- Title: BayesCard: Revitilizing Bayesian Frameworks for Cardinality Estimation
- Authors: Ziniu Wu, Amir Shaikhha, Rong Zhu, Kai Zeng, Yuxing Han, Jingren Zhou
- Abstract summary: A desired CardEst method should attain good algorithm performance, be stable to varied data settings, and be friendly to system deployment.
BayesCard is the first framework that inherits the advantages of BNs, i.e., high estimation accuracy and interpretability, while overcoming their low structure learning and inference efficiency.
It achieves comparable or better accuracy, 1-2 orders of magnitude faster inference time, 1-3 orders faster training time, 1-3 orders smaller model size, and 1-2 orders faster updates.
- Score: 25.871965040723772
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Cardinality estimation (CardEst) is an essential component in query
optimizers and a fundamental problem in DBMS. A desired CardEst method should
attain good algorithm performance, be stable to varied data settings, and be
friendly to system deployment. However, no existing CardEst method can fulfill
the three criteria at the same time. Traditional methods often have significant
algorithmic drawbacks such as large estimation errors. Recently proposed deep
learning-based methods largely improve the estimation accuracy, but their
performance can be greatly affected by the data and they are often difficult to
deploy in real systems.
In this paper, we revitalize the Bayesian networks (BN) for CardEst by
incorporating the techniques of probabilistic programming languages. We present
BayesCard, the first framework that inherits the advantages of BNs, i.e., high
estimation accuracy and interpretability, while overcoming their drawbacks, i.e.,
low structure learning and inference efficiency. This makes BayesCard a perfect
candidate for commercial DBMS deployment. Our experimental results on several
single-table and multi-table benchmarks indicate BayesCard's superiority over
existing state-of-the-art CardEst methods: BayesCard achieves comparable or
better accuracy, 1-2 orders of magnitude faster inference time, 1-3 orders
faster training time, 1-3 orders smaller model size, and 1-2 orders faster
updates. Meanwhile, BayesCard keeps stable performance when varying data with
different settings. We also deploy BayesCard into PostgreSQL. On the IMDB
benchmark workload, it improves the end-to-end query time by 13.3%, which is
very close to the optimal result of 14.2% using an oracle of true cardinality.
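To make the BN-based CardEst idea concrete, here is a minimal, self-contained sketch (not BayesCard's actual implementation, which relies on probabilistic programming techniques for fast structure learning and inference): a table's joint column distribution is factorized by a toy Bayesian network, and the cardinality of a conjunctive predicate is estimated as the table size times the probability the model assigns to it. All names and data (the city/status columns, estimate_cardinality) are hypothetical.

```python
# Toy BN-style cardinality estimation (illustration only, not BayesCard itself).
# The two-column table is modeled with the factorization
#   P(city, status) = P(city) * P(status | city),
# and a conjunctive predicate's cardinality is estimated as N * P(predicate).
from collections import Counter, defaultdict

# Hypothetical table: (city, status) rows.
rows = [
    ("NY", "active"), ("NY", "active"), ("NY", "inactive"),
    ("SF", "active"), ("SF", "inactive"), ("SF", "inactive"),
    ("LA", "active"), ("LA", "active"), ("LA", "active"), ("LA", "inactive"),
]
N = len(rows)

# "Structure learning" is fixed here (city is the parent of status);
# parameter learning is plain maximum-likelihood counting.
city_counts = Counter(city for city, _ in rows)
status_given_city = defaultdict(Counter)
for city, status in rows:
    status_given_city[city][status] += 1

def estimate_cardinality(city, status):
    """Estimate |{rows: city=c AND status=s}| as N * P(city) * P(status | city)."""
    p_city = city_counts[city] / N
    p_status = status_given_city[city][status] / max(city_counts[city], 1)
    return N * p_city * p_status

for pred in [("NY", "active"), ("LA", "inactive"), ("SF", "active")]:
    true_card = sum(1 for row in rows if row == pred)
    print(pred, "estimate:", round(estimate_cardinality(*pred), 2), "true:", true_card)
```

On this two-column toy the chain factorization is exact, so the estimates match the true counts; with many columns, a BN instead assumes conditional independencies to stay compact, trading a little accuracy for small model size and fast inference, which is the regime the paper targets.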
Related papers
- CardBench: A Benchmark for Learned Cardinality Estimation in Relational Databases [17.46316633654637]
Cardinality estimation is crucial for enabling high query performance in databases.
There is no systematic benchmark or dataset that allows researchers to evaluate the progress made by new learned approaches.
We release a benchmark, containing thousands of queries over 20 distinct real-world databases for learned cardinality estimation.
arXiv Detail & Related papers (2024-08-28T23:25:25Z)
- PRICE: A Pretrained Model for Cross-Database Cardinality Estimation [78.30959470441442]
Cardinality estimation (CardEst) is essential for optimizing query execution plans.
Recent ML-based CardEst methods achieve high accuracy but face deployment challenges due to high preparation costs.
We propose PRICE, a PRetrained multI-table CardEst model, which addresses these limitations.
arXiv Detail & Related papers (2024-06-03T06:21:53Z)
- Lero: A Learning-to-Rank Query Optimizer [49.841082217997354]
We introduce a learning-to-rank query optimizer, called Lero, which builds on top of the native query optimizer and continuously learns to improve query optimization.
Rather than building a learned optimizer from scratch, Lero is designed to leverage decades of wisdom of databases and improve the native query optimizer.
Lero achieves near optimal performance on several benchmarks.
arXiv Detail & Related papers (2023-02-14T07:31:11Z)
- Efficient Few-Shot Object Detection via Knowledge Inheritance [62.36414544915032]
Few-shot object detection (FSOD) aims at learning a generic detector that can adapt to unseen tasks with scarce training samples.
We present an efficient pretrain-transfer framework (PTF) baseline with no computational increment.
We also propose an adaptive length re-scaling (ALR) strategy to alleviate the vector length inconsistency between the predicted novel weights and the pretrained base weights.
arXiv Detail & Related papers (2022-03-23T06:24:31Z)
- Glue: Adaptively Merging Single Table Cardinality to Estimate Join Query Size [35.1093718746362]
Cardinality estimation (CardEst) plays a significant role in generating high-quality query plans.
The hardest problem in CardEst, i.e., how to estimate the join query size on multiple tables, has not been well solved.
In this paper, we propose a very general framework, called Glue, which merges single table-wise CardEst results to estimate join query size.
arXiv Detail & Related papers (2021-12-07T02:46:46Z)
- Cardinality Estimation in DBMS: A Comprehensive Benchmark Evaluation [43.27881697012329]
Cardinality estimation (CardEst) plays a significant role in generating high-quality query plans for a query workload.
In this paper, we comprehensively and systematically compare the effectiveness of CardEst methods on a real-world dataset.
We establish a new benchmark for CardEst, which contains a new complex real-world dataset STATS and a diverse query workload STATS-CEB.
arXiv Detail & Related papers (2021-09-13T11:25:02Z)
- FLAT: Fast, Lightweight and Accurate Method for Cardinality Estimation [45.98791307420517]
We propose FLAT, a CardEst method that is simultaneously fast in probability computation, lightweight in graphical model size and accurate in estimation quality.
FLAT supports efficient online probability computation in near-linear time on the underlying FSPN model.
It can estimate cardinality for both single table queries and multi table join queries.
It achieves 1 to 5 orders of magnitude better accuracy, 1 to 3 orders of magnitude faster probability computation speed and 1 to 2 orders of magnitude lower storage cost.
arXiv Detail & Related papers (2020-11-18T01:14:45Z)
- The Right Tool for the Job: Matching Model and Instance Complexities [62.95183777679024]
As NLP models become larger, executing a trained model requires significant computational resources, incurring monetary and environmental costs.
We propose a modification to contextual representation fine-tuning which, during inference, allows for an early (and fast) "exit" from neural network calculations for simple instances.
We test our proposed modification on five different datasets in two tasks: three text classification datasets and two natural language inference benchmarks.
arXiv Detail & Related papers (2020-04-16T04:28:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.