Cardinality Estimation in DBMS: A Comprehensive Benchmark Evaluation
- URL: http://arxiv.org/abs/2109.05877v3
- Date: Wed, 15 Sep 2021 05:02:16 GMT
- Title: Cardinality Estimation in DBMS: A Comprehensive Benchmark Evaluation
- Authors: Yuxing Han, Ziniu Wu, Peizhi Wu, Rong Zhu, Jingyi Yang, Liang Wei Tan,
Kai Zeng, Gao Cong, Yanzhao Qin, Andreas Pfadler, Zhengping Qian, Jingren
Zhou, Jiangneng Li, Bin Cui
- Abstract summary: Cardinality estimation (CardEst) plays a significant role in generating high-quality query plans for a query workload.
In this paper, we comprehensively and systematically compare the effectiveness of CardEst methods in a real dataset.
We establish a new benchmark for CardEst, which contains a new complex real-world STATS and a diverse query STATS-CEB.
- Score: 43.27881697012329
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Cardinality estimation (CardEst) plays a significant role in generating
high-quality query plans for a query optimizer in DBMS. In the last decade, an
increasing number of advanced CardEst methods (especially ML-based) have been
proposed with outstanding estimation accuracy and inference latency. However,
there exists no study that systematically evaluates the quality of these
methods and answer the fundamental problem: to what extent can these methods
improve the performance of query optimizer in real-world settings, which is the
ultimate goal of a CardEst method. In this paper, we comprehensively and
systematically compare the effectiveness of CardEst methods in a real DBMS. We
establish a new benchmark for CardEst, which contains a new complex real-world
dataset STATS and a diverse query workload STATS-CEB. We integrate multiple
most representative CardEst methods into an open-source database system
PostgreSQL, and comprehensively evaluate their true effectiveness in improving
query plan quality, and other important aspects affecting their applicability,
ranging from inference latency, model size, and training time, to update
efficiency and accuracy. We obtain a number of key findings for the CardEst
methods, under different data and query settings. Furthermore, we find that the
widely used estimation accuracy metric(Q-Error) cannot distinguish the
importance of different sub-plan queries during query optimization and thus
cannot truly reflect the query plan quality generated by CardEst methods.
Therefore, we propose a new metric P-Error to evaluate the performance of
CardEst methods, which overcomes the limitation of Q-Error and is able to
reflect the overall end-to-end performance of CardEst methods. We have made all
of the benchmark data and evaluation code publicly available at
https://github.com/Nathaniel-Han/End-to-End-CardEst-Benchmark.
Related papers
- Revisiting BPR: A Replicability Study of a Common Recommender System Baseline [78.00363373925758]
We study the features of the BPR model, indicating their impact on its performance, and investigate open-source BPR implementations.
Our analysis reveals inconsistencies between these implementations and the original BPR paper, leading to a significant decrease in performance of up to 50% for specific implementations.
We show that the BPR model can achieve performance levels close to state-of-the-art methods on the top-n recommendation tasks and even outperform them on specific datasets.
arXiv Detail & Related papers (2024-09-21T18:39:53Z) - CardBench: A Benchmark for Learned Cardinality Estimation in Relational Databases [17.46316633654637]
Cardinality estimation is crucial for enabling high query performance in databases.
There is no systematic benchmark or datasets which allows researchers to evaluate the progress made by new learned approaches.
We release a benchmark, containing thousands of queries over 20 distinct real-world databases for learned cardinality estimation.
arXiv Detail & Related papers (2024-08-28T23:25:25Z) - On Speeding Up Language Model Evaluation [48.51924035873411]
Development of prompt-based methods with Large Language Models (LLMs) requires making numerous decisions.
We propose a novel method to address this challenge.
We show that it can identify the top-performing method using only 5-15% of the typically needed resources.
arXiv Detail & Related papers (2024-07-08T17:48:42Z) - PRICE: A Pretrained Model for Cross-Database Cardinality Estimation [78.30959470441442]
Cardinality estimation (CardEst) is essential for optimizing query execution plans.
Recent ML-based CardEst methods achieve high accuracy but face deployment challenges due to high preparation costs.
We propose PRICE, a PRetrained multI-table CardEst model, which addresses these limitations.
arXiv Detail & Related papers (2024-06-03T06:21:53Z) - Compactness Score: A Fast Filter Method for Unsupervised Feature
Selection [66.84571085643928]
We propose a fast unsupervised feature selection method, named as, Compactness Score (CSUFS) to select desired features.
Our proposed algorithm seems to be more accurate and efficient compared with existing algorithms.
arXiv Detail & Related papers (2022-01-31T13:01:37Z) - Glue: Adaptively Merging Single Table Cardinality to Estimate Join Query
Size [35.1093718746362]
Cardinality estimation (CardEst) plays a significant role in generating high-quality query plans.
The hardest problem in CardEst, i.e. how to estimate the join query size on multiple tables, has not been extensively solved.
In this paper, we propose a very general framework, called Glue, which supports single table-wise CardEst results.
arXiv Detail & Related papers (2021-12-07T02:46:46Z) - BayesCard: Revitilizing Bayesian Frameworks for Cardinality Estimation [25.871965040723772]
A desired CardEst method should attain good algorithm performance, be stable to varied data settings, and be friendly to system deployment.
BayesCard is the first framework that inherits the advantages of BNs, i.e., high estimation accuracy and interpretability.
It achieves comparable or better accuracy, 1-2 orders of magnitude faster inference time, 1-3 orders faster training time, 1-3 orders smaller model size, and 1-2 orders faster updates.
arXiv Detail & Related papers (2020-12-29T13:21:18Z) - FLAT: Fast, Lightweight and Accurate Method for Cardinality Estimation [45.98791307420517]
We propose FLAT, a CardEst method that is simultaneously fast in probability computation, lightweight in graphical model size and accurate in estimation quality.
FLAT supports efficient online probability computation in near liner time on the underlying FSPN model.
It can estimate cardinality for both single table queries and multi table join queries.
It achieves 1 to 5 orders of magnitude better accuracy, 1 to 3 orders of magnitude faster probability speed and 1 to 2 orders of magnitude lower storage cost.
arXiv Detail & Related papers (2020-11-18T01:14:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.