Related papers: Cardinality Estimation in DBMS: A Comprehensive Benchmark Evaluation

Cardinality Estimation in DBMS: A Comprehensive Benchmark Evaluation

URL: http://arxiv.org/abs/2109.05877v3
Date: Wed, 15 Sep 2021 05:02:16 GMT
Title: Cardinality Estimation in DBMS: A Comprehensive Benchmark Evaluation
Authors: Yuxing Han, Ziniu Wu, Peizhi Wu, Rong Zhu, Jingyi Yang, Liang Wei Tan, Kai Zeng, Gao Cong, Yanzhao Qin, Andreas Pfadler, Zhengping Qian, Jingren Zhou, Jiangneng Li, Bin Cui
Abstract summary: Cardinality estimation (CardEst) plays a significant role in generating high-quality query plans for a query workload. In this paper, we comprehensively and systematically compare the effectiveness of CardEst methods in a real dataset. We establish a new benchmark for CardEst, which contains a new complex real-world STATS and a diverse query STATS-CEB.
Score: 43.27881697012329
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Cardinality estimation (CardEst) plays a significant role in generating high-quality query plans for a query optimizer in DBMS. In the last decade, an increasing number of advanced CardEst methods (especially ML-based) have been proposed with outstanding estimation accuracy and inference latency. However, there exists no study that systematically evaluates the quality of these methods and answer the fundamental problem: to what extent can these methods improve the performance of query optimizer in real-world settings, which is the ultimate goal of a CardEst method. In this paper, we comprehensively and systematically compare the effectiveness of CardEst methods in a real DBMS. We establish a new benchmark for CardEst, which contains a new complex real-world dataset STATS and a diverse query workload STATS-CEB. We integrate multiple most representative CardEst methods into an open-source database system PostgreSQL, and comprehensively evaluate their true effectiveness in improving query plan quality, and other important aspects affecting their applicability, ranging from inference latency, model size, and training time, to update efficiency and accuracy. We obtain a number of key findings for the CardEst methods, under different data and query settings. Furthermore, we find that the widely used estimation accuracy metric(Q-Error) cannot distinguish the importance of different sub-plan queries during query optimization and thus cannot truly reflect the query plan quality generated by CardEst methods. Therefore, we propose a new metric P-Error to evaluate the performance of CardEst methods, which overcomes the limitation of Q-Error and is able to reflect the overall end-to-end performance of CardEst methods. We have made all of the benchmark data and evaluation code publicly available at https://github.com/Nathaniel-Han/End-to-End-CardEst-Benchmark.

Related papers

Revisiting BPR: A Replicability Study of a Common Recommender System Baseline [78.00363373925758]
We study the features of the BPR model, indicating their impact on its performance, and investigate open-source BPR implementations. Our analysis reveals inconsistencies between these implementations and the original BPR paper, leading to a significant decrease in performance of up to 50% for specific implementations. We show that the BPR model can achieve performance levels close to state-of-the-art methods on the top-n recommendation tasks and even outperform them on specific datasets.
arXiv Detail & Related papers (2024-09-21T18:39:53Z)
CardBench: A Benchmark for Learned Cardinality Estimation in Relational Databases [17.46316633654637]
Cardinality estimation is crucial for enabling high query performance in databases. There is no systematic benchmark or datasets which allows researchers to evaluate the progress made by new learned approaches. We release a benchmark, containing thousands of queries over 20 distinct real-world databases for learned cardinality estimation.
arXiv Detail & Related papers (2024-08-28T23:25:25Z)
On Speeding Up Language Model Evaluation [48.51924035873411]
Development of prompt-based methods with Large Language Models (LLMs) requires making numerous decisions. We propose a novel method to address this challenge. We show that it can identify the top-performing method using only 5-15% of the typically needed resources.
arXiv Detail & Related papers (2024-07-08T17:48:42Z)
PRICE: A Pretrained Model for Cross-Database Cardinality Estimation [78.30959470441442]
Cardinality estimation (CardEst) is essential for optimizing query execution plans. Recent ML-based CardEst methods achieve high accuracy but face deployment challenges due to high preparation costs. We propose PRICE, a PRetrained multI-table CardEst model, which addresses these limitations.
arXiv Detail & Related papers (2024-06-03T06:21:53Z)
Lero: A Learning-to-Rank Query Optimizer [49.841082217997354]
We introduce a learning to rank query, called Lero, which builds on top of the native query and continuously learns to improve query optimization. Rather than building a learned from scratch, Lero is designed to leverage decades of wisdom of databases and improve the native. Lero achieves near optimal performance on several benchmarks.
arXiv Detail & Related papers (2023-02-14T07:31:11Z)
Compactness Score: A Fast Filter Method for Unsupervised Feature Selection [66.84571085643928]
We propose a fast unsupervised feature selection method, named as, Compactness Score (CSUFS) to select desired features. Our proposed algorithm seems to be more accurate and efficient compared with existing algorithms.
arXiv Detail & Related papers (2022-01-31T13:01:37Z)
Glue: Adaptively Merging Single Table Cardinality to Estimate Join Query Size [35.1093718746362]
Cardinality estimation (CardEst) plays a significant role in generating high-quality query plans. The hardest problem in CardEst, i.e. how to estimate the join query size on multiple tables, has not been extensively solved. In this paper, we propose a very general framework, called Glue, which supports single table-wise CardEst results.
arXiv Detail & Related papers (2021-12-07T02:46:46Z)
BayesCard: Revitilizing Bayesian Frameworks for Cardinality Estimation [25.871965040723772]
A desired CardEst method should attain good algorithm performance, be stable to varied data settings, and be friendly to system deployment. BayesCard is the first framework that inherits the advantages of BNs, i.e., high estimation accuracy and interpretability. It achieves comparable or better accuracy, 1-2 orders of magnitude faster inference time, 1-3 orders faster training time, 1-3 orders smaller model size, and 1-2 orders faster updates.
arXiv Detail & Related papers (2020-12-29T13:21:18Z)
FLAT: Fast, Lightweight and Accurate Method for Cardinality Estimation [45.98791307420517]
We propose FLAT, a CardEst method that is simultaneously fast in probability computation, lightweight in graphical model size and accurate in estimation quality. FLAT supports efficient online probability computation in near liner time on the underlying FSPN model. It can estimate cardinality for both single table queries and multi table join queries. It achieves 1 to 5 orders of magnitude better accuracy, 1 to 3 orders of magnitude faster probability speed and 1 to 2 orders of magnitude lower storage cost.
arXiv Detail & Related papers (2020-11-18T01:14:45Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.