BayesCard: Revitilizing Bayesian Frameworks for Cardinality Estimation
- URL: http://arxiv.org/abs/2012.14743v2
- Date: Tue, 2 Feb 2021 10:54:33 GMT
- Title: BayesCard: Revitilizing Bayesian Frameworks for Cardinality Estimation
- Authors: Ziniu Wu, Amir Shaikhha, Rong Zhu, Kai Zeng, Yuxing Han, Jingren Zhou
- Abstract summary: A desired CardEst method should attain good algorithm performance, be stable to varied data settings, and be friendly to system deployment.
BayesCard is the first framework that inherits the advantages of BNs, i.e., high estimation accuracy and interpretability, while overcoming their low structure learning and inference efficiency.
It achieves comparable or better accuracy, 1-2 orders of magnitude faster inference time, 1-3 orders faster training time, 1-3 orders smaller model size, and 1-2 orders faster updates.
- Score: 25.871965040723772
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Cardinality estimation (CardEst) is an essential component in query
optimizers and a fundamental problem in DBMS. A desired CardEst method should
attain good algorithm performance, be stable to varied data settings, and be
friendly to system deployment. However, no existing CardEst method can fulfill
the three criteria at the same time. Traditional methods often have significant
algorithmic drawbacks such as large estimation errors. Recently proposed deep
learning-based methods largely improve the estimation accuracy, but their
performance can be greatly affected by the data and they are often difficult to
deploy in real systems.
In this paper, we revitalize the Bayesian networks (BN) for CardEst by
incorporating the techniques of probabilistic programming languages. We present
BayesCard, the first framework that inherits the advantages of BNs, i.e., high
estimation accuracy and interpretability, while overcoming their drawbacks, i.e.,
low structure learning and inference efficiency. This makes BayesCard a perfect
candidate for commercial DBMS deployment. Our experimental results on several
single-table and multi-table benchmarks indicate BayesCard's superiority over
existing state-of-the-art CardEst methods: BayesCard achieves comparable or
better accuracy, 1-2 orders of magnitude faster inference time, 1-3 orders
faster training time, 1-3 orders smaller model size, and 1-2 orders faster
updates. Meanwhile, BayesCard keeps stable performance when varying data with
different settings. We also deploy BayesCard into PostgreSQL. On the IMDB
benchmark workload, it improves the end-to-end query time by 13.3%, which is
very close to the optimal result of 14.2% using an oracle of true cardinality.
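To make the BN-based CardEst idea concrete, here is a minimal, self-contained sketch (not BayesCard's actual implementation, which relies on probabilistic programming techniques for fast structure learning and inference): a table's joint column distribution is factorized by a toy Bayesian network, and the cardinality of a conjunctive predicate is estimated as the table size times the probability the model assigns to it. All names and data (the city/status columns, estimate_cardinality) are hypothetical.

```python
# Toy BN-style cardinality estimation (illustration only, not BayesCard itself).
# The two-column table is modeled with the factorization
#   P(city, status) = P(city) * P(status | city),
# and a conjunctive predicate's cardinality is estimated as N * P(predicate).
from collections import Counter, defaultdict

# Hypothetical table: (city, status) rows.
rows = [
    ("NY", "active"), ("NY", "active"), ("NY", "inactive"),
    ("SF", "active"), ("SF", "inactive"), ("SF", "inactive"),
    ("LA", "active"), ("LA", "active"), ("LA", "active"), ("LA", "inactive"),
]
N = len(rows)

# "Structure learning" is fixed here (city is the parent of status);
# parameter learning is plain maximum-likelihood counting.
city_counts = Counter(city for city, _ in rows)
status_given_city = defaultdict(Counter)
for city, status in rows:
    status_given_city[city][status] += 1

def estimate_cardinality(city, status):
    """Estimate |{rows: city=c AND status=s}| as N * P(city) * P(status | city)."""
    p_city = city_counts[city] / N
    p_status = status_given_city[city][status] / max(city_counts[city], 1)
    return N * p_city * p_status

for pred in [("NY", "active"), ("LA", "inactive"), ("SF", "active")]:
    true_card = sum(1 for row in rows if row == pred)
    print(pred, "estimate:", round(estimate_cardinality(*pred), 2), "true:", true_card)
```

On this two-column toy the chain factorization is exact, so the estimates match the true counts; with many columns, a BN instead assumes conditional independencies to stay compact, trading a little accuracy for small model size and fast inference, which is the regime the paper targets.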
Related papers
- CardBench: A Benchmark for Learned Cardinality Estimation in Relational Databases [17.46316633654637]
Cardinality estimation is crucial for enabling high query performance in databases.
There is no systematic benchmark or dataset that allows researchers to evaluate the progress made by new learned approaches.
We release a benchmark, containing thousands of queries over 20 distinct real-world databases for learned cardinality estimation.
arXiv Detail & Related papers (2024-08-28T23:25:25Z)
- PRICE: A Pretrained Model for Cross-Database Cardinality Estimation [78.30959470441442]
Cardinality estimation (CardEst) is essential for optimizing query execution plans.
Recent ML-based CardEst methods achieve high accuracy but face deployment challenges due to high preparation costs.
We propose PRICE, a PRetrained multI-table CardEst model, which addresses these limitations.
arXiv Detail & Related papers (2024-06-03T06:21:53Z)
- Lero: A Learning-to-Rank Query Optimizer [49.841082217997354]
We introduce a learning-to-rank query optimizer, called Lero, which builds on top of the native query optimizer and continuously learns to improve query optimization.
Rather than building a learned optimizer from scratch, Lero is designed to leverage decades of wisdom of databases and improve the native query optimizer.
Lero achieves near optimal performance on several benchmarks.
arXiv Detail & Related papers (2023-02-14T07:31:11Z)
- Efficient Few-Shot Object Detection via Knowledge Inheritance [62.36414544915032]
Few-shot object detection (FSOD) aims at learning a generic detector that can adapt to unseen tasks with scarce training samples.
We present an efficient pretrain-transfer framework (PTF) baseline with no computational increment.
We also propose an adaptive length re-scaling (ALR) strategy to alleviate the vector length inconsistency between the predicted novel weights and the pretrained base weights.
arXiv Detail & Related papers (2022-03-23T06:24:31Z)
- Glue: Adaptively Merging Single Table Cardinality to Estimate Join Query Size [35.1093718746362]
Cardinality estimation (CardEst) plays a significant role in generating high-quality query plans.
The hardest problem in CardEst, i.e., how to estimate the join query size on multiple tables, has not been well solved.
In this paper, we propose a very general framework, called Glue, which merges single table-wise CardEst results to estimate join query size.
arXiv Detail & Related papers (2021-12-07T02:46:46Z)
- Cardinality Estimation in DBMS: A Comprehensive Benchmark Evaluation [43.27881697012329]
Cardinality estimation (CardEst) plays a significant role in generating high-quality query plans for a query workload.
In this paper, we comprehensively and systematically compare the effectiveness of CardEst methods on a real-world dataset.
We establish a new benchmark for CardEst, which contains a new complex real-world dataset STATS and a diverse query workload STATS-CEB.
arXiv Detail & Related papers (2021-09-13T11:25:02Z)
- FLAT: Fast, Lightweight and Accurate Method for Cardinality Estimation [45.98791307420517]
We propose FLAT, a CardEst method that is simultaneously fast in probability computation, lightweight in graphical model size and accurate in estimation quality.
FLAT supports efficient online probability computation in near-linear time on the underlying FSPN model.
It can estimate cardinality for both single table queries and multi table join queries.
It achieves 1 to 5 orders of magnitude better accuracy, 1 to 3 orders of magnitude faster probability computation speed and 1 to 2 orders of magnitude lower storage cost.
arXiv Detail & Related papers (2020-11-18T01:14:45Z)
- The Right Tool for the Job: Matching Model and Instance Complexities [62.95183777679024]
As NLP models become larger, executing a trained model requires significant computational resources, incurring monetary and environmental costs.
We propose a modification to contextual representation fine-tuning which, during inference, allows for an early (and fast) "exit" from neural network calculations for simple instances.
We test our proposed modification on five different datasets in two tasks: three text classification datasets and two natural language inference benchmarks.
arXiv Detail & Related papers (2020-04-16T04:28:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.