Glue: Adaptively Merging Single Table Cardinality to Estimate Join Query
Size
- URL: http://arxiv.org/abs/2112.03458v1
- Date: Tue, 7 Dec 2021 02:46:46 GMT
- Title: Glue: Adaptively Merging Single Table Cardinality to Estimate Join Query
Size
- Authors: Rong Zhu, Tianjing Zeng, Andreas Pfadler, Wei Chen, Bolin Ding,
Jingren Zhou
- Abstract summary: Cardinality estimation (CardEst) plays a significant role in generating high-quality query plans.
The hardest problem in CardEst, i.e. how to estimate the join query size on multiple tables, has not been extensively solved.
In this paper, we propose a very general framework, called Glue, which supports single table-wise CardEst results.
- Score: 35.1093718746362
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Cardinality estimation (CardEst), a central component of the query optimizer,
plays a significant role in generating high-quality query plans in DBMS. The
CardEst problem has been extensively studied in the last several decades, using
both traditional and ML-enhanced methods. Whereas, the hardest problem in
CardEst, i.e., how to estimate the join query size on multiple tables, has not
been extensively solved. Current methods either reply on independence
assumptions or apply techniques with heavy burden, whose performance is still
far from satisfactory. Even worse, existing CardEst methods are often designed
to optimize one goal, i.e., inference speed or estimation accuracy, which can
not adapt to different occasions.
In this paper, we propose a very general framework, called Glue, to tackle
with these challenges. Its key idea is to elegantly decouple the correlations
across different tables and losslessly merge single table CardEst results to
estimate the join query size. Glue supports obtaining the single table-wise
CardEst results using any existing CardEst method and can process any complex
join schema. Therefore, it easily adapts to different scenarios having
different performance requirements, i.e., OLTP with fast estimation time or
OLAP with high estimation accuracy. Meanwhile, we show that Glue can be
seamlessly integrated into the plan search process and is able to support
counting distinct number of values. All these properties exhibit the potential
advances of deploying Glue in real-world DBMS.
Related papers
- PRICE: A Pretrained Model for Cross-Database Cardinality Estimation [78.30959470441442]
Cardinality estimation (CardEst) is essential for optimizing query execution plans.
Recent ML-based CardEst methods achieve high accuracy but face deployment challenges due to high preparation costs.
We propose PRICE, a PRetrained multI-table CardEst model, which addresses these limitations.
arXiv Detail & Related papers (2024-06-03T06:21:53Z) - Is Table Retrieval a Solved Problem? Exploring Join-Aware Multi-Table Retrieval [52.592071689901196]
We introduce a method that uncovers useful join relations for any query and database during table retrieval.
Our method outperforms the state-of-the-art approaches for table retrieval by up to 9.3% in F1 score and for end-to-end QA by up to 5.4% in accuracy.
arXiv Detail & Related papers (2024-04-15T15:55:01Z) - JoinGym: An Efficient Query Optimization Environment for Reinforcement
Learning [58.71541261221863]
Join order selection (JOS) is the problem of ordering join operations to minimize total query execution cost.
We present JoinGym, a query optimization environment for bushy reinforcement learning (RL)
Under the hood, JoinGym simulates a query plan's cost by looking up intermediate result cardinalities from a pre-computed dataset.
arXiv Detail & Related papers (2023-07-21T17:00:06Z) - Scardina: Scalable Join Cardinality Estimation by Multiple Density
Estimators [8.641606056228675]
Machine learning-based cardinality estimation methods are replacing traditional methods.
We propose Scardina, a new join cardinality estimation method using multiple partitioned models based on the schema structure.
arXiv Detail & Related papers (2023-03-31T13:22:28Z) - FactorJoin: A New Cardinality Estimation Framework for Join Queries [35.22928513918166]
Cardinality estimation is one of the most fundamental and challenging problems in query optimization.
We propose a new framework FactorJoin for estimating join queries.
In our evaluation, FactorJoin can produce more effective estimates than the previous state-of-the-art learning-based methods.
arXiv Detail & Related papers (2022-12-11T15:51:39Z) - Cardinality Estimation in DBMS: A Comprehensive Benchmark Evaluation [43.27881697012329]
Cardinality estimation (CardEst) plays a significant role in generating high-quality query plans for a query workload.
In this paper, we comprehensively and systematically compare the effectiveness of CardEst methods in a real dataset.
We establish a new benchmark for CardEst, which contains a new complex real-world STATS and a diverse query STATS-CEB.
arXiv Detail & Related papers (2021-09-13T11:25:02Z) - FLAT: Fast, Lightweight and Accurate Method for Cardinality Estimation [45.98791307420517]
We propose FLAT, a CardEst method that is simultaneously fast in probability computation, lightweight in graphical model size and accurate in estimation quality.
FLAT supports efficient online probability computation in near liner time on the underlying FSPN model.
It can estimate cardinality for both single table queries and multi table join queries.
It achieves 1 to 5 orders of magnitude better accuracy, 1 to 3 orders of magnitude faster probability speed and 1 to 2 orders of magnitude lower storage cost.
arXiv Detail & Related papers (2020-11-18T01:14:45Z) - NeuroCard: One Cardinality Estimator for All Tables [23.723132106252272]
NeuroCard is a join cardinality estimator that builds a single neural density estimator over an entire database.
NeuroCard achieves orders of higher magnitude accuracy than the best prior methods.
arXiv Detail & Related papers (2020-06-15T03:21:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.