FactorJoin: A New Cardinality Estimation Framework for Join Queries
- URL: http://arxiv.org/abs/2212.05526v1
- Date: Sun, 11 Dec 2022 15:51:39 GMT
- Title: FactorJoin: A New Cardinality Estimation Framework for Join Queries
- Authors: Ziniu Wu, Parimarjan Negi, Mohammad Alizadeh, Tim Kraska, Samuel
Madden
- Abstract summary: Cardinality estimation is one of the most fundamental and challenging problems in query optimization.
We propose a new framework FactorJoin for estimating join queries.
In our evaluation, FactorJoin can produce more effective estimates than the previous state-of-the-art learning-based methods.
- Score: 35.22928513918166
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Cardinality estimation is one of the most fundamental and challenging
problems in query optimization. Neither classical nor learning-based methods
yield satisfactory performance when estimating the cardinality of the join
queries. They either rely on simplified assumptions leading to ineffective
cardinality estimates or build large models to understand the data
distributions, leading to long planning times and a lack of generalizability
across queries.
In this paper, we propose a new framework FactorJoin for estimating join
queries. FactorJoin combines the idea behind the classical join-histogram
method to efficiently handle joins with the learning-based methods to
accurately capture attribute correlation. Specifically, FactorJoin scans every
table in a DB and builds single-table conditional distributions during an
offline preparation phase. When a join query comes, FactorJoin translates it
into a factor graph model over the learned distributions to effectively and
efficiently estimate its cardinality.
Unlike existing learning-based methods, FactorJoin does not need to
de-normalize joins upfront or require executed query workloads to train the
model. Since it only relies on single-table statistics, FactorJoin has small
space overhead and is extremely easy to train and maintain. In our evaluation,
FactorJoin can produce more effective estimates than the previous
state-of-the-art learning-based methods, with 40x less estimation latency, 100x
smaller model size, and 100x faster training speed at comparable or better
accuracy. In addition, FactorJoin can estimate 10,000 sub-plan queries within
one second to optimize the query plan, which is very close to the traditional
cardinality estimators in commercial DBMS.
Related papers
- GenJoin: Conditional Generative Plan-to-Plan Query Optimizer that Learns from Subplan Hints [1.3108652488669732]
We present GenJoin, a novel learned query that considers the query optimization problem as a symbiotic generative task.
GenJoin is the first learned query that significantly and consistently outperforms as well as state-of-the-art methods on two well-known real-world benchmarks.
arXiv Detail & Related papers (2024-11-07T08:31:01Z) - CardBench: A Benchmark for Learned Cardinality Estimation in Relational Databases [17.46316633654637]
Cardinality estimation is crucial for enabling high query performance in databases.
There is no systematic benchmark or datasets which allows researchers to evaluate the progress made by new learned approaches.
We release a benchmark, containing thousands of queries over 20 distinct real-world databases for learned cardinality estimation.
arXiv Detail & Related papers (2024-08-28T23:25:25Z) - JoinGym: An Efficient Query Optimization Environment for Reinforcement
Learning [58.71541261221863]
Join order selection (JOS) is the problem of ordering join operations to minimize total query execution cost.
We present JoinGym, a query optimization environment for bushy reinforcement learning (RL)
Under the hood, JoinGym simulates a query plan's cost by looking up intermediate result cardinalities from a pre-computed dataset.
arXiv Detail & Related papers (2023-07-21T17:00:06Z) - Scardina: Scalable Join Cardinality Estimation by Multiple Density
Estimators [8.641606056228675]
Machine learning-based cardinality estimation methods are replacing traditional methods.
We propose Scardina, a new join cardinality estimation method using multiple partitioned models based on the schema structure.
arXiv Detail & Related papers (2023-03-31T13:22:28Z) - Lero: A Learning-to-Rank Query Optimizer [49.841082217997354]
We introduce a learning to rank query, called Lero, which builds on top of the native query and continuously learns to improve query optimization.
Rather than building a learned from scratch, Lero is designed to leverage decades of wisdom of databases and improve the native.
Lero achieves near optimal performance on several benchmarks.
arXiv Detail & Related papers (2023-02-14T07:31:11Z) - DORE: Document Ordered Relation Extraction based on Generative Framework [56.537386636819626]
This paper investigates the root cause of the underwhelming performance of the existing generative DocRE models.
We propose to generate a symbolic and ordered sequence from the relation matrix which is deterministic and easier for model to learn.
Experimental results on four datasets show that our proposed method can improve the performance of the generative DocRE models.
arXiv Detail & Related papers (2022-10-28T11:18:10Z) - Glue: Adaptively Merging Single Table Cardinality to Estimate Join Query
Size [35.1093718746362]
Cardinality estimation (CardEst) plays a significant role in generating high-quality query plans.
The hardest problem in CardEst, i.e. how to estimate the join query size on multiple tables, has not been extensively solved.
In this paper, we propose a very general framework, called Glue, which supports single table-wise CardEst results.
arXiv Detail & Related papers (2021-12-07T02:46:46Z) - Robust Generalization and Safe Query-Specialization in Counterfactual
Learning to Rank [62.28965622396868]
We introduce the Generalization and generalization (GENSPEC) algorithm, a robust feature-based counterfactual Learning to Rank method.
Our results show that GENSPEC leads to optimal performance on queries with sufficient click data, while having robust behavior on queries with little or noisy data.
arXiv Detail & Related papers (2021-02-11T13:17:26Z) - FLAT: Fast, Lightweight and Accurate Method for Cardinality Estimation [45.98791307420517]
We propose FLAT, a CardEst method that is simultaneously fast in probability computation, lightweight in graphical model size and accurate in estimation quality.
FLAT supports efficient online probability computation in near liner time on the underlying FSPN model.
It can estimate cardinality for both single table queries and multi table join queries.
It achieves 1 to 5 orders of magnitude better accuracy, 1 to 3 orders of magnitude faster probability speed and 1 to 2 orders of magnitude lower storage cost.
arXiv Detail & Related papers (2020-11-18T01:14:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.