Scardina: Scalable Join Cardinality Estimation by Multiple Density
Estimators
- URL: http://arxiv.org/abs/2303.18042v1
- Date: Fri, 31 Mar 2023 13:22:28 GMT
- Title: Scardina: Scalable Join Cardinality Estimation by Multiple Density
Estimators
- Authors: Ryuichi Ito, Yuya Sasaki, Chuan Xiao, Makoto Onizuka
- Abstract summary: Machine learning-based cardinality estimation methods are replacing traditional methods.
We propose Scardina, a new join cardinality estimation method using multiple partitioned models based on the schema structure.
- Score: 8.641606056228675
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, machine learning-based cardinality estimation methods are
replacing traditional methods. This change is expected to contribute to one of
the most important applications of cardinality estimation, the query optimizer,
to speed up query processing. However, none of the existing methods do not
precisely estimate cardinalities when relational schemas consist of many tables
with strong correlations between tables/attributes. This paper describes that
multiple density estimators can be combined to effectively target the
cardinality estimation of data with large and complex schemas having strong
correlations. We propose Scardina, a new join cardinality estimation method
using multiple partitioned models based on the schema structure.
Related papers
- CardBench: A Benchmark for Learned Cardinality Estimation in Relational Databases [17.46316633654637]
Cardinality estimation is crucial for enabling high query performance in databases.
There is no systematic benchmark or datasets which allows researchers to evaluate the progress made by new learned approaches.
We release a benchmark, containing thousands of queries over 20 distinct real-world databases for learned cardinality estimation.
arXiv Detail & Related papers (2024-08-28T23:25:25Z) - MAP: Low-compute Model Merging with Amortized Pareto Fronts via Quadratic Approximation [80.47072100963017]
We introduce a novel and low-compute algorithm, Model Merging with Amortized Pareto Front (MAP)
MAP efficiently identifies a set of scaling coefficients for merging multiple models, reflecting the trade-offs involved.
We also introduce Bayesian MAP for scenarios with a relatively low number of tasks and Nested MAP for situations with a high number of tasks, further reducing the computational cost of evaluation.
arXiv Detail & Related papers (2024-06-11T17:55:25Z) - Deep Negative Correlation Classification [82.45045814842595]
Existing deep ensemble methods naively train many different models and then aggregate their predictions.
We propose deep negative correlation classification (DNCC)
DNCC yields a deep classification ensemble where the individual estimator is both accurate and negatively correlated.
arXiv Detail & Related papers (2022-12-14T07:35:20Z) - FactorJoin: A New Cardinality Estimation Framework for Join Queries [35.22928513918166]
Cardinality estimation is one of the most fundamental and challenging problems in query optimization.
We propose a new framework FactorJoin for estimating join queries.
In our evaluation, FactorJoin can produce more effective estimates than the previous state-of-the-art learning-based methods.
arXiv Detail & Related papers (2022-12-11T15:51:39Z) - Mining Relations among Cross-Frame Affinities for Video Semantic
Segmentation [87.4854250338374]
We explore relations among affinities in two aspects: single-scale intrinsic correlations and multi-scale relations.
Our experiments demonstrate that the proposed method performs favorably against state-of-the-art VSS methods.
arXiv Detail & Related papers (2022-07-21T12:12:36Z) - Proton: Probing Schema Linking Information from Pre-trained Language
Models for Text-to-SQL Parsing [66.55478402233399]
We propose a framework to elicit relational structures via a probing procedure based on Poincar'e distance metric.
Compared with commonly-used rule-based methods for schema linking, we found that probing relations can robustly capture semantic correspondences.
Our framework sets new state-of-the-art performance on three benchmarks.
arXiv Detail & Related papers (2022-06-28T14:05:25Z) - Glue: Adaptively Merging Single Table Cardinality to Estimate Join Query
Size [35.1093718746362]
Cardinality estimation (CardEst) plays a significant role in generating high-quality query plans.
The hardest problem in CardEst, i.e. how to estimate the join query size on multiple tables, has not been extensively solved.
In this paper, we propose a very general framework, called Glue, which supports single table-wise CardEst results.
arXiv Detail & Related papers (2021-12-07T02:46:46Z) - Density Ratio Estimation via Infinitesimal Classification [85.08255198145304]
We propose DRE-infty, a divide-and-conquer approach to reduce Density ratio estimation (DRE) to a series of easier subproblems.
Inspired by Monte Carlo methods, we smoothly interpolate between the two distributions via an infinite continuum of intermediate bridge distributions.
We show that our approach performs well on downstream tasks such as mutual information estimation and energy-based modeling on complex, high-dimensional datasets.
arXiv Detail & Related papers (2021-11-22T06:26:29Z) - Probabilistic Case-based Reasoning for Open-World Knowledge Graph
Completion [59.549664231655726]
A case-based reasoning (CBR) system solves a new problem by retrieving cases' that are similar to the given problem.
In this paper, we demonstrate that such a system is achievable for reasoning in knowledge-bases (KBs)
Our approach predicts attributes for an entity by gathering reasoning paths from similar entities in the KB.
arXiv Detail & Related papers (2020-10-07T17:48:12Z) - NeuroCard: One Cardinality Estimator for All Tables [23.723132106252272]
NeuroCard is a join cardinality estimator that builds a single neural density estimator over an entire database.
NeuroCard achieves orders of higher magnitude accuracy than the best prior methods.
arXiv Detail & Related papers (2020-06-15T03:21:46Z) - NN-based Transformation of Any SQL Cardinality Estimator for Handling
DISTINCT, AND, OR and NOT [1.8275108630751837]
A query planner requires the set-theoretic cardinality (i.e., without duplicates) for queries with DISTINCT as well as in planning.
Many cardinality estimation methods are limited to estimating cardinalities of only conjunctive queries with duplicates counted.
We describe two methods for handling this deficiency that can be applied to any limited cardinality estimation model.
arXiv Detail & Related papers (2020-04-15T11:20:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.