NeuroCard: One Cardinality Estimator for All Tables
- URL: http://arxiv.org/abs/2006.08109v2
- Date: Mon, 2 Nov 2020 19:15:11 GMT
- Title: NeuroCard: One Cardinality Estimator for All Tables
- Authors: Zongheng Yang, Amog Kamsetty, Sifei Luan, Eric Liang, Yan Duan, Xi
Chen, and Ion Stoica
- Abstract summary: NeuroCard is a join cardinality estimator that builds a single neural density estimator over an entire database.
NeuroCard achieves orders of higher magnitude accuracy than the best prior methods.
- Score: 23.723132106252272
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Query optimizers rely on accurate cardinality estimates to produce good
execution plans. Despite decades of research, existing cardinality estimators
are inaccurate for complex queries, due to making lossy modeling assumptions
and not capturing inter-table correlations. In this work, we show that it is
possible to learn the correlations across all tables in a database without any
independence assumptions. We present NeuroCard, a join cardinality estimator
that builds a single neural density estimator over an entire database.
Leveraging join sampling and modern deep autoregressive models, NeuroCard makes
no inter-table or inter-column independence assumptions in its probabilistic
modeling. NeuroCard achieves orders of magnitude higher accuracy than the best
prior methods (a new state-of-the-art result of 8.5$\times$ maximum error on
JOB-light), scales to dozens of tables, while being compact in space (several
MBs) and efficient to construct or update (seconds to minutes).
Related papers
- CardBench: A Benchmark for Learned Cardinality Estimation in Relational Databases [17.46316633654637]
Cardinality estimation is crucial for enabling high query performance in databases.
There is no systematic benchmark or datasets which allows researchers to evaluate the progress made by new learned approaches.
We release a benchmark, containing thousands of queries over 20 distinct real-world databases for learned cardinality estimation.
arXiv Detail & Related papers (2024-08-28T23:25:25Z) - PRICE: A Pretrained Model for Cross-Database Cardinality Estimation [78.30959470441442]
Cardinality estimation (CardEst) is essential for optimizing query execution plans.
Recent ML-based CardEst methods achieve high accuracy but face deployment challenges due to high preparation costs.
We propose PRICE, a PRetrained multI-table CardEst model, which addresses these limitations.
arXiv Detail & Related papers (2024-06-03T06:21:53Z) - Scardina: Scalable Join Cardinality Estimation by Multiple Density
Estimators [8.641606056228675]
Machine learning-based cardinality estimation methods are replacing traditional methods.
We propose Scardina, a new join cardinality estimation method using multiple partitioned models based on the schema structure.
arXiv Detail & Related papers (2023-03-31T13:22:28Z) - FactorJoin: A New Cardinality Estimation Framework for Join Queries [35.22928513918166]
Cardinality estimation is one of the most fundamental and challenging problems in query optimization.
We propose a new framework FactorJoin for estimating join queries.
In our evaluation, FactorJoin can produce more effective estimates than the previous state-of-the-art learning-based methods.
arXiv Detail & Related papers (2022-12-11T15:51:39Z) - HyperImpute: Generalized Iterative Imputation with Automatic Model
Selection [77.86861638371926]
We propose a generalized iterative imputation framework for adaptively and automatically configuring column-wise models.
We provide a concrete implementation with out-of-the-box learners, simulators, and interfaces.
arXiv Detail & Related papers (2022-06-15T19:10:35Z) - Glue: Adaptively Merging Single Table Cardinality to Estimate Join Query
Size [35.1093718746362]
Cardinality estimation (CardEst) plays a significant role in generating high-quality query plans.
The hardest problem in CardEst, i.e. how to estimate the join query size on multiple tables, has not been extensively solved.
In this paper, we propose a very general framework, called Glue, which supports single table-wise CardEst results.
arXiv Detail & Related papers (2021-12-07T02:46:46Z) - FLAT: Fast, Lightweight and Accurate Method for Cardinality Estimation [45.98791307420517]
We propose FLAT, a CardEst method that is simultaneously fast in probability computation, lightweight in graphical model size and accurate in estimation quality.
FLAT supports efficient online probability computation in near liner time on the underlying FSPN model.
It can estimate cardinality for both single table queries and multi table join queries.
It achieves 1 to 5 orders of magnitude better accuracy, 1 to 3 orders of magnitude faster probability speed and 1 to 2 orders of magnitude lower storage cost.
arXiv Detail & Related papers (2020-11-18T01:14:45Z) - Probabilistic Case-based Reasoning for Open-World Knowledge Graph
Completion [59.549664231655726]
A case-based reasoning (CBR) system solves a new problem by retrieving cases' that are similar to the given problem.
In this paper, we demonstrate that such a system is achievable for reasoning in knowledge-bases (KBs)
Our approach predicts attributes for an entity by gathering reasoning paths from similar entities in the KB.
arXiv Detail & Related papers (2020-10-07T17:48:12Z) - Nonparametric Estimation in the Dynamic Bradley-Terry Model [69.70604365861121]
We develop a novel estimator that relies on kernel smoothing to pre-process the pairwise comparisons over time.
We derive time-varying oracle bounds for both the estimation error and the excess risk in the model-agnostic setting.
arXiv Detail & Related papers (2020-02-28T21:52:49Z) - Meta-Learned Confidence for Few-shot Learning [60.6086305523402]
A popular transductive inference technique for few-shot metric-based approaches, is to update the prototype of each class with the mean of the most confident query examples.
We propose to meta-learn the confidence for each query sample, to assign optimal weights to unlabeled queries.
We validate our few-shot learning model with meta-learned confidence on four benchmark datasets.
arXiv Detail & Related papers (2020-02-27T10:22:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.