Are We Ready For Learned Cardinality Estimation?
- URL: http://arxiv.org/abs/2012.06743v3
- Date: Mon, 15 Mar 2021 23:35:27 GMT
- Title: Are We Ready For Learned Cardinality Estimation?
- Authors: Xiaoying Wang, Changbo Qu, Weiyuan Wu, Jiannan Wang, Qingqing Zhou
- Abstract summary: Firstly, we show that learned models are indeed more accurate than traditional methods, but they often suffer from high training and inference costs.
Secondly, we explore whether these learned models are ready for dynamic environments (i.e., frequent data updates).
Thirdly, our results show that the performance of learned methods can be greatly affected by changes in correlation, skewness, or domain size.
- Score: 6.703418426908341
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Cardinality estimation is a fundamental but long unresolved problem in query
optimization. Recently, multiple papers from different research groups
consistently report that learned models have the potential to replace existing
cardinality estimators. In this paper, we ask a forward-thinking question: Are
we ready to deploy these learned cardinality models in production? Our study
consists of three main parts. Firstly, we focus on the static environment
(i.e., no data updates) and compare five new learned methods with eight
traditional methods on four real-world datasets under a unified workload
setting. The results show that learned models are indeed more accurate than
traditional methods, but they often suffer from high training and inference
costs. Secondly, we explore whether these learned models are ready for dynamic
environments (i.e., frequent data updates). We find that they cannot catch up
with fast data updates and return large errors for different reasons. For less
frequent updates, they can perform better but there is no clear winner among
themselves. Thirdly, we take a deeper look into learned models and explore when
they may go wrong. Our results show that the performance of learned methods can
be greatly affected by the changes in correlation, skewness, or domain size.
More importantly, their behaviors are much harder to interpret and often
unpredictable. Based on these findings, we identify two promising research
directions (control the cost of learned models and make learned models
trustworthy) and suggest a number of research opportunities. We hope that our
study can guide researchers and practitioners to work together to eventually
push learned cardinality estimators into real database systems.
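The accuracy comparison in the first part of the study is easiest to make concrete with the q-error metric, which most of the learned-cardinality literature reports. Below is a minimal sketch of scoring two estimators on the same workload; the metric choice is an assumption based on common practice in this area, and the per-query numbers are made up for illustration, not results from the paper.

```python
# Minimal sketch: compare two cardinality estimators by q-error
# (q-error = max(est/true, true/est); 1.0 means a perfect estimate).
# All numbers below are illustrative, not the paper's measurements.
import numpy as np

def q_error(estimate: float, truth: float) -> float:
    """Symmetric multiplicative error between an estimate and the true count."""
    estimate, truth = max(estimate, 1.0), max(truth, 1.0)
    return max(estimate / truth, truth / estimate)

# Hypothetical per-query true cardinalities and estimates.
true_cards = np.array([1200,  45, 98000,  7, 560])
histogram  = np.array([ 300, 410,  2500, 90, 700])   # e.g. a traditional 1-D histogram
learned    = np.array([1100,  60, 91000,  4, 530])   # e.g. a learned density model

for name, est in [("histogram", histogram), ("learned", learned)]:
    qe = np.array([q_error(e, t) for e, t in zip(est, true_cards)])
    print(f"{name:>9}: median q-error {np.median(qe):6.2f}, "
          f"95th percentile {np.percentile(qe, 95):8.2f}")
```

Reporting the median together with a tail percentile matches how this line of work usually distinguishes "accurate on average" from "accurate in the worst case".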
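The third part of the study varies data characteristics such as correlation, skewness, and domain size to see when learned models break down. The sketch below shows one rough way to generate a two-column table with such controllable knobs and to compute ground-truth cardinalities for a range predicate; the generator, its parameters, and the Zipf-style skew are assumptions for illustration, not the paper's actual experimental setup.

```python
# Sketch: synthetic two-column data with tunable correlation, skew and
# domain size, in the spirit of "when do learned models go wrong" experiments.
# All knobs and defaults here are illustrative assumptions.
import numpy as np

def make_table(n_rows: int, domain: int, skew: float, corr: float,
               seed: int = 0) -> np.ndarray:
    """Return an (n_rows, 2) integer table.

    skew: Zipf-like exponent for column A (0 = uniform, larger = more skewed).
    corr: probability that B copies A (0 = independent, 1 = identical).
    """
    rng = np.random.default_rng(seed)
    ranks = np.arange(1, domain + 1, dtype=float)
    probs = ranks ** -skew if skew > 0 else np.ones(domain)
    probs /= probs.sum()
    a = rng.choice(domain, size=n_rows, p=probs)
    b_indep = rng.choice(domain, size=n_rows, p=probs)
    b = np.where(rng.random(n_rows) < corr, a, b_indep)
    return np.stack([a, b], axis=1)

def true_cardinality(table: np.ndarray, a_max: int, b_max: int) -> int:
    """Exact count for the range predicate A <= a_max AND B <= b_max."""
    return int(np.sum((table[:, 0] <= a_max) & (table[:, 1] <= b_max)))

# Example: the same predicate under low vs. high column correlation.
for corr in (0.0, 0.9):
    t = make_table(n_rows=100_000, domain=1_000, skew=1.1, corr=corr)
    print(f"corr={corr}: true cardinality {true_cardinality(t, 100, 100)}")
```

Sweeping these knobs before and after retraining an estimator is one simple way to probe the sensitivity to correlation, skewness, and domain size that the abstract describes.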
Related papers
- CardBench: A Benchmark for Learned Cardinality Estimation in Relational Databases [17.46316633654637]
Cardinality estimation is crucial for enabling high query performance in databases.
There is no systematic benchmark or dataset that allows researchers to evaluate the progress made by new learned approaches.
We release a benchmark containing thousands of queries over 20 distinct real-world databases for learned cardinality estimation.
arXiv Detail & Related papers (2024-08-28T23:25:25Z)
- Zero-shot Retrieval: Augmenting Pre-trained Models with Search Engines [83.65380507372483]
Large pre-trained models can dramatically reduce the amount of task-specific data required to solve a problem, but they often fail to capture domain-specific nuances out of the box.
This paper shows how to leverage recent advances in NLP and multi-modal learning to augment a pre-trained model with search engine retrieval.
arXiv Detail & Related papers (2023-11-29T05:33:28Z)
- Synthetic Model Combination: An Instance-wise Approach to Unsupervised Ensemble Learning [92.89846887298852]
Consider making a prediction over new test data without any opportunity to learn from a training set of labelled data.
Instead, one is given access to a set of expert models and their predictions, alongside some limited information about the datasets used to train them.
arXiv Detail & Related papers (2022-10-11T10:20:31Z)
- Exploring Strategies for Generalizable Commonsense Reasoning with Pre-trained Models [62.28551903638434]
We measure the impact of three different adaptation methods on the generalization and accuracy of models.
Experiments with two models show that fine-tuning performs best, by learning both the content and the structure of the task, but suffers from overfitting and limited generalization to novel answers.
We observe that alternative adaptation methods like prefix-tuning have comparable accuracy, but generalize better to unseen answers and are more robust to adversarial splits.
arXiv Detail & Related papers (2021-09-07T03:13:06Z)
- Revisiting Contrastive Methods for Unsupervised Learning of Visual Representations [78.12377360145078]
Contrastive self-supervised learning has outperformed supervised pretraining on many downstream tasks like segmentation and object detection.
In this paper, we first study how biases in the dataset affect existing methods.
We show that current contrastive approaches work surprisingly well across: (i) object- versus scene-centric, (ii) uniform versus long-tailed and (iii) general versus domain-specific datasets.
arXiv Detail & Related papers (2021-06-10T17:59:13Z)
- Distill on the Go: Online knowledge distillation in self-supervised learning [1.1470070927586016]
Recent works have shown that wider and deeper models benefit more from self-supervised learning than smaller models.
We propose Distill-on-the-Go (DoGo), a self-supervised learning paradigm using single-stage online knowledge distillation.
Our results show significant performance gain in the presence of noisy and limited labels.
arXiv Detail & Related papers (2021-04-20T09:59:23Z)
- Dataset Cartography: Mapping and Diagnosing Datasets with Training Dynamics [118.75207687144817]
We introduce Data Maps, a model-based tool to characterize and diagnose datasets.
We leverage a largely ignored source of information: the behavior of the model on individual instances during training.
Our results indicate that a shift in focus from quantity to quality of data could lead to robust models and improved out-of-distribution generalization.
arXiv Detail & Related papers (2020-09-22T20:19:41Z)
- Relation-Guided Representation Learning [53.60351496449232]
We propose a new representation learning method that explicitly models and leverages sample relations.
Our framework preserves the relations between samples well.
By embedding samples into a subspace, we show that our method can address large-scale and out-of-sample problems.
arXiv Detail & Related papers (2020-07-11T10:57:45Z)
- Bayesian Meta-Prior Learning Using Empirical Bayes [3.666114237131823]
We propose a hierarchical Empirical Bayes approach that addresses the absence of informative priors, and the inability to control parameter learning rates.
Our method learns empirical meta-priors from the data itself and uses them to decouple the learning rates of first-order and second-order features.
Our findings are promising, as optimizing over sparse data is often a challenge.
arXiv Detail & Related papers (2020-02-04T05:08:17Z)