Related papers: FLAT: Fast, Lightweight and Accurate Method for Cardinality Estimation

FLAT: Fast, Lightweight and Accurate Method for Cardinality Estimation

URL: http://arxiv.org/abs/2011.09022v5
Date: Wed, 19 May 2021 07:37:36 GMT
Title: FLAT: Fast, Lightweight and Accurate Method for Cardinality Estimation
Authors: Rong Zhu, Ziniu Wu, Yuxing Han, Kai Zeng, Andreas Pfadler, Zhengping Qian, Jingren Zhou, Bin Cui
Abstract summary: We propose FLAT, a CardEst method that is simultaneously fast in probability computation, lightweight in graphical model size and accurate in estimation quality. FLAT supports efficient online probability computation in near liner time on the underlying FSPN model. It can estimate cardinality for both single table queries and multi table join queries. It achieves 1 to 5 orders of magnitude better accuracy, 1 to 3 orders of magnitude faster probability speed and 1 to 2 orders of magnitude lower storage cost.
Score: 45.98791307420517
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Query optimizers rely on accurate cardinality estimation (CardEst) to produce good execution plans. The core problem of CardEst is how to model the rich joint distribution of attributes in an accurate and compact manner. Despite decades of research, existing methods either over simplify the models only using independent factorization which leads to inaccurate estimates, or over complicate them by lossless conditional factorization without any independent assumption which results in slow probability computation. In this paper, we propose FLAT, a CardEst method that is simultaneously fast in probability computation, lightweight in model size and accurate in estimation quality. The key idea of FLAT is a novel unsupervised graphical model, called FSPN. It utilizes both independent and conditional factorization to adaptively model different levels of attributes correlations, and thus dovetails their advantages. FLAT supports efficient online probability computation in near liner time on the underlying FSPN model, provides effective offline model construction and enables incremental model updates. It can estimate cardinality for both single table queries and multi table join queries. Extensive experimental study demonstrates the superiority of FLAT over existing CardEst methods on well known IMDB benchmarks: FLAT achieves 1 to 5 orders of magnitude better accuracy, 1 to 3 orders of magnitude faster probability computation speed and 1 to 2 orders of magnitude lower storage cost. We also integrate FLAT into Postgres to perform an end to end test. It improves the query execution time by 12.9% on the benchmark workload, which is very close to the optimal result 14.2% using the true cardinality.

Related papers

SPaRFT: Self-Paced Reinforcement Fine-Tuning for Large Language Models [51.74498855100541]
Large language models (LLMs) have shown strong reasoning capabilities when fine-tuned with reinforcement learning (RL)<n>We propose textbfSPaRFT, a self-paced learning framework that enables efficient learning based on the capability of the model being trained.
arXiv Detail & Related papers (2025-08-07T03:50:48Z)
CardBench: A Benchmark for Learned Cardinality Estimation in Relational Databases [17.46316633654637]
Cardinality estimation is crucial for enabling high query performance in databases. There is no systematic benchmark or datasets which allows researchers to evaluate the progress made by new learned approaches. We release a benchmark, containing thousands of queries over 20 distinct real-world databases for learned cardinality estimation.
arXiv Detail & Related papers (2024-08-28T23:25:25Z)
PRICE: A Pretrained Model for Cross-Database Cardinality Estimation [78.30959470441442]
Cardinality estimation (CardEst) is essential for optimizing query execution plans. Recent ML-based CardEst methods achieve high accuracy but face deployment challenges due to high preparation costs. We propose PRICE, a PRetrained multI-table CardEst model, which addresses these limitations.
arXiv Detail & Related papers (2024-06-03T06:21:53Z)
AutoFT: Learning an Objective for Robust Fine-Tuning [60.641186718253735]
Foundation models encode rich representations that can be adapted to downstream tasks by fine-tuning. Current approaches to robust fine-tuning use hand-crafted regularization techniques. We propose AutoFT, a data-driven approach for robust fine-tuning.
arXiv Detail & Related papers (2024-01-18T18:58:49Z)
FactorJoin: A New Cardinality Estimation Framework for Join Queries [35.22928513918166]
Cardinality estimation is one of the most fundamental and challenging problems in query optimization. We propose a new framework FactorJoin for estimating join queries. In our evaluation, FactorJoin can produce more effective estimates than the previous state-of-the-art learning-based methods.
arXiv Detail & Related papers (2022-12-11T15:51:39Z)
Glue: Adaptively Merging Single Table Cardinality to Estimate Join Query Size [35.1093718746362]
Cardinality estimation (CardEst) plays a significant role in generating high-quality query plans. The hardest problem in CardEst, i.e. how to estimate the join query size on multiple tables, has not been extensively solved. In this paper, we propose a very general framework, called Glue, which supports single table-wise CardEst results.
arXiv Detail & Related papers (2021-12-07T02:46:46Z)
IRLI: Iterative Re-partitioning for Learning to Index [104.72641345738425]
Methods have to trade between obtaining high accuracy while maintaining load balance and scalability in distributed settings. We propose a novel approach called IRLI, which iteratively partitions the items by learning the relevant buckets directly from the query-item relevance data. We mathematically show that IRLI retrieves the correct item with high probability under very natural assumptions and provides superior load balancing.
arXiv Detail & Related papers (2021-03-17T23:13:25Z)
BayesCard: Revitilizing Bayesian Frameworks for Cardinality Estimation [25.871965040723772]
A desired CardEst method should attain good algorithm performance, be stable to varied data settings, and be friendly to system deployment. BayesCard is the first framework that inherits the advantages of BNs, i.e., high estimation accuracy and interpretability. It achieves comparable or better accuracy, 1-2 orders of magnitude faster inference time, 1-3 orders faster training time, 1-3 orders smaller model size, and 1-2 orders faster updates.
arXiv Detail & Related papers (2020-12-29T13:21:18Z)
Probabilistic Case-based Reasoning for Open-World Knowledge Graph Completion [59.549664231655726]
A case-based reasoning (CBR) system solves a new problem by retrieving cases' that are similar to the given problem. In this paper, we demonstrate that such a system is achievable for reasoning in knowledge-bases (KBs) Our approach predicts attributes for an entity by gathering reasoning paths from similar entities in the KB.
arXiv Detail & Related papers (2020-10-07T17:48:12Z)
NeuroCard: One Cardinality Estimator for All Tables [23.723132106252272]
NeuroCard is a join cardinality estimator that builds a single neural density estimator over an entire database. NeuroCard achieves orders of higher magnitude accuracy than the best prior methods.
arXiv Detail & Related papers (2020-06-15T03:21:46Z)
Meta-Learned Confidence for Few-shot Learning [60.6086305523402]
A popular transductive inference technique for few-shot metric-based approaches, is to update the prototype of each class with the mean of the most confident query examples. We propose to meta-learn the confidence for each query sample, to assign optimal weights to unlabeled queries. We validate our few-shot learning model with meta-learned confidence on four benchmark datasets.
arXiv Detail & Related papers (2020-02-27T10:22:17Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.