A Lightweight Learned Cardinality Estimation Model
- URL: http://arxiv.org/abs/2508.09602v1
- Date: Wed, 13 Aug 2025 08:34:58 GMT
- Title: A Lightweight Learned Cardinality Estimation Model
- Authors: Yaoyu Zhu, Jintao Zhang, Guoliang Li, Jianhua Feng
- Abstract summary: We propose a novel data-driven approach called CoDe (Covering with Decompositions) to address the cardinality estimation problem. CoDe employs the concept of covering design, which divides the table into multiple smaller, overlapping segments. By employing multiple models to approximate distributions, CoDe excels in effectively modeling discrete distributions and ensuring computational efficiency.
- Score: 12.33987311866824
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Cardinality estimation is a fundamental task in database management systems, aiming to predict query results accurately without executing the queries. However, existing techniques either achieve low estimation accuracy or incur high inference latency. Simultaneously achieving high speed and accuracy becomes critical for the cardinality estimation problem. In this paper, we propose a novel data-driven approach called CoDe (Covering with Decompositions) to address this problem. CoDe employs the concept of covering design, which divides the table into multiple smaller, overlapping segments. For each segment, CoDe utilizes tensor decomposition to accurately model its data distribution. Moreover, CoDe introduces innovative algorithms to select the best-fitting distributions for each query, combining them to estimate the final result. By employing multiple models to approximate distributions, CoDe excels in effectively modeling discrete distributions and ensuring computational efficiency. Notably, experimental results show that our method represents a significant advancement in cardinality estimation, achieving state-of-the-art levels of both estimation accuracy and inference efficiency. Across various datasets, CoDe achieves absolute accuracy in estimating more than half of the queries.
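The abstract names two mechanisms: a covering design that splits the table's columns into small overlapping segments, and a tensor decomposition that compresses each segment's joint distribution. Below is a minimal illustrative sketch of that pipeline for two-column segments with point predicates; all function names are hypothetical, and the paper's actual covering construction, decomposition, and query-combination algorithms are more involved.

```python
# Minimal sketch of the two ideas named in the abstract: (1) cover the
# table's columns with small overlapping segments, and (2) approximate
# each segment's joint distribution with a low-rank decomposition.
# All names are hypothetical; this is not the paper's implementation.
from itertools import combinations
import numpy as np

def build_segments(columns, k=2):
    # Trivial covering design: every k-subset of columns covers all
    # k-way interactions (a real covering design uses far fewer sets).
    return list(combinations(columns, k))

def fit_lowrank(joint, rank, iters=50):
    # Rank-r approximation of a 2-D joint distribution via alternating
    # least squares; for k = 2, CP decomposition reduces to this.
    a = np.random.rand(joint.shape[0], rank)
    b = np.random.rand(joint.shape[1], rank)
    for _ in range(iters):
        a = joint @ b @ np.linalg.pinv(b.T @ b)
        b = joint.T @ a @ np.linalg.pinv(a.T @ a)
    return a, b

def estimate(query_vals, segments, models, n_rows):
    # query_vals: {column: value index} for point predicates.
    # Use the first segment covering the queried columns and read the
    # selectivity off its factors, marginalizing unconstrained columns.
    for (c0, c1), (a, b) in zip(segments, models):
        if set(query_vals) <= {c0, c1}:
            u = a[query_vals[c0]] if c0 in query_vals else a.sum(axis=0)
            v = b[query_vals[c1]] if c1 in query_vals else b.sum(axis=0)
            return max(float(u @ v), 0.0) * n_rows
    raise ValueError("no covering segment found")

# Toy usage: one two-column segment over an 8x8 value domain.
rng = np.random.default_rng(0)
data = rng.integers(0, 8, size=(10_000, 2))
joint, _, _ = np.histogram2d(data[:, 0], data[:, 1],
                             bins=(np.arange(9), np.arange(9)))
segments = build_segments(["c0", "c1"], k=2)
models = [fit_lowrank(joint / joint.sum(), rank=4)]
print(estimate({"c0": 3, "c1": 5}, segments, models, n_rows=10_000))
```

Per the abstract, CoDe's contribution lies in selecting which overlapping segments answer a given query and in how their estimates are combined; the sketch simply takes the first covering segment.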
Related papers
- CoLSE: A Lightweight and Robust Hybrid Learned Model for Single-Table Cardinality Estimation using Joint CDF [7.945011337356916]
Cardinality estimation is a critical component of query optimization. We propose CoLSE, a hybrid learned approach for single-table cardinality estimation. Experimental results show that CoLSE achieves a favorable trade-off among accuracy, training time, latency, and model size, outperforming existing state-of-the-art methods.
arXiv Detail & Related papers (2025-12-14T10:08:20Z)
- Prompt-Matcher: Leveraging Large Models to Reduce Uncertainty in Schema Matching Results [1.13107643869251]
We introduce a new approach based on fine-grained correspondence verification with specific prompts for a Large Language Model. Our approach is an iterative loop that consists of three main components: (1) the correspondence selection algorithm, (2) correspondence verification, and (3) the update of the probability distribution. We propose a novel $(1-1/e)$-approximation algorithm that significantly outperforms the brute-force algorithm in terms of computational efficiency; a generic sketch of the greedy construction behind such guarantees appears after this list.
arXiv Detail & Related papers (2024-08-24T16:54:08Z)
- Duet: efficient and scalable hybriD neUral rElation undersTanding [9.231883521214241]
Duet is a stable, efficient, and scalable hybrid method to estimate cardinality directly without sampling or any non-differentiable process.
Duet has a lower inference cost on CPU than that of most learned methods on GPU.
arXiv Detail & Related papers (2023-07-25T13:42:22Z)
- Learning Unnormalized Statistical Models via Compositional Optimization [73.30514599338407]
Noise-contrastive estimation (NCE) has been proposed by formulating the objective as the logistic loss of the real data and the artificial noise.
In this paper, we study a direct approach for optimizing the negative log-likelihood of unnormalized models.
arXiv Detail & Related papers (2023-06-13T01:18:16Z)
- Correlated quantization for distributed mean estimation and optimization [21.17434087570296]
We propose a correlated quantization protocol whose error guarantee depends on the deviation of data points instead of their absolute range.
We show that applying the proposed protocol as a subroutine in distributed optimization algorithms leads to better convergence rates.
arXiv Detail & Related papers (2022-03-09T18:14:55Z)
- Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence, predicting accuracy as the fraction of unlabeled examples whose confidence exceeds the threshold; a minimal sketch of this estimator appears after this list.
arXiv Detail & Related papers (2022-01-11T23:01:12Z)
- Outlier-Robust Sparse Estimation via Non-Convex Optimization [73.18654719887205]
We explore the connection between high-dimensional statistics and non-convex optimization in the presence of sparsity constraints.
We develop novel and simple optimization formulations for these problems.
As a corollary, we obtain that any first-order method that efficiently converges to a stationary point yields an efficient algorithm for these tasks.
arXiv Detail & Related papers (2021-09-23T17:38:24Z)
- Efficient Ensemble Model Generation for Uncertainty Estimation with Bayesian Approximation in Segmentation [74.06904875527556]
We propose a generic and efficient segmentation framework to construct ensemble segmentation models.
In the proposed method, ensemble models can be efficiently generated by using the layer selection method.
We also devise a new pixel-wise uncertainty loss, which improves the predictive performance.
arXiv Detail & Related papers (2020-05-21T16:08:38Z)
- Scalable Approximate Inference and Some Applications [2.6541211006790983]
In this thesis, we propose a new framework for approximate inference.
The four proposed algorithms are motivated by recent computational progress on Stein's method.
Results on simulated and real datasets indicate the statistical efficiency and wide applicability of our algorithms.
arXiv Detail & Related papers (2020-03-07T04:33:27Z)
- Meta-Learned Confidence for Few-shot Learning [60.6086305523402]
A popular transductive inference technique for few-shot metric-based approaches is to update the prototype of each class with the mean of the most confident query examples.
We propose to meta-learn the confidence of each query sample so as to assign optimal weights to unlabeled queries; a sketch of this confidence-weighted prototype update appears after this list.
We validate our few-shot learning model with meta-learned confidence on four benchmark datasets.
arXiv Detail & Related papers (2020-02-27T10:22:17Z)
- A General Method for Robust Learning from Batches [56.59844655107251]
We consider a general framework of robust learning from batches, and determine the limits of both classification and distribution estimation over arbitrary, including continuous, domains.
We derive the first robust computationally-efficient learning algorithms for piecewise-interval classification, and for piecewise-polynomial, monotone, log-concave, and Gaussian-mixture distribution estimation.
arXiv Detail & Related papers (2020-02-25T18:53:25Z)
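The $(1-1/e)$ factor quoted for Prompt-Matcher above is the classic guarantee of greedy selection for a monotone submodular objective. The sketch below shows that generic greedy construction, not the paper's actual correspondence-selection algorithm; the `gain` callback is a placeholder for whatever marginal-utility function the method optimizes.

```python
# Generic greedy selection under a cardinality budget. For a monotone
# submodular gain function, this construction achieves the classic
# (1 - 1/e) approximation guarantee (Nemhauser et al.); it stands in for
# the paper's correspondence selection step, whose objective is not
# spelled out in the summary above.
def greedy_select(candidates, gain, budget):
    selected, remaining = [], set(candidates)
    while remaining and len(selected) < budget:
        # pick the candidate with the largest marginal gain
        best = max(remaining, key=lambda c: gain(selected, c))
        if gain(selected, best) <= 0:
            break  # no candidate still adds value
        selected.append(best)
        remaining.remove(best)
    return selected
```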
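ATC, as summarized above, learns a confidence threshold on labeled source data and predicts target accuracy as the fraction of unlabeled target examples above it. A minimal sketch, assuming maximum-softmax-probability confidences; the paper also studies other confidence scores.

```python
import numpy as np

def fit_atc_threshold(src_conf, src_correct):
    # Choose t so that the fraction of source examples with confidence
    # above t matches the observed source accuracy, i.e. take the
    # (1 - accuracy) quantile of the source confidences.
    return float(np.quantile(src_conf, 1.0 - src_correct.mean()))

def predict_target_accuracy(tgt_conf, t):
    # ATC estimate: fraction of unlabeled target examples whose
    # confidence exceeds the learned threshold.
    return float((tgt_conf > t).mean())
```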
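The Meta-Learned Confidence entry above describes weighting unlabeled queries by per-sample confidence when updating class prototypes. The sketch below shows the underlying transductive update with plain distance-based softmax confidences; in the paper those weights are meta-learned rather than fixed as here.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def refine_prototypes(prototypes, queries, temperature=1.0, steps=1):
    # prototypes: (n_class, dim) support-set class means
    # queries:    (n_query, dim) unlabeled query embeddings
    for _ in range(steps):
        # negative squared distances -> per-query class confidences
        d = ((queries[:, None, :] - prototypes[None, :, :]) ** 2).sum(-1)
        w = softmax(-d / temperature, axis=1)      # (n_query, n_class)
        # fold confidence-weighted queries into each prototype
        # (each original prototype enters with unit weight)
        prototypes = (prototypes + w.T @ queries) / (1.0 + w.sum(0)[:, None])
    return prototypes
```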