NN-based Transformation of Any SQL Cardinality Estimator for Handling
DISTINCT, AND, OR and NOT
- URL: http://arxiv.org/abs/2004.07009v1
- Date: Wed, 15 Apr 2020 11:20:06 GMT
- Title: NN-based Transformation of Any SQL Cardinality Estimator for Handling
DISTINCT, AND, OR and NOT
- Authors: Rojeh Hayek, Oded Shmueli
- Abstract summary: A query planner requires the set-theoretic cardinality (i.e., without duplicates) for queries with DISTINCT as well as in planning.
Many cardinality estimation methods are limited to estimating cardinalities of only conjunctive queries with duplicates counted.
We describe two methods for handling this deficiency that can be applied to any limited cardinality estimation model.
- Score: 1.8275108630751837
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: SQL queries, with the AND, OR, and NOT operators, constitute a broad class of
highly used queries. Thus, their cardinality estimation is important for query
optimization. In addition, a query planner requires the set-theoretic
cardinality (i.e., without duplicates) for queries with DISTINCT as well as in
planning; for example, when considering sorting options. Yet, despite the
importance of estimating query cardinalities in the presence of DISTINCT, AND,
OR, and NOT, many cardinality estimation methods are limited to estimating
cardinalities of only conjunctive queries with duplicates counted.
The focus of this work is on two methods for handling this deficiency that
can be applied to any limited cardinality estimation model. First, we describe
a specialized deep learning scheme, PUNQ, which is tailored to representing
conjunctive SQL queries and predicting the percentage of unique rows in the
query's result with duplicate rows. Using the predicted percentages obtained
via PUNQ, we are able to transform any cardinality estimation method that only
estimates for conjunctive queries, and which estimates cardinalities with
duplicates (e.g., MSCN), to a method that estimates queries cardinalities
without duplicates. This enables estimating cardinalities of queries with the
DISTINCT keyword. In addition, we describe a recursive algorithm, GenCrd, for
extending any cardinality estimation method M that only handles conjunctive
queries to one that estimates cardinalities for more general queries (that
include AND, OR, and NOT), without changing the method M itself.
Our evaluation is carried out on a challenging, real-world database with
general queries that include either the DISTINCT keyword or the AND, OR, and
NOT operators. Experimentally, we show that the proposed methods obtain
accurate cardinality estimates with the same level of accuracy as that of the
original transformed methods.
Related papers
- pEBR: A Probabilistic Approach to Embedding Based Retrieval [4.8338111302871525]
Embedding retrieval aims to learn a shared semantic representation space for both queries and items.
In current industrial practice, retrieval systems typically retrieve a fixed number of items for different queries.
arXiv Detail & Related papers (2024-10-25T07:14:12Z) - CardBench: A Benchmark for Learned Cardinality Estimation in Relational Databases [17.46316633654637]
Cardinality estimation is crucial for enabling high query performance in databases.
There is no systematic benchmark or datasets which allows researchers to evaluate the progress made by new learned approaches.
We release a benchmark, containing thousands of queries over 20 distinct real-world databases for learned cardinality estimation.
arXiv Detail & Related papers (2024-08-28T23:25:25Z) - Adaptive Retrieval and Scalable Indexing for k-NN Search with Cross-Encoders [77.84801537608651]
Cross-encoder (CE) models which compute similarity by jointly encoding a query-item pair perform better than embedding-based models (dual-encoders) at estimating query-item relevance.
We propose a sparse-matrix factorization based method that efficiently computes latent query and item embeddings to approximate CE scores and performs k-NN search with the approximate CE similarity.
arXiv Detail & Related papers (2024-05-06T17:14:34Z) - Query Performance Prediction using Relevance Judgments Generated by Large Language Models [53.97064615557883]
We propose a QPP framework using automatically generated relevance judgments (QPP-GenRE)
QPP-GenRE decomposes QPP into independent subtasks of predicting relevance of each item in a ranked list to a given query.
This allows us to predict any IR evaluation measure using the generated relevance judgments as pseudo-labels.
arXiv Detail & Related papers (2024-04-01T09:33:05Z) - Regularization-Based Methods for Ordinal Quantification [49.606912965922504]
We study the ordinal case, i.e., the case in which a total order is defined on the set of n>2 classes.
We propose a novel class of regularized OQ algorithms, which outperforms existing algorithms in our experiments.
arXiv Detail & Related papers (2023-10-13T16:04:06Z) - Scardina: Scalable Join Cardinality Estimation by Multiple Density
Estimators [8.641606056228675]
Machine learning-based cardinality estimation methods are replacing traditional methods.
We propose Scardina, a new join cardinality estimation method using multiple partitioned models based on the schema structure.
arXiv Detail & Related papers (2023-03-31T13:22:28Z) - FactorJoin: A New Cardinality Estimation Framework for Join Queries [35.22928513918166]
Cardinality estimation is one of the most fundamental and challenging problems in query optimization.
We propose a new framework FactorJoin for estimating join queries.
In our evaluation, FactorJoin can produce more effective estimates than the previous state-of-the-art learning-based methods.
arXiv Detail & Related papers (2022-12-11T15:51:39Z) - Knowledge Base Question Answering by Case-based Reasoning over Subgraphs [81.22050011503933]
We show that our model answers queries requiring complex reasoning patterns more effectively than existing KG completion algorithms.
The proposed model outperforms or performs competitively with state-of-the-art models on several KBQA benchmarks.
arXiv Detail & Related papers (2022-02-22T01:34:35Z) - Uniformity in Heterogeneity:Diving Deep into Count Interval Partition
for Crowd Counting [56.44300325295678]
We propose a novel count interval partition criterion called Uniform Error Partition (UEP)
MCP criterion selects the best count proxy for each interval to represent its count value during inference.
We propose a simple yet effective model termed Uniform Error Partition Network (UEPNet)
arXiv Detail & Related papers (2021-07-27T06:24:15Z) - Surprise: Result List Truncation via Extreme Value Theory [92.5817701697342]
We propose a statistical method that produces interpretable and calibrated relevance scores at query time using nothing more than the ranked scores.
We demonstrate its effectiveness on the result list truncation task across image, text, and IR datasets.
arXiv Detail & Related papers (2020-10-19T19:15:50Z) - Monotonic Cardinality Estimation of Similarity Selection: A Deep
Learning Approach [22.958342743597044]
We investigate the possibilities of utilizing deep learning for cardinality estimation of similarity selection.
We propose a novel and generic method that can be applied to any data type and distance function.
arXiv Detail & Related papers (2020-02-15T20:22:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.