Related papers: NN-based Transformation of Any SQL Cardinality Estimator for Handling DISTINCT, AND, OR and NOT

NN-based Transformation of Any SQL Cardinality Estimator for Handling DISTINCT, AND, OR and NOT

URL: http://arxiv.org/abs/2004.07009v1
Date: Wed, 15 Apr 2020 11:20:06 GMT
Title: NN-based Transformation of Any SQL Cardinality Estimator for Handling DISTINCT, AND, OR and NOT
Authors: Rojeh Hayek, Oded Shmueli
Abstract summary: A query planner requires the set-theoretic cardinality (i.e., without duplicates) for queries with DISTINCT as well as in planning. Many cardinality estimation methods are limited to estimating cardinalities of only conjunctive queries with duplicates counted. We describe two methods for handling this deficiency that can be applied to any limited cardinality estimation model.
Score: 1.8275108630751837
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: SQL queries, with the AND, OR, and NOT operators, constitute a broad class of highly used queries. Thus, their cardinality estimation is important for query optimization. In addition, a query planner requires the set-theoretic cardinality (i.e., without duplicates) for queries with DISTINCT as well as in planning; for example, when considering sorting options. Yet, despite the importance of estimating query cardinalities in the presence of DISTINCT, AND, OR, and NOT, many cardinality estimation methods are limited to estimating cardinalities of only conjunctive queries with duplicates counted. The focus of this work is on two methods for handling this deficiency that can be applied to any limited cardinality estimation model. First, we describe a specialized deep learning scheme, PUNQ, which is tailored to representing conjunctive SQL queries and predicting the percentage of unique rows in the query's result with duplicate rows. Using the predicted percentages obtained via PUNQ, we are able to transform any cardinality estimation method that only estimates for conjunctive queries, and which estimates cardinalities with duplicates (e.g., MSCN), to a method that estimates queries cardinalities without duplicates. This enables estimating cardinalities of queries with the DISTINCT keyword. In addition, we describe a recursive algorithm, GenCrd, for extending any cardinality estimation method M that only handles conjunctive queries to one that estimates cardinalities for more general queries (that include AND, OR, and NOT), without changing the method M itself. Our evaluation is carried out on a challenging, real-world database with general queries that include either the DISTINCT keyword or the AND, OR, and NOT operators. Experimentally, we show that the proposed methods obtain accurate cardinality estimates with the same level of accuracy as that of the original transformed methods.

Related papers

ACE: A Cardinality Estimator for Set-Valued Queries [35.31790118566289]
We propose an Attention-based Cardinality Estorimator for estimating the cardinality of queries over set-valued data. To handle variable-sized queries, a pooling module is introduced, followed by a regression model (MLP) to generate final cardinality estimates.
arXiv Detail & Related papers (2025-03-19T06:29:15Z)
pEBR: A Probabilistic Approach to Embedding Based Retrieval [4.8338111302871525]
Embedding retrieval aims to learn a shared semantic representation space for both queries and items. In current industrial practice, retrieval systems typically retrieve a fixed number of items for different queries.
arXiv Detail & Related papers (2024-10-25T07:14:12Z)
CardBench: A Benchmark for Learned Cardinality Estimation in Relational Databases [17.46316633654637]
Cardinality estimation is crucial for enabling high query performance in databases. There is no systematic benchmark or datasets which allows researchers to evaluate the progress made by new learned approaches. We release a benchmark, containing thousands of queries over 20 distinct real-world databases for learned cardinality estimation.
arXiv Detail & Related papers (2024-08-28T23:25:25Z)
Adaptive Retrieval and Scalable Indexing for k-NN Search with Cross-Encoders [77.84801537608651]
Cross-encoder (CE) models which compute similarity by jointly encoding a query-item pair perform better than embedding-based models (dual-encoders) at estimating query-item relevance. We propose a sparse-matrix factorization based method that efficiently computes latent query and item embeddings to approximate CE scores and performs k-NN search with the approximate CE similarity.
arXiv Detail & Related papers (2024-05-06T17:14:34Z)
Query Performance Prediction using Relevance Judgments Generated by Large Language Models [53.97064615557883]
We propose a QPP framework using automatically generated relevance judgments (QPP-GenRE) QPP-GenRE decomposes QPP into independent subtasks of predicting relevance of each item in a ranked list to a given query. This allows us to predict any IR evaluation measure using the generated relevance judgments as pseudo-labels.
arXiv Detail & Related papers (2024-04-01T09:33:05Z)
Regularization-Based Methods for Ordinal Quantification [49.606912965922504]
We study the ordinal case, i.e., the case in which a total order is defined on the set of n>2 classes. We propose a novel class of regularized OQ algorithms, which outperforms existing algorithms in our experiments.
arXiv Detail & Related papers (2023-10-13T16:04:06Z)
Scardina: Scalable Join Cardinality Estimation by Multiple Density Estimators [8.641606056228675]
Machine learning-based cardinality estimation methods are replacing traditional methods. We propose Scardina, a new join cardinality estimation method using multiple partitioned models based on the schema structure.
arXiv Detail & Related papers (2023-03-31T13:22:28Z)
FactorJoin: A New Cardinality Estimation Framework for Join Queries [35.22928513918166]
Cardinality estimation is one of the most fundamental and challenging problems in query optimization. We propose a new framework FactorJoin for estimating join queries. In our evaluation, FactorJoin can produce more effective estimates than the previous state-of-the-art learning-based methods.
arXiv Detail & Related papers (2022-12-11T15:51:39Z)
Knowledge Base Question Answering by Case-based Reasoning over Subgraphs [81.22050011503933]
We show that our model answers queries requiring complex reasoning patterns more effectively than existing KG completion algorithms. The proposed model outperforms or performs competitively with state-of-the-art models on several KBQA benchmarks.
arXiv Detail & Related papers (2022-02-22T01:34:35Z)
Uniformity in Heterogeneity:Diving Deep into Count Interval Partition for Crowd Counting [56.44300325295678]
We propose a novel count interval partition criterion called Uniform Error Partition (UEP) MCP criterion selects the best count proxy for each interval to represent its count value during inference. We propose a simple yet effective model termed Uniform Error Partition Network (UEPNet)
arXiv Detail & Related papers (2021-07-27T06:24:15Z)
Surprise: Result List Truncation via Extreme Value Theory [92.5817701697342]
We propose a statistical method that produces interpretable and calibrated relevance scores at query time using nothing more than the ranked scores. We demonstrate its effectiveness on the result list truncation task across image, text, and IR datasets.
arXiv Detail & Related papers (2020-10-19T19:15:50Z)
Monotonic Cardinality Estimation of Similarity Selection: A Deep Learning Approach [22.958342743597044]
We investigate the possibilities of utilizing deep learning for cardinality estimation of similarity selection. We propose a novel and generic method that can be applied to any data type and distance function.
arXiv Detail & Related papers (2020-02-15T20:22:51Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.