ACE: A Cardinality Estimator for Set-Valued Queries
- URL: http://arxiv.org/abs/2503.14929v1
- Date: Wed, 19 Mar 2025 06:29:15 GMT
- Title: ACE: A Cardinality Estimator for Set-Valued Queries
- Authors: Yufan Sheng, Xin Cao, Kaiqi Zhao, Yixiang Fang, Jianzhong Qi, Wenjie Zhang, Christian S. Jensen,
- Abstract summary: We propose an Attention-based Cardinality Estorimator for estimating the cardinality of queries over set-valued data.<n>To handle variable-sized queries, a pooling module is introduced, followed by a regression model (MLP) to generate final cardinality estimates.
- Score: 35.31790118566289
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Cardinality estimation is a fundamental functionality in database systems. Most existing cardinality estimators focus on handling predicates over numeric or categorical data. They have largely omitted an important data type, set-valued data, which frequently occur in contemporary applications such as information retrieval and recommender systems. The few existing estimators for such data either favor high-frequency elements or rely on a partial independence assumption, which limits their practical applicability. We propose ACE, an Attention-based Cardinality Estimator for estimating the cardinality of queries over set-valued data. We first design a distillation-based data encoder to condense the dataset into a compact matrix. We then design an attention-based query analyzer to capture correlations among query elements. To handle variable-sized queries, a pooling module is introduced, followed by a regression model (MLP) to generate final cardinality estimates. We evaluate ACE on three datasets with varying query element distributions, demonstrating that ACE outperforms the state-of-the-art competitors in terms of both accuracy and efficiency.
Related papers
- DUPRE: Data Utility Prediction for Efficient Data Valuation [49.60564885180563]
Cooperative game theory-based data valuation, such as Data Shapley, requires evaluating the data utility and retraining the ML model for multiple data subsets.
Our framework, textttDUPRE, takes an alternative yet complementary approach that reduces the cost per subset evaluation by predicting data utilities instead of evaluating them by model retraining.
Specifically, given the evaluated data utilities of some data subsets, textttDUPRE fits a emphGaussian process (GP) regression model to predict the utility of every other data subset.
arXiv Detail & Related papers (2025-02-22T08:53:39Z) - CardBench: A Benchmark for Learned Cardinality Estimation in Relational Databases [17.46316633654637]
Cardinality estimation is crucial for enabling high query performance in databases.
There is no systematic benchmark or datasets which allows researchers to evaluate the progress made by new learned approaches.
We release a benchmark, containing thousands of queries over 20 distinct real-world databases for learned cardinality estimation.
arXiv Detail & Related papers (2024-08-28T23:25:25Z) - Adaptive Retrieval and Scalable Indexing for k-NN Search with Cross-Encoders [77.84801537608651]
Cross-encoder (CE) models which compute similarity by jointly encoding a query-item pair perform better than embedding-based models (dual-encoders) at estimating query-item relevance.
We propose a sparse-matrix factorization based method that efficiently computes latent query and item embeddings to approximate CE scores and performs k-NN search with the approximate CE similarity.
arXiv Detail & Related papers (2024-05-06T17:14:34Z) - Minimally Supervised Learning using Topological Projections in
Self-Organizing Maps [55.31182147885694]
We introduce a semi-supervised learning approach based on topological projections in self-organizing maps (SOMs)
Our proposed method first trains SOMs on unlabeled data and then a minimal number of available labeled data points are assigned to key best matching units (BMU)
Our results indicate that the proposed minimally supervised model significantly outperforms traditional regression techniques.
arXiv Detail & Related papers (2024-01-12T22:51:48Z) - Improving Text Matching in E-Commerce Search with A Rationalizable,
Intervenable and Fast Entity-Based Relevance Model [78.80174696043021]
We propose a novel model called the Entity-Based Relevance Model (EBRM)
The decomposition allows us to use a Cross-encoder QE relevance module for high accuracy.
We also show that pretraining the QE module with auto-generated QE data from user logs can further improve the overall performance.
arXiv Detail & Related papers (2023-07-01T15:44:53Z) - Scardina: Scalable Join Cardinality Estimation by Multiple Density
Estimators [8.641606056228675]
Machine learning-based cardinality estimation methods are replacing traditional methods.
We propose Scardina, a new join cardinality estimation method using multiple partitioned models based on the schema structure.
arXiv Detail & Related papers (2023-03-31T13:22:28Z) - Electra: Conditional Generative Model based Predicate-Aware Query
Approximation [10.056919500568013]
ELECTRA is a predicate-aware AQP system that can answer analytics-style queries with a large number of predicates with much smaller approximation errors.
Our evaluations with four different baselines on three real-world datasets show that ELECTRA provides lower AQP error for large number of predicates compared to baselines.
arXiv Detail & Related papers (2022-01-28T21:13:26Z) - Improving Attention Mechanism with Query-Value Interaction [92.67156911466397]
We propose a query-value interaction function which can learn query-aware attention values.
Our approach can consistently improve the performance of many attention-based models.
arXiv Detail & Related papers (2020-10-08T05:12:52Z) - NN-based Transformation of Any SQL Cardinality Estimator for Handling
DISTINCT, AND, OR and NOT [1.8275108630751837]
A query planner requires the set-theoretic cardinality (i.e., without duplicates) for queries with DISTINCT as well as in planning.
Many cardinality estimation methods are limited to estimating cardinalities of only conjunctive queries with duplicates counted.
We describe two methods for handling this deficiency that can be applied to any limited cardinality estimation model.
arXiv Detail & Related papers (2020-04-15T11:20:06Z) - Monotonic Cardinality Estimation of Similarity Selection: A Deep
Learning Approach [22.958342743597044]
We investigate the possibilities of utilizing deep learning for cardinality estimation of similarity selection.
We propose a novel and generic method that can be applied to any data type and distance function.
arXiv Detail & Related papers (2020-02-15T20:22:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.