Monotonic Cardinality Estimation of Similarity Selection: A Deep
Learning Approach
- URL: http://arxiv.org/abs/2002.06442v4
- Date: Fri, 24 Sep 2021 16:51:44 GMT
- Title: Monotonic Cardinality Estimation of Similarity Selection: A Deep
Learning Approach
- Authors: Yaoshu Wang, Chuan Xiao, Jianbin Qin, Xin Cao, Yifang Sun, Wei Wang,
and Makoto Onizuka
- Abstract summary: We investigate the possibilities of utilizing deep learning for cardinality estimation of similarity selection.
We propose a novel and generic method that can be applied to any data type and distance function.
- Score: 22.958342743597044
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Due to the outstanding capability of capturing underlying data distributions,
deep learning techniques have been recently utilized for a series of
traditional database problems. In this paper, we investigate the possibilities
of utilizing deep learning for cardinality estimation of similarity selection.
Answering this problem accurately and efficiently is essential to many data
management applications, especially for query optimization. Moreover, in some
applications the estimated cardinality is supposed to be consistent and
interpretable. Hence a monotonic estimation w.r.t. the query threshold is
preferred. We propose a novel and generic method that can be applied to any
data type and distance function. Our method consists of a feature extraction
model and a regression model. The feature extraction model transforms original
data and threshold to a Hamming space, in which a deep learning-based
regression model is utilized to exploit the incremental property of cardinality
w.r.t. the threshold for both accuracy and monotonicity. We develop a training
strategy tailored to our model as well as techniques for fast estimation. We
also discuss how to handle updates. We demonstrate the accuracy and the
efficiency of our method through experiments, and show how it improves the
performance of a query optimizer.
Related papers
- CardBench: A Benchmark for Learned Cardinality Estimation in Relational Databases [17.46316633654637]
Cardinality estimation is crucial for enabling high query performance in databases.
There is no systematic benchmark or datasets which allows researchers to evaluate the progress made by new learned approaches.
We release a benchmark, containing thousands of queries over 20 distinct real-world databases for learned cardinality estimation.
arXiv Detail & Related papers (2024-08-28T23:25:25Z) - Efficient and Generalizable Certified Unlearning: A Hessian-free Recollection Approach [8.875278412741695]
Machine unlearning strives to uphold the data owners' right to be forgotten by enabling models to selectively forget specific data.
We develop an algorithm that achieves near-instantaneous unlearning as it only requires a vector addition operation.
arXiv Detail & Related papers (2024-04-02T07:54:18Z) - Consensus-Adaptive RANSAC [104.87576373187426]
We propose a new RANSAC framework that learns to explore the parameter space by considering the residuals seen so far via a novel attention layer.
The attention mechanism operates on a batch of point-to-model residuals, and updates a per-point estimation state to take into account the consensus found through a lightweight one-step transformer.
arXiv Detail & Related papers (2023-07-26T08:25:46Z) - HyperImpute: Generalized Iterative Imputation with Automatic Model
Selection [77.86861638371926]
We propose a generalized iterative imputation framework for adaptively and automatically configuring column-wise models.
We provide a concrete implementation with out-of-the-box learners, simulators, and interfaces.
arXiv Detail & Related papers (2022-06-15T19:10:35Z) - Invariance Learning in Deep Neural Networks with Differentiable Laplace
Approximations [76.82124752950148]
We develop a convenient gradient-based method for selecting the data augmentation.
We use a differentiable Kronecker-factored Laplace approximation to the marginal likelihood as our objective.
arXiv Detail & Related papers (2022-02-22T02:51:11Z) - Practical Active Learning with Model Selection for Small Data [13.128648437690224]
We develop a simple and fast method for practical active learning with model selection.
Our method is based on an underlying pool-based active learner for binary classification using support vector classification with a radial basis function kernel.
arXiv Detail & Related papers (2021-12-21T23:11:27Z) - X-model: Improving Data Efficiency in Deep Learning with A Minimax Model [78.55482897452417]
We aim at improving data efficiency for both classification and regression setups in deep learning.
To take the power of both worlds, we propose a novel X-model.
X-model plays a minimax game between the feature extractor and task-specific heads.
arXiv Detail & Related papers (2021-10-09T13:56:48Z) - Model-agnostic and Scalable Counterfactual Explanations via
Reinforcement Learning [0.5729426778193398]
We propose a deep reinforcement learning approach that transforms the optimization procedure into an end-to-end learnable process.
Our experiments on real-world data show that our method is model-agnostic, relying only on feedback from model predictions.
arXiv Detail & Related papers (2021-06-04T16:54:36Z) - Scalable Marginal Likelihood Estimation for Model Selection in Deep
Learning [78.83598532168256]
Marginal-likelihood based model-selection is rarely used in deep learning due to estimation difficulties.
Our work shows that marginal likelihoods can improve generalization and be useful when validation data is unavailable.
arXiv Detail & Related papers (2021-04-11T09:50:24Z) - Probabilistic Case-based Reasoning for Open-World Knowledge Graph
Completion [59.549664231655726]
A case-based reasoning (CBR) system solves a new problem by retrieving cases' that are similar to the given problem.
In this paper, we demonstrate that such a system is achievable for reasoning in knowledge-bases (KBs)
Our approach predicts attributes for an entity by gathering reasoning paths from similar entities in the KB.
arXiv Detail & Related papers (2020-10-07T17:48:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.