DeepSampling: Selectivity Estimation with Predicted Error and Response
Time
- URL: http://arxiv.org/abs/2008.06831v1
- Date: Sun, 16 Aug 2020 03:23:01 GMT
- Title: DeepSampling: Selectivity Estimation with Predicted Error and Response
Time
- Authors: Tin Vu, Ahmed Eldawy
- Abstract summary: This paper proposes DeepSampling, a deep-learning-based model that predicts the accuracy of a sample-based AQP algorithm.
DeepSampling is the first system that provides a reliable tool for existing spatial databases to control the accuracy of AQP.
- Score: 7.23389716633927
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The rapid growth of spatial data urges the research community to find
efficient processing techniques for interactive queries on large volumes of
data. Approximate Query Processing (AQP) is the most prominent technique that
can provide real-time answers to ad-hoc queries based on a random sample.
Unfortunately, existing AQP methods provide an answer without providing any
accuracy metrics due to the complex relationship between the sample size, the
query parameters, the data distribution, and the result accuracy. This paper
proposes DeepSampling, a deep-learning-based model that predicts the accuracy
of a sample-based AQP algorithm, specifically selectivity estimation, given the
sample size, the input distribution, and query parameters. The model can also
be reversed to determine the sample size that would produce a desired accuracy.
DeepSampling is the first system that provides a reliable tool for existing
spatial databases to control the accuracy of AQP.
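The predict-and-reverse interface described in the abstract can be illustrated with a toy stand-in: instead of a trained deep model, the sketch below uses the closed-form Bernoulli standard error sqrt(p*(1-p)/n) as the "predicted" error, which makes both directions concrete, predicting error from a sample size and picking a sample size for a target error. The dataset and all function names are hypothetical; the paper's actual model is learned, not closed-form.

```python
import math
import random

random.seed(0)

# Hypothetical spatial dataset: points uniform in the unit square.
points = [(random.random(), random.random()) for _ in range(100_000)]

def selectivity_sample(points, rect, sample_size):
    """Sample-based selectivity estimate for a rectangular range query."""
    x1, y1, x2, y2 = rect
    sample = random.sample(points, sample_size)
    hits = sum(1 for (x, y) in sample if x1 <= x <= x2 and y1 <= y <= y2)
    return hits / sample_size

# Stand-in for the learned accuracy model: for a Bernoulli estimate of a
# selectivity p from n samples, the standard error is sqrt(p * (1 - p) / n).
def predicted_error(p, sample_size):
    return math.sqrt(p * (1 - p) / sample_size)

# "Reversing" the model: the smallest sample size whose predicted error
# meets a desired accuracy target.
def required_sample_size(p, target_error):
    return math.ceil(p * (1 - p) / target_error ** 2)

# A query covering a quarter of the unit square has true selectivity 0.25;
# hitting a 0.01 standard error needs 0.25 * 0.75 / 0.01**2 = 1875 samples.
n = required_sample_size(0.25, 0.01)
estimate = selectivity_sample(points, (0.0, 0.0, 0.5, 0.5), n)
```

The reversal step is what distinguishes this interface: the user states a target accuracy and receives the sample size, rather than the other way around.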
Related papers
- Bayesian Estimation and Tuning-Free Rank Detection for Probability Mass Function Tensors [17.640500920466984]
This paper presents a novel framework for estimating the joint PMF and automatically inferring its rank from observed data.
We derive a deterministic solution based on variational inference (VI) to approximate the posterior distributions of various model parameters. Additionally, we develop a scalable version of the VI-based approach by leveraging stochastic variational inference (SVI).
Experiments involving both synthetic data and real movie recommendation data illustrate the advantages of our VI and SVI-based methods in terms of estimation accuracy, automatic rank detection, and computational efficiency.
arXiv Detail & Related papers (2024-10-08T20:07:49Z)
- Stratified Prediction-Powered Inference for Hybrid Language Model Evaluation [62.2436697657307]
Prediction-powered inference (PPI) is a method that improves statistical estimates based on limited human-labeled data.
We propose a method called Stratified Prediction-Powered Inference (StratPPI).
We show that the basic PPI estimates can be considerably improved by employing simple data stratification strategies.
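As a rough illustration of the base PPI estimator that StratPPI builds on (a hand-rolled sketch, not the paper's code): combine the model's mean prediction over many unlabeled points with a bias-correcting "rectifier" estimated from the small human-labeled slice. The biased model and the data here are fabricated for the example.

```python
import random

random.seed(2)

# Hypothetical setup: estimate the mean of Y using many model predictions
# but only a few human labels.
N, n = 10_000, 100
truth = [random.gauss(1.0, 1.0) for _ in range(N)]

def model(y):
    # A deliberately biased "model": systematically off by +0.3.
    return y + 0.3

preds_unlabeled = [model(y) for y in truth]
labeled = truth[:n]                      # the small human-labeled slice
preds_labeled = [model(y) for y in labeled]

# PPI point estimate: model mean plus a rectifier estimated from the
# labeled slice, which removes the model's systematic bias.
model_mean = sum(preds_unlabeled) / N
rectifier = sum(y - f for y, f in zip(labeled, preds_labeled)) / n
ppi_estimate = model_mean + rectifier
```

Stratification, as proposed in the paper, would estimate a separate rectifier per stratum rather than one global correction.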
arXiv Detail & Related papers (2024-06-06T17:37:39Z)
- Query Performance Prediction using Relevance Judgments Generated by Large Language Models [53.97064615557883]
We propose a QPP framework using automatically generated relevance judgments (QPP-GenRE).
QPP-GenRE decomposes QPP into independent subtasks of predicting relevance of each item in a ranked list to a given query.
This allows us to predict any IR evaluation measure using the generated relevance judgments as pseudo-labels.
arXiv Detail & Related papers (2024-04-01T09:33:05Z)
- SEAM: Searching Transferable Mixed-Precision Quantization Policy through Large Margin Regularization [50.04951511146338]
Mixed-precision quantization (MPQ) suffers from the time-consuming process of searching the optimal bit-width allocation for each layer.
This paper proposes a novel method for efficiently searching for effective MPQ policies using a small proxy dataset.
arXiv Detail & Related papers (2023-02-14T05:47:45Z)
- Approximate Gibbs Sampler for Efficient Inference of Hierarchical Bayesian Models for Grouped Count Data [0.0]
This research develops an approximate Gibbs sampler (AGS) to efficiently learn the HBPRMs while maintaining the inference accuracy.
Numerical experiments using real and synthetic datasets with small and large counts demonstrate the superior performance of AGS.
arXiv Detail & Related papers (2022-11-28T21:00:55Z)
- Approximate Query Processing for Group-By Queries based on Conditional Generative Models [3.9837198605506963]
A group-by query returns one value per group, which makes it difficult to provide sufficiently accurate estimations for all the groups.
Stratified sampling improves accuracy compared with uniform sampling, but samples chosen for some specific queries do not work for others.
Online sampling chooses samples for the given query at query time, but it incurs a long latency.
The proposed framework can be combined with stratified sampling and online aggregation to improve the estimation accuracy for group-by queries.
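A minimal sketch of the stratified-sampling idea mentioned above (hypothetical data and helper names, not the paper's framework): by drawing a fixed budget per group, a tiny group that uniform sampling would likely miss still gets enough rows for a usable per-group estimate.

```python
import random
from collections import defaultdict

random.seed(1)

# Hypothetical skewed table: group "a" is huge, group "b" is tiny.
rows = [("a", random.gauss(10, 2)) for _ in range(9_900)] + \
       [("b", random.gauss(50, 2)) for _ in range(100)]

def stratified_sample(rows, per_group):
    """Draw up to `per_group` rows from each group, so small groups are
    represented; a uniform sample of the same total size would contain
    almost no rows from group "b"."""
    by_group = defaultdict(list)
    for group, value in rows:
        by_group[group].append(value)
    return {g: random.sample(vals, min(per_group, len(vals)))
            for g, vals in by_group.items()}

sample = stratified_sample(rows, 50)
# Per-group aggregate (AVG) estimated from the stratified sample.
avg_b = sum(sample["b"]) / len(sample["b"])
```

The trade-off the summary above describes is visible here: the allocation is fixed per group, so a sample tuned for one query's grouping may not suit a query that groups on a different column.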
arXiv Detail & Related papers (2021-01-08T08:49:21Z)
- APQ: Joint Search for Network Architecture, Pruning and Quantization Policy [49.3037538647714]
We present APQ for efficient deep learning inference on resource-constrained hardware.
Unlike previous methods that separately search the neural architecture, pruning policy, and quantization policy, we optimize them in a joint manner.
With the same accuracy, APQ reduces the latency/energy by 2x/1.3x over MobileNetV2+HAQ.
arXiv Detail & Related papers (2020-06-15T16:09:17Z)
- PyODDS: An End-to-end Outlier Detection System with Automated Machine Learning [55.32009000204512]
We present PyODDS, an automated end-to-end Python system for Outlier Detection with Database Support.
Specifically, we define the search space in the outlier detection pipeline, and produce a search strategy within the given search space.
It also provides unified interfaces and visualizations for users with or without data science or machine learning background.
arXiv Detail & Related papers (2020-03-12T03:30:30Z)
- LAQP: Learning-based Approximate Query Processing [5.249017312277057]
Approximate query processing (AQP) is a way to meet the demand for fast query responses.
We propose a learning-based AQP method called the LAQP.
It builds an error model learned from the historical queries to predict the sampling-based estimation error of each new query.
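The error-model idea can be sketched with a deliberately simple stand-in: fit the one-parameter model error ≈ c/sqrt(n) to a hypothetical log of historical (sample size, observed error) pairs by least squares, then use it to predict the sampling-based estimation error of a new query. LAQP's actual model is learned and richer; everything below is illustrative.

```python
import math

# Hypothetical log of historical queries: (sample_size, observed_error).
history = [(100, 0.050), (400, 0.024), (1600, 0.013), (6400, 0.006)]

# Fit error ≈ c / sqrt(n) by least squares on c; minimizing
# sum_i (e_i - c / sqrt(n_i))^2 gives the closed form
# c = sum_i (e_i / sqrt(n_i)) / sum_i (1 / n_i).
num = sum(e / math.sqrt(n) for n, e in history)
den = sum(1.0 / n for n, _ in history)
c = num / den

def predict_error(sample_size):
    """Predicted sampling error for a new query at this sample size."""
    return c / math.sqrt(sample_size)
```

In practice the historical-query features would go far beyond sample size (query ranges, data distribution), which is why a learned model is used rather than a closed form.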
arXiv Detail & Related papers (2020-03-05T06:08:25Z)
- Uncertainty Estimation Using a Single Deep Deterministic Neural Network [66.26231423824089]
We propose a method for training a deterministic deep model that can find and reject out-of-distribution data points at test time with a single forward pass.
We scale training in these with a novel loss function and centroid updating scheme and match the accuracy of softmax models.
arXiv Detail & Related papers (2020-03-04T12:27:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.