DeepSampling: Selectivity Estimation with Predicted Error and Response
Time
- URL: http://arxiv.org/abs/2008.06831v1
- Date: Sun, 16 Aug 2020 03:23:01 GMT
- Title: DeepSampling: Selectivity Estimation with Predicted Error and Response
Time
- Authors: Tin Vu, Ahmed Eldawy
- Abstract summary: This paper proposes DeepSampling, a deep-learning-based model that predicts the accuracy of a sample-based AQP algorithm.
DeepSampling is the first system that provides a reliable tool for existing spatial databases to control the accuracy of AQP.
- Score: 7.23389716633927
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The rapid growth of spatial data urges the research community to find
efficient processing techniques for interactive queries on large volumes of
data. Approximate Query Processing (AQP) is the most prominent technique that
can provide real-time answers to ad-hoc queries based on a random sample.
Unfortunately, existing AQP methods provide an answer without providing any
accuracy metrics due to the complex relationship between the sample size, the
query parameters, the data distribution, and the result accuracy. This paper
proposes DeepSampling, a deep-learning-based model that predicts the accuracy
of a sample-based AQP algorithm, specifically selectivity estimation, given the
sample size, the input distribution, and query parameters. The model can also
be reversed to determine the sample size that would produce a desired accuracy.
DeepSampling is the first system that provides a reliable tool for existing
spatial databases to control the accuracy of AQP.
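The predict-and-reverse interface described in the abstract can be illustrated with a toy stand-in: instead of a trained deep model, the sketch below uses the closed-form Bernoulli standard error sqrt(p*(1-p)/n) as the "predicted" error, which makes both directions concrete, predicting error from a sample size and picking a sample size for a target error. The dataset and all function names are hypothetical; the paper's actual model is learned, not closed-form.

```python
import math
import random

random.seed(0)

# Hypothetical spatial dataset: points uniform in the unit square.
points = [(random.random(), random.random()) for _ in range(100_000)]

def selectivity_sample(points, rect, sample_size):
    """Sample-based selectivity estimate for a rectangular range query."""
    x1, y1, x2, y2 = rect
    sample = random.sample(points, sample_size)
    hits = sum(1 for (x, y) in sample if x1 <= x <= x2 and y1 <= y <= y2)
    return hits / sample_size

# Stand-in for the learned accuracy model: for a Bernoulli estimate of a
# selectivity p from n samples, the standard error is sqrt(p * (1 - p) / n).
def predicted_error(p, sample_size):
    return math.sqrt(p * (1 - p) / sample_size)

# "Reversing" the model: the smallest sample size whose predicted error
# meets a desired accuracy target.
def required_sample_size(p, target_error):
    return math.ceil(p * (1 - p) / target_error ** 2)

# A query covering a quarter of the unit square has true selectivity 0.25;
# hitting a 0.01 standard error needs 0.25 * 0.75 / 0.01**2 = 1875 samples.
n = required_sample_size(0.25, 0.01)
estimate = selectivity_sample(points, (0.0, 0.0, 0.5, 0.5), n)
```

The reversal step is what distinguishes this interface: the user states a target accuracy and receives the sample size, rather than the other way around.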
Related papers
- Bayesian Estimation and Tuning-Free Rank Detection for Probability Mass Function Tensors [17.640500920466984]
This paper presents a novel framework for estimating the joint PMF and automatically inferring its rank from observed data.
We derive a deterministic solution based on variational inference (VI) to approximate the posterior distributions of various model parameters. Additionally, we develop a scalable version of the VI-based approach by leveraging stochastic variational inference (SVI).
Experiments involving both synthetic data and real movie recommendation data illustrate the advantages of our VI and SVI-based methods in terms of estimation accuracy, automatic rank detection, and computational efficiency.
arXiv Detail & Related papers (2024-10-08T20:07:49Z)
- Stratified Prediction-Powered Inference for Hybrid Language Model Evaluation [62.2436697657307]
Prediction-powered inference (PPI) is a method that improves statistical estimates based on limited human-labeled data.
We propose a method called Stratified Prediction-Powered Inference (StratPPI).
We show that the basic PPI estimates can be considerably improved by employing simple data stratification strategies.
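As a rough illustration of the base PPI estimator that StratPPI builds on (a hand-rolled sketch, not the paper's code): combine the model's mean prediction over many unlabeled points with a bias-correcting "rectifier" estimated from the small human-labeled slice. The biased model and the data here are fabricated for the example.

```python
import random

random.seed(2)

# Hypothetical setup: estimate the mean of Y using many model predictions
# but only a few human labels.
N, n = 10_000, 100
truth = [random.gauss(1.0, 1.0) for _ in range(N)]

def model(y):
    # A deliberately biased "model": systematically off by +0.3.
    return y + 0.3

preds_unlabeled = [model(y) for y in truth]
labeled = truth[:n]                      # the small human-labeled slice
preds_labeled = [model(y) for y in labeled]

# PPI point estimate: model mean plus a rectifier estimated from the
# labeled slice, which removes the model's systematic bias.
model_mean = sum(preds_unlabeled) / N
rectifier = sum(y - f for y, f in zip(labeled, preds_labeled)) / n
ppi_estimate = model_mean + rectifier
```

Stratification, as proposed in the paper, would estimate a separate rectifier per stratum rather than one global correction.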
arXiv Detail & Related papers (2024-06-06T17:37:39Z)
- Query Performance Prediction using Relevance Judgments Generated by Large Language Models [53.97064615557883]
We propose a QPP framework using automatically generated relevance judgments (QPP-GenRE).
QPP-GenRE decomposes QPP into independent subtasks of predicting relevance of each item in a ranked list to a given query.
This allows us to predict any IR evaluation measure using the generated relevance judgments as pseudo-labels.
arXiv Detail & Related papers (2024-04-01T09:33:05Z)
- SEAM: Searching Transferable Mixed-Precision Quantization Policy through Large Margin Regularization [50.04951511146338]
Mixed-precision quantization (MPQ) suffers from the time-consuming process of searching the optimal bit-width allocation for each layer.
This paper proposes a novel method for efficiently searching for effective MPQ policies using a small proxy dataset.
arXiv Detail & Related papers (2023-02-14T05:47:45Z)
- Approximate Gibbs Sampler for Efficient Inference of Hierarchical Bayesian Models for Grouped Count Data [0.0]
This research develops an approximate Gibbs sampler (AGS) to efficiently learn the HBPRMs while maintaining the inference accuracy.
Numerical experiments using real and synthetic datasets with small and large counts demonstrate the superior performance of AGS.
arXiv Detail & Related papers (2022-11-28T21:00:55Z)
- Approximate Query Processing for Group-By Queries based on Conditional Generative Models [3.9837198605506963]
A group-by query returns one value per group, which makes it difficult to provide sufficiently accurate estimations for all the groups.
Stratified sampling improves accuracy compared with uniform sampling, but samples chosen for some specific queries do not work for others.
Online sampling chooses samples for the given query at query time, but it incurs a long latency.
The proposed framework can be combined with stratified sampling and online aggregation to improve the estimation accuracy for group-by queries.
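A minimal sketch of the stratified-sampling idea mentioned above (hypothetical data and helper names, not the paper's framework): by drawing a fixed budget per group, a tiny group that uniform sampling would likely miss still gets enough rows for a usable per-group estimate.

```python
import random
from collections import defaultdict

random.seed(1)

# Hypothetical skewed table: group "a" is huge, group "b" is tiny.
rows = [("a", random.gauss(10, 2)) for _ in range(9_900)] + \
       [("b", random.gauss(50, 2)) for _ in range(100)]

def stratified_sample(rows, per_group):
    """Draw up to `per_group` rows from each group, so small groups are
    represented; a uniform sample of the same total size would contain
    almost no rows from group "b"."""
    by_group = defaultdict(list)
    for group, value in rows:
        by_group[group].append(value)
    return {g: random.sample(vals, min(per_group, len(vals)))
            for g, vals in by_group.items()}

sample = stratified_sample(rows, 50)
# Per-group aggregate (AVG) estimated from the stratified sample.
avg_b = sum(sample["b"]) / len(sample["b"])
```

The trade-off the summary above describes is visible here: the allocation is fixed per group, so a sample tuned for one query's grouping may not suit a query that groups on a different column.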
arXiv Detail & Related papers (2021-01-08T08:49:21Z)
- APQ: Joint Search for Network Architecture, Pruning and Quantization Policy [49.3037538647714]
We present APQ for efficient deep learning inference on resource-constrained hardware.
Unlike previous methods that separately search the neural architecture, pruning policy, and quantization policy, we optimize them in a joint manner.
With the same accuracy, APQ reduces the latency/energy by 2x/1.3x over MobileNetV2+HAQ.
arXiv Detail & Related papers (2020-06-15T16:09:17Z)
- PyODDS: An End-to-end Outlier Detection System with Automated Machine Learning [55.32009000204512]
We present PyODDS, an automated end-to-end Python system for Outlier Detection with Database Support.
Specifically, we define the search space in the outlier detection pipeline, and produce a search strategy within the given search space.
It also provides unified interfaces and visualizations for users with or without data science or machine learning background.
arXiv Detail & Related papers (2020-03-12T03:30:30Z)
- LAQP: Learning-based Approximate Query Processing [5.249017312277057]
Approximate query processing (AQP) is a way to meet the demand for fast query responses.
We propose a learning-based AQP method called the LAQP.
It builds an error model learned from the historical queries to predict the sampling-based estimation error of each new query.
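The error-model idea can be sketched with a deliberately simple stand-in: fit the one-parameter model error ≈ c/sqrt(n) to a hypothetical log of historical (sample size, observed error) pairs by least squares, then use it to predict the sampling-based estimation error of a new query. LAQP's actual model is learned and richer; everything below is illustrative.

```python
import math

# Hypothetical log of historical queries: (sample_size, observed_error).
history = [(100, 0.050), (400, 0.024), (1600, 0.013), (6400, 0.006)]

# Fit error ≈ c / sqrt(n) by least squares on c; minimizing
# sum_i (e_i - c / sqrt(n_i))^2 gives the closed form
# c = sum_i (e_i / sqrt(n_i)) / sum_i (1 / n_i).
num = sum(e / math.sqrt(n) for n, e in history)
den = sum(1.0 / n for n, _ in history)
c = num / den

def predict_error(sample_size):
    """Predicted sampling error for a new query at this sample size."""
    return c / math.sqrt(sample_size)
```

In practice the historical-query features would go far beyond sample size (query ranges, data distribution), which is why a learned model is used rather than a closed form.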
arXiv Detail & Related papers (2020-03-05T06:08:25Z)
- Uncertainty Estimation Using a Single Deep Deterministic Neural Network [66.26231423824089]
We propose a method for training a deterministic deep model that can find and reject out-of-distribution data points at test time with a single forward pass.
We scale training in these with a novel loss function and centroid updating scheme and match the accuracy of softmax models.
arXiv Detail & Related papers (2020-03-04T12:27:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.