Electra: Conditional Generative Model based Predicate-Aware Query
Approximation
- URL: http://arxiv.org/abs/2201.12420v1
- Date: Fri, 28 Jan 2022 21:13:26 GMT
- Title: Electra: Conditional Generative Model based Predicate-Aware Query
Approximation
- Authors: Nikhil Sheoran, Subrata Mitra, Vibhor Porwal, Siddharth Ghetia, Jatin
Varshney, Tung Mai, Anup Rao, Vikas Maddukuri
- Abstract summary: ELECTRA is a predicate-aware AQP system that can answer analytics-style queries with a large number of predicates with much smaller approximation errors.
Our evaluations with four different baselines on three real-world datasets show that ELECTRA provides lower AQP error for large number of predicates compared to baselines.
- Score: 10.056919500568013
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The goal of Approximate Query Processing (AQP) is to provide very fast but
"accurate enough" results for costly aggregate queries thereby improving user
experience in interactive exploration of large datasets. Recently proposed
Machine-Learning based AQP techniques can provide very low latency as query
execution only involves model inference as compared to traditional query
processing on database clusters. However, with increase in the number of
filtering predicates(WHERE clauses), the approximation error significantly
increases for these methods. Analysts often use queries with a large number of
predicates for insights discovery. Thus, maintaining low approximation error is
important to prevent analysts from drawing misleading conclusions. In this
paper, we propose ELECTRA, a predicate-aware AQP system that can answer
analytics-style queries with a large number of predicates with much smaller
approximation errors. ELECTRA uses a conditional generative model that learns
the conditional distribution of the data and at runtime generates a small
(~1000 rows) but representative sample, on which the query is executed to
compute the approximate result. Our evaluations with four different baselines
on three real-world datasets show that ELECTRA provides lower AQP error for
large number of predicates compared to baselines.
Related papers
- Data Fusion of Synthetic Query Variants With Generative Large Language Models [1.864807003137943]
This work explores the feasibility of using synthetic query variants generated by instruction-tuned Large Language Models in data fusion experiments.
We introduce a lightweight, unsupervised, and cost-efficient approach that exploits principled prompting and data fusion techniques.
Our analysis shows that data fusion based on synthetic query variants is significantly better than baselines with single queries and also outperforms pseudo-relevance feedback methods.
arXiv Detail & Related papers (2024-11-06T12:54:27Z) - UQE: A Query Engine for Unstructured Databases [71.49289088592842]
We investigate the potential of Large Language Models to enable unstructured data analytics.
We propose a new Universal Query Engine (UQE) that directly interrogates and draws insights from unstructured data collections.
arXiv Detail & Related papers (2024-06-23T06:58:55Z) - Adaptive-RAG: Learning to Adapt Retrieval-Augmented Large Language Models through Question Complexity [59.57065228857247]
Retrieval-augmented Large Language Models (LLMs) have emerged as a promising approach to enhancing response accuracy in several tasks, such as Question-Answering (QA)
We propose a novel adaptive QA framework, that can dynamically select the most suitable strategy for (retrieval-augmented) LLMs based on the query complexity.
We validate our model on a set of open-domain QA datasets, covering multiple query complexities, and show that ours enhances the overall efficiency and accuracy of QA systems.
arXiv Detail & Related papers (2024-03-21T13:52:30Z) - MeaeQ: Mount Model Extraction Attacks with Efficient Queries [6.1106195466129485]
We study model extraction attacks in natural language processing (NLP)
We propose MeaeQ, a straightforward yet effective method to address these issues.
MeaeQ achieves higher functional similarity to the victim model than baselines while requiring fewer queries.
arXiv Detail & Related papers (2023-10-21T16:07:16Z) - MinPrompt: Graph-based Minimal Prompt Data Augmentation for Few-shot Question Answering [64.6741991162092]
We present MinPrompt, a minimal data augmentation framework for open-domain question answering.
We transform the raw text into a graph structure to build connections between different factual sentences.
We then apply graph algorithms to identify the minimal set of sentences needed to cover the most information in the raw text.
We generate QA pairs based on the identified sentence subset and train the model on the selected sentences to obtain the final model.
arXiv Detail & Related papers (2023-10-08T04:44:36Z) - Improving Text Matching in E-Commerce Search with A Rationalizable,
Intervenable and Fast Entity-Based Relevance Model [78.80174696043021]
We propose a novel model called the Entity-Based Relevance Model (EBRM)
The decomposition allows us to use a Cross-encoder QE relevance module for high accuracy.
We also show that pretraining the QE module with auto-generated QE data from user logs can further improve the overall performance.
arXiv Detail & Related papers (2023-07-01T15:44:53Z) - Kepler: Robust Learning for Faster Parametric Query Optimization [5.6119420695093245]
We propose an end-to-end learning-based approach to parametric query optimization.
Kepler achieves significant improvements in query runtime on multiple datasets.
arXiv Detail & Related papers (2023-06-11T22:39:28Z) - Single-Stage Visual Relationship Learning using Conditional Queries [60.90880759475021]
TraCQ is a new formulation for scene graph generation that avoids the multi-task learning problem and the entity pair distribution.
We employ a DETR-based encoder-decoder conditional queries to significantly reduce the entity label space as well.
Experimental results show that TraCQ not only outperforms existing single-stage scene graph generation methods, it also beats many state-of-the-art two-stage methods on the Visual Genome dataset.
arXiv Detail & Related papers (2023-06-09T06:02:01Z) - Approximate Query Processing for Group-By Queries based on Conditional
Generative Models [3.9837198605506963]
Group-by query involves multiple values, which makes it difficult to provide sufficiently accurate estimations for all the groups.
Stratified sampling improves the accuracy compared with the uniform sampling, but the samples chosen for some special queries cannot work for other queries.
Online sampling chooses samples for the given query at query time, but it requires a long latency.
The proposed framework can be combined with stratified sampling and online aggregation to improve the estimation accuracy for group-by queries.
arXiv Detail & Related papers (2021-01-08T08:49:21Z) - Multi-layer Optimizations for End-to-End Data Analytics [71.05611866288196]
We introduce Iterative Functional Aggregate Queries (IFAQ), a framework that realizes an alternative approach.
IFAQ treats the feature extraction query and the learning task as one program given in the IFAQ's domain-specific language.
We show that a Scala implementation of IFAQ can outperform mlpack, Scikit, and specialization by several orders of magnitude for linear regression and regression tree models over several relational datasets.
arXiv Detail & Related papers (2020-01-10T16:14:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.