On Efficient Approximate Queries over Machine Learning Models
- URL: http://arxiv.org/abs/2206.02845v1
- Date: Mon, 6 Jun 2022 18:35:19 GMT
- Title: On Efficient Approximate Queries over Machine Learning Models
- Authors: Dujian Ding, Sihem Amer-Yahia, Laks VS Lakshmanan
- Abstract summary: We develop a novel unified framework for approximate query answering by leveraging a proxy to minimize the oracle usage.
Our framework uses a judicious combination of invoking the expensive oracle on data samples and applying the cheap proxy on the objects in the DB.
Our algorithms outperform the state-of-the-art and achieve high result quality with provable statistical guarantees.
- Score: 30.26180913049285
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The question of answering queries over ML predictions has been gaining
attention in the database community. This question is challenging because the
cost of finding high quality answers corresponds to invoking an oracle such as
a human expert or an expensive deep neural network model on every single item
in the DB and then applying the query. We develop a novel unified framework for
approximate query answering by leveraging a proxy to minimize the oracle usage
of finding high quality answers for both Precision-Target (PT) and
Recall-Target (RT) queries. Our framework uses a judicious combination of
invoking the expensive oracle on data samples and applying the cheap proxy on
the objects in the DB. It relies on two assumptions. Under the Proxy Quality
assumption, proxy quality can be quantified in a probabilistic manner w.r.t.
the oracle. This allows us to develop two algorithms: PQA that efficiently
finds high quality answers with high probability and no oracle calls, and PQE,
a heuristic extension that achieves empirically good performance with a small
number of oracle calls. Alternatively, under the Core Set Closure assumption,
we develop two algorithms: CSC that efficiently returns high quality answers
with high probability and minimal oracle usage, and CSE, which extends it to
more general settings. Our extensive experiments on five real-world datasets on
both query types, PT and RT, demonstrate that our algorithms outperform the
state-of-the-art and achieve high result quality with provable statistical
guarantees.
Related papers
- Oracle problems as communication tasks and optimization of quantum algorithms [0.0]
A key question is how well an algorithm can succeed with a learning task using only a fixed number of queries.
In this work, we propose measuring an algorithm's performance using the mutual information between the output and the actual value.
arXiv Detail & Related papers (2024-09-23T21:03:39Z) - FoRAG: Factuality-optimized Retrieval Augmented Generation for Web-enhanced Long-form Question Answering [11.73887020240588]
We propose a novel outline-enhanced generator to achieve clear logic in the generation of multifaceted answers.
Then we propose a factuality optimization method based on a carefully designed doubly fine-grained RLHF framework.
In particular, when applying our method to Llama2-7B-chat, the derived model FoRAG-L-7B outperforms WebGPT-175B in terms of three commonly used metrics.
arXiv Detail & Related papers (2024-06-19T19:06:36Z) - UniOQA: A Unified Framework for Knowledge Graph Question Answering with Large Language Models [4.627548680442906]
OwnThink is the most extensive Chinese open-domain knowledge graph introduced in recent times.
We introduce UniOQA, a unified framework that integrates two parallel approaches to question answering.
UniOQA notably advances SpCQL Logical Accuracy to 21.2% and Execution Accuracy to 54.9%, achieving the new state-of-the-art results on this benchmark.
arXiv Detail & Related papers (2024-06-04T08:36:39Z) - Adaptive-RAG: Learning to Adapt Retrieval-Augmented Large Language Models through Question Complexity [59.57065228857247]
Retrieval-augmented Large Language Models (LLMs) have emerged as a promising approach to enhancing response accuracy in several tasks, such as Question-Answering (QA)
We propose a novel adaptive QA framework, that can dynamically select the most suitable strategy for (retrieval-augmented) LLMs based on the query complexity.
We validate our model on a set of open-domain QA datasets, covering multiple query complexities, and show that ours enhances the overall efficiency and accuracy of QA systems.
arXiv Detail & Related papers (2024-03-21T13:52:30Z) - MinPrompt: Graph-based Minimal Prompt Data Augmentation for Few-shot Question Answering [64.6741991162092]
We present MinPrompt, a minimal data augmentation framework for open-domain question answering.
We transform the raw text into a graph structure to build connections between different factual sentences.
We then apply graph algorithms to identify the minimal set of sentences needed to cover the most information in the raw text.
We generate QA pairs based on the identified sentence subset and train the model on the selected sentences to obtain the final model.
arXiv Detail & Related papers (2023-10-08T04:44:36Z) - Adaptive Oracle-Efficient Online Learning [23.185655992407742]
oracle-efficient algorithms search through an exponentially-large space of decisions and select that performed the best on any dataset.
We provide a new framework for designing follow-the-perturbed-leader algorithms that are oracle-efficient and adapt well to the small-loss environment.
We identify a series of real-world settings, including online auctions and transductive online classification, for which approximability holds.
arXiv Detail & Related papers (2022-10-17T19:32:30Z) - RnG-KBQA: Generation Augmented Iterative Ranking for Knowledge Base
Question Answering [57.94658176442027]
We present RnG-KBQA, a Rank-and-Generate approach for KBQA.
We achieve new state-of-the-art results on GrailQA and WebQSP datasets.
arXiv Detail & Related papers (2021-09-17T17:58:28Z) - CREPO: An Open Repository to Benchmark Credal Network Algorithms [78.79752265884109]
Credal networks are imprecise probabilistic graphical models based on, so-called credal, sets of probability mass functions.
A Java library called CREMA has been recently released to model, process and query credal networks.
We present CREPO, an open repository of synthetic credal networks, provided together with the exact results of inference tasks on these models.
arXiv Detail & Related papers (2021-05-10T07:31:59Z) - Approximating Aggregated SQL Queries With LSTM Networks [31.528524004435933]
We present a method for query approximation, also known as approximate query processing (AQP)
We use LSTM network to learn the relationship between queries and their results, and to provide a rapid inference layer for predicting query results.
Our method was able to predict up to 120,000 queries in a second, and with a single query latency of no more than 2ms.
arXiv Detail & Related papers (2020-10-25T16:17:58Z) - Probabilistic Case-based Reasoning for Open-World Knowledge Graph
Completion [59.549664231655726]
A case-based reasoning (CBR) system solves a new problem by retrieving cases' that are similar to the given problem.
In this paper, we demonstrate that such a system is achievable for reasoning in knowledge-bases (KBs)
Our approach predicts attributes for an entity by gathering reasoning paths from similar entities in the KB.
arXiv Detail & Related papers (2020-10-07T17:48:12Z) - New Oracle-Efficient Algorithms for Private Synthetic Data Release [52.33506193761153]
We present three new algorithms for constructing differentially private synthetic data.
The algorithms satisfy differential privacy even in the worst case.
Compared to the state-of-the-art method High-Dimensional Matrix Mechanism citeMcKennaMHM18, our algorithms provide better accuracy in the large workload.
arXiv Detail & Related papers (2020-07-10T15:46:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.