Related papers: On Efficient Approximate Queries over Machine Learning Models

On Efficient Approximate Queries over Machine Learning Models

URL: http://arxiv.org/abs/2206.02845v1
Date: Mon, 6 Jun 2022 18:35:19 GMT
Title: On Efficient Approximate Queries over Machine Learning Models
Authors: Dujian Ding, Sihem Amer-Yahia, Laks VS Lakshmanan
Abstract summary: We develop a novel unified framework for approximate query answering by leveraging a proxy to minimize the oracle usage. Our framework uses a judicious combination of invoking the expensive oracle on data samples and applying the cheap proxy on the objects in the DB. Our algorithms outperform the state-of-the-art and achieve high result quality with provable statistical guarantees.
Score: 30.26180913049285
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The question of answering queries over ML predictions has been gaining attention in the database community. This question is challenging because the cost of finding high quality answers corresponds to invoking an oracle such as a human expert or an expensive deep neural network model on every single item in the DB and then applying the query. We develop a novel unified framework for approximate query answering by leveraging a proxy to minimize the oracle usage of finding high quality answers for both Precision-Target (PT) and Recall-Target (RT) queries. Our framework uses a judicious combination of invoking the expensive oracle on data samples and applying the cheap proxy on the objects in the DB. It relies on two assumptions. Under the Proxy Quality assumption, proxy quality can be quantified in a probabilistic manner w.r.t. the oracle. This allows us to develop two algorithms: PQA that efficiently finds high quality answers with high probability and no oracle calls, and PQE, a heuristic extension that achieves empirically good performance with a small number of oracle calls. Alternatively, under the Core Set Closure assumption, we develop two algorithms: CSC that efficiently returns high quality answers with high probability and minimal oracle usage, and CSE, which extends it to more general settings. Our extensive experiments on five real-world datasets on both query types, PT and RT, demonstrate that our algorithms outperform the state-of-the-art and achieve high result quality with provable statistical guarantees.

Related papers

PlanGEN: A Multi-Agent Framework for Generating Planning and Reasoning Trajectories for Complex Problem Solving [89.60370366013142]
We propose PlanGEN, a model-agnostic and easily scalable agent framework with three key components: constraint, verification, and selection agents. Specifically, our approach proposes constraint-guided iterative verification to enhance performance of inference-time algorithms.
arXiv Detail & Related papers (2025-02-22T06:21:56Z)
Personalized Top-k Set Queries Over Predicted Scores [21.74740893966611]
This work studies the applicability of expensive external oracles in answering top-k queries over predicted scores. We propose a generic computational framework that handles arbitrary set-based scoring functions.
arXiv Detail & Related papers (2025-02-18T16:19:08Z)
Avoiding $\mathbf{exp(R_{max})}$ scaling in RLHF through Preference-based Exploration [20.76451379043945]
Reinforcement Learning from Human Feedback (RLHF) has emerged as a pivotal technique for large language model (LLM) alignment. This paper studies the setting of online RLHF and focus on improving sample efficiency.
arXiv Detail & Related papers (2025-02-02T04:40:04Z)
Oracle problems as communication tasks and optimization of quantum algorithms [0.0]
A key question is how well an algorithm can succeed with a learning task using only a fixed number of queries. In this work, we propose measuring an algorithm's performance using the mutual information between the output and the actual value.
arXiv Detail & Related papers (2024-09-23T21:03:39Z)
ACE: A Generative Cross-Modal Retrieval Framework with Coarse-To-Fine Semantic Modeling [53.97609687516371]
We propose a pioneering generAtive Cross-modal rEtrieval framework (ACE) for end-to-end cross-modal retrieval. ACE achieves state-of-the-art performance in cross-modal retrieval and outperforms the strong baselines on Recall@1 by 15.27% on average.
arXiv Detail & Related papers (2024-06-25T12:47:04Z)
FoRAG: Factuality-optimized Retrieval Augmented Generation for Web-enhanced Long-form Question Answering [11.73887020240588]
We propose a novel outline-enhanced generator to achieve clear logic in the generation of multifaceted answers. Then we propose a factuality optimization method based on a carefully designed doubly fine-grained RLHF framework. In particular, when applying our method to Llama2-7B-chat, the derived model FoRAG-L-7B outperforms WebGPT-175B in terms of three commonly used metrics.
arXiv Detail & Related papers (2024-06-19T19:06:36Z)
UniOQA: A Unified Framework for Knowledge Graph Question Answering with Large Language Models [4.627548680442906]
OwnThink is the most extensive Chinese open-domain knowledge graph introduced in recent times. We introduce UniOQA, a unified framework that integrates two parallel approaches to question answering. UniOQA notably advances SpCQL Logical Accuracy to 21.2% and Execution Accuracy to 54.9%, achieving the new state-of-the-art results on this benchmark.
arXiv Detail & Related papers (2024-06-04T08:36:39Z)
Adaptive-RAG: Learning to Adapt Retrieval-Augmented Large Language Models through Question Complexity [59.57065228857247]
Retrieval-augmented Large Language Models (LLMs) have emerged as a promising approach to enhancing response accuracy in several tasks, such as Question-Answering (QA) We propose a novel adaptive QA framework, that can dynamically select the most suitable strategy for (retrieval-augmented) LLMs based on the query complexity. We validate our model on a set of open-domain QA datasets, covering multiple query complexities, and show that ours enhances the overall efficiency and accuracy of QA systems.
arXiv Detail & Related papers (2024-03-21T13:52:30Z)
MinPrompt: Graph-based Minimal Prompt Data Augmentation for Few-shot Question Answering [64.6741991162092]
We present MinPrompt, a minimal data augmentation framework for open-domain question answering. We transform the raw text into a graph structure to build connections between different factual sentences. We then apply graph algorithms to identify the minimal set of sentences needed to cover the most information in the raw text. We generate QA pairs based on the identified sentence subset and train the model on the selected sentences to obtain the final model.
arXiv Detail & Related papers (2023-10-08T04:44:36Z)
Adaptive Oracle-Efficient Online Learning [23.185655992407742]
oracle-efficient algorithms search through an exponentially-large space of decisions and select that performed the best on any dataset. We provide a new framework for designing follow-the-perturbed-leader algorithms that are oracle-efficient and adapt well to the small-loss environment. We identify a series of real-world settings, including online auctions and transductive online classification, for which approximability holds.
arXiv Detail & Related papers (2022-10-17T19:32:30Z)
RnG-KBQA: Generation Augmented Iterative Ranking for Knowledge Base Question Answering [57.94658176442027]
We present RnG-KBQA, a Rank-and-Generate approach for KBQA. We achieve new state-of-the-art results on GrailQA and WebQSP datasets.
arXiv Detail & Related papers (2021-09-17T17:58:28Z)
Approximating Aggregated SQL Queries With LSTM Networks [31.528524004435933]
We present a method for query approximation, also known as approximate query processing (AQP) We use LSTM network to learn the relationship between queries and their results, and to provide a rapid inference layer for predicting query results. Our method was able to predict up to 120,000 queries in a second, and with a single query latency of no more than 2ms.
arXiv Detail & Related papers (2020-10-25T16:17:58Z)
Probabilistic Case-based Reasoning for Open-World Knowledge Graph Completion [59.549664231655726]
A case-based reasoning (CBR) system solves a new problem by retrieving cases' that are similar to the given problem. In this paper, we demonstrate that such a system is achievable for reasoning in knowledge-bases (KBs) Our approach predicts attributes for an entity by gathering reasoning paths from similar entities in the KB.
arXiv Detail & Related papers (2020-10-07T17:48:12Z)
New Oracle-Efficient Algorithms for Private Synthetic Data Release [52.33506193761153]
We present three new algorithms for constructing differentially private synthetic data. The algorithms satisfy differential privacy even in the worst case. Compared to the state-of-the-art method High-Dimensional Matrix Mechanism citeMcKennaMHM18, our algorithms provide better accuracy in the large workload.
arXiv Detail & Related papers (2020-07-10T15:46:05Z)

This list is automatically generated from the titles and abstracts of the papers in this site.