Approximating a Target Distribution using Weight Queries
- URL: http://arxiv.org/abs/2006.13636v2
- Date: Sat, 27 Feb 2021 17:56:58 GMT
- Title: Approximating a Target Distribution using Weight Queries
- Authors: Nadav Barak and Sivan Sabato
- Abstract summary: We propose an interactive algorithm that iteratively selects data set examples and performs corresponding weight queries.
We derive an approximation bound on the total variation distance between the reweighting found by the algorithm and the best achievable reweighting.
- Score: 25.392248158616862
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider a novel challenge: approximating a distribution without the
ability to randomly sample from that distribution. We study how such an
approximation can be obtained using *weight queries*. Given some data set of
examples, a weight query presents one of the examples to an oracle, which
returns the probability, according to the target distribution, of observing
examples similar to the presented example. This oracle can represent, for
instance, counting queries to a database of the target population, or an
interface to a search engine which returns the number of results that match a
given search. We propose an interactive algorithm that iteratively selects data
set examples and performs corresponding weight queries. The algorithm finds a
reweighting of the data set that approximates the weights according to the
target distribution, using a limited number of weight queries. We derive an
approximation bound on the total variation distance between the reweighting
found by the algorithm and the best achievable reweighting. Our algorithm takes
inspiration from the UCB approach common in multi-armed bandits problems, and
combines it with a new discrepancy estimator and a greedy iterative procedure.
In addition to our theoretical guarantees, we demonstrate in experiments the
advantages of the proposed algorithm over several baselines. A python
implementation of the proposed algorithm and of all the experiments can be
found at https://github.com/Nadav-Barak/AWP.
Related papers
- pEBR: A Probabilistic Approach to Embedding Based Retrieval [4.8338111302871525]
Embedding retrieval aims to learn a shared semantic representation space for both queries and items.
In current industrial practice, retrieval systems typically retrieve a fixed number of items for different queries.
arXiv Detail & Related papers (2024-10-25T07:14:12Z) - Best Arm Identification with Fixed Budget: A Large Deviation Perspective [54.305323903582845]
We present sred, a truly adaptive algorithm that can reject arms in it any round based on the observed empirical gaps between the rewards of various arms.
In particular, we present sred, a truly adaptive algorithm that can reject arms in it any round based on the observed empirical gaps between the rewards of various arms.
arXiv Detail & Related papers (2023-12-19T13:17:43Z) - Approximating a RUM from Distributions on k-Slates [88.32814292632675]
We find a generalization-time algorithm that finds the RUM that best approximates the given distribution on average.
Our theoretical result can also be made practical: we obtain a that is effective and scales to real-world datasets.
arXiv Detail & Related papers (2023-05-22T17:43:34Z) - Finding Support Examples for In-Context Learning [73.90376920653507]
We propose LENS, a fiLter-thEN-Search method to tackle this challenge in two stages.
First we filter the dataset to obtain informative in-context examples individually.
Then we propose diversity-guided example search which iteratively refines and evaluates the selected example permutations.
arXiv Detail & Related papers (2023-02-27T06:32:45Z) - Leveraging Importance Weights in Subset Selection [45.54597544672441]
We present a subset selection algorithm designed to work with arbitrary model families in a practical batch setting.
Our algorithm, IWeS, selects examples by importance sampling where the sampling probability assigned to each example is based on the entropy of models trained on previously selected batches.
arXiv Detail & Related papers (2023-01-28T02:07:31Z) - A Bayesian Bradley-Terry model to compare multiple ML algorithms on
multiple data sets [4.394728504061753]
This paper proposes a Bayesian model to compare multiple algorithms on multiple data sets, on any metric.
The model is based on the Bradley-Terry model, that counts the number of times one algorithm performs better than another on different data sets.
arXiv Detail & Related papers (2022-08-09T17:59:06Z) - A Metaheuristic Algorithm for Large Maximum Weight Independent Set
Problems [58.348679046591265]
Given a node-weighted graph, find a set of independent (mutually nonadjacent) nodes whose node-weight sum is maximum.
Some of the graphs airsing in this application are large, having hundreds of thousands of nodes and hundreds of millions of edges.
We develop a new local search algorithm, which is a metaheuristic in the greedy randomized adaptive search framework.
arXiv Detail & Related papers (2022-03-28T21:34:16Z) - Local policy search with Bayesian optimization [73.0364959221845]
Reinforcement learning aims to find an optimal policy by interaction with an environment.
Policy gradients for local search are often obtained from random perturbations.
We develop an algorithm utilizing a probabilistic model of the objective function and its gradient.
arXiv Detail & Related papers (2021-06-22T16:07:02Z) - Approximate Query Processing for Group-By Queries based on Conditional
Generative Models [3.9837198605506963]
Group-by query involves multiple values, which makes it difficult to provide sufficiently accurate estimations for all the groups.
Stratified sampling improves the accuracy compared with the uniform sampling, but the samples chosen for some special queries cannot work for other queries.
Online sampling chooses samples for the given query at query time, but it requires a long latency.
The proposed framework can be combined with stratified sampling and online aggregation to improve the estimation accuracy for group-by queries.
arXiv Detail & Related papers (2021-01-08T08:49:21Z) - Optimal Off-Policy Evaluation from Multiple Logging Policies [77.62012545592233]
We study off-policy evaluation from multiple logging policies, each generating a dataset of fixed size, i.e., stratified sampling.
We find the OPE estimator for multiple loggers with minimum variance for any instance, i.e., the efficient one.
arXiv Detail & Related papers (2020-10-21T13:43:48Z) - Optimal Representative Sample Weighting [0.0]
We consider the problem of assigning weights to a set of samples or data records, with the goal of achieving a representative weighting.
We frame the problem of finding representative sample weights as an optimization problem, which in many cases is convex and can be efficiently solved.
We describe rsw, an open-source implementation of the ideas described in this paper, and apply it to a skewed sample of the CDC BRFSS dataset.
arXiv Detail & Related papers (2020-05-18T20:29:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.