Non-Stochastic CDF Estimation Using Threshold Queries
- URL: http://arxiv.org/abs/2301.05682v1
- Date: Fri, 13 Jan 2023 18:00:57 GMT
- Title: Non-Stochastic CDF Estimation Using Threshold Queries
- Authors: Princewill Okoroafor, Vaishnavi Gupta, Robert Kleinberg, Eleanor Goh
- Abstract summary: We tackle the problem of estimating an empirical distribution in a setting with two challenges.
First, the algorithm does not directly observe the data; instead, it only asks a limited number of threshold queries about each sample.
Second, the data are not assumed to be independent and identically distributed; instead, we allow for an arbitrary process generating the samples.
- Score: 3.6576781735746513
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Estimating the empirical distribution of a scalar-valued data set is a basic
and fundamental task. In this paper, we tackle the problem of estimating an
empirical distribution in a setting with two challenging features. First, the
algorithm does not directly observe the data; instead, it only asks a limited
number of threshold queries about each sample. Second, the data are not assumed
to be independent and identically distributed; instead, we allow for an
arbitrary process generating the samples, including an adaptive adversary.
These considerations are relevant, for example, when modeling a seller
experimenting with posted prices to estimate the distribution of consumers'
willingness to pay for a product: offering a price and observing a consumer's
purchase decision is equivalent to asking a single threshold query about their
value, and the distribution of consumers' values may be non-stationary over
time, as early adopters may differ markedly from late adopters.
Our main result quantifies, to within a constant factor, the sample
complexity of estimating the empirical CDF of a sequence of elements of $[n]$,
up to $\varepsilon$ additive error, using one threshold query per sample. The
complexity depends only logarithmically on $n$, and our result can be
interpreted as extending the existing logarithmic-complexity results for noisy
binary search to the more challenging setting where noise is non-stochastic.
Along the way to designing our algorithm, we consider a more general model in
which the algorithm is allowed to make a limited number of simultaneous
threshold queries on each sample. We solve this problem using Blackwell's
Approachability Theorem and the exponential weights method. As a side result of
independent interest, we characterize the minimum number of simultaneous
threshold queries required by deterministic CDF estimation algorithms.
Related papers
- Efficiently learning and sampling multimodal distributions with data-based initialization [20.575122468674536]
We consider the problem of sampling a multimodal distribution with a Markov chain given a small number of samples from the stationary measure.
We show that if the Markov chain has a $k$th order spectral gap, samples from the stationary distribution will efficiently generate a sample whose conditional law is $varepsilon$-close in TV distance to the stationary measure.
arXiv Detail & Related papers (2024-11-14T01:37:02Z) - Scalable Decentralized Algorithms for Online Personalized Mean Estimation [12.002609934938224]
This study focuses on a simplified version of the overarching problem, where each agent collects samples from a real-valued distribution over time to estimate its mean.
We introduce two collaborative mean estimation algorithms: one draws inspiration from belief propagation, while the other employs a consensus-based approach.
arXiv Detail & Related papers (2024-02-20T08:30:46Z) - Optimal Multi-Distribution Learning [88.3008613028333]
Multi-distribution learning seeks to learn a shared model that minimizes the worst-case risk across $k$ distinct data distributions.
We propose a novel algorithm that yields an varepsilon-optimal randomized hypothesis with a sample complexity on the order of (d+k)/varepsilon2.
arXiv Detail & Related papers (2023-12-08T16:06:29Z) - Factorization of Multi-Agent Sampling-Based Motion Planning [72.42734061131569]
Modern robotics often involves multiple embodied agents operating within a shared environment.
Standard sampling-based algorithms can be used to search for solutions in the robots' joint space.
We integrate the concept of factorization into sampling-based algorithms, which requires only minimal modifications to existing methods.
We present a general implementation of a factorized SBA, derive an analytical gain in terms of sample complexity for PRM*, and showcase empirical results for RRG.
arXiv Detail & Related papers (2023-04-01T15:50:18Z) - On-Demand Sampling: Learning Optimally from Multiple Distributions [63.20009081099896]
Social and real-world considerations have given rise to multi-distribution learning paradigms.
We establish the optimal sample complexity of these learning paradigms and give algorithms that meet this sample complexity.
Our algorithm design and analysis are enabled by our extensions of online learning techniques for solving zero-sum games.
arXiv Detail & Related papers (2022-10-22T19:07:26Z) - Active Sampling of Multiple Sources for Sequential Estimation [92.37271004438406]
The objective is to design an active sampling algorithm for sequentially estimating parameters in order to form reliable estimates.
This paper adopts emph conditional estimation cost functions, leading to a sequential estimation approach that was recently shown to render tractable analysis.
arXiv Detail & Related papers (2022-08-10T15:58:05Z) - Convergence for score-based generative modeling with polynomial
complexity [9.953088581242845]
We prove the first convergence guarantees for the core mechanic behind Score-based generative modeling.
Compared to previous works, we do not incur error that grows exponentially in time or that suffers from a curse of dimensionality.
We show that a predictor-corrector gives better convergence than using either portion alone.
arXiv Detail & Related papers (2022-06-13T14:57:35Z) - Breaking the Sample Size Barrier in Model-Based Reinforcement Learning
with a Generative Model [50.38446482252857]
This paper is concerned with the sample efficiency of reinforcement learning, assuming access to a generative model (or simulator)
We first consider $gamma$-discounted infinite-horizon Markov decision processes (MDPs) with state space $mathcalS$ and action space $mathcalA$.
We prove that a plain model-based planning algorithm suffices to achieve minimax-optimal sample complexity given any target accuracy level.
arXiv Detail & Related papers (2020-05-26T17:53:18Z) - Learning Entangled Single-Sample Distributions via Iterative Trimming [28.839136703139225]
We analyze a simple and computationally efficient method based on iteratively trimming samples and re-estimating the parameter on the trimmed sample set.
We show that the method in logarithmic iterations outputs an estimation whose error only depends on the noise level of the $lceil alpha n rceil$-th noisiest data point.
arXiv Detail & Related papers (2020-04-20T18:37:43Z) - The Simulator: Understanding Adaptive Sampling in the
Moderate-Confidence Regime [52.38455827779212]
We propose a novel technique for analyzing adaptive sampling called the em Simulator.
We prove the first instance-based lower bounds the top-k problem which incorporate the appropriate log-factors.
Our new analysis inspires a simple and near-optimal for the best-arm and top-k identification, the first em practical of its kind for the latter problem.
arXiv Detail & Related papers (2017-02-16T23:42:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.