Learning Structured Distributions From Untrusted Batches: Faster and
Simpler
- URL: http://arxiv.org/abs/2002.10435v2
- Date: Sun, 7 Jun 2020 17:50:33 GMT
- Title: Learning Structured Distributions From Untrusted Batches: Faster and
Simpler
- Authors: Sitan Chen, Jerry Li, Ankur Moitra
- Abstract summary: We revisit the problem of learning from untrusted batches introduced by Qiao and Valiant [QV17].
We find an appealing way to synthesize the techniques of [JO19] and [CLM19] to give the best of both worlds: an algorithm which runs in polynomial time and can exploit structure in the underlying distribution to achieve sublinear sample complexity.
- Score: 26.57313255285721
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We revisit the problem of learning from untrusted batches introduced by Qiao
and Valiant [QV17]. Recently, Jain and Orlitsky [JO19] gave a simple
semidefinite programming approach based on the cut-norm that achieves
essentially information-theoretically optimal error in polynomial time.
Concurrently, Chen et al. [CLM19] considered a variant of the problem where
$\mu$ is assumed to be structured, e.g. log-concave, monotone hazard rate,
$t$-modal, etc. In this case, it is possible to achieve the same error with
sample complexity sublinear in $n$, and they exhibited a quasi-polynomial time
algorithm for doing so using Haar wavelets.
In this paper, we find an appealing way to synthesize the techniques of
[JO19] and [CLM19] to give the best of both worlds: an algorithm which runs in
polynomial time and can exploit structure in the underlying distribution to
achieve sublinear sample complexity. Along the way, we simplify the approach of
[JO19] by avoiding the need for SDP rounding and giving a more direct
interpretation of it through the lens of soft filtering, a powerful recent
technique in high-dimensional robust estimation. We validate the usefulness of
our algorithms in preliminary experimental evaluations.
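The abstract interprets the approach through the lens of soft filtering: rather than hard-removing suspected outliers, each iteration multiplicatively downweights points according to an outlier score. The sketch below is only an illustrative toy (a 1-D robust mean estimator, not the paper's batched, cut-norm-based algorithm); the function name `soft_filter_mean` and the squared-deviation score rule are choices made for this example.

```python
import numpy as np

def soft_filter_mean(x, iters=50):
    """Toy soft filtering for robust 1-D mean estimation.

    Each round scores every point by its squared deviation from the
    current weighted mean and downweights it in proportion to that
    score, instead of discarding points outright.
    """
    w = np.ones(len(x)) / len(x)             # start with uniform weights
    for _ in range(iters):
        mu = np.dot(w, x) / w.sum()          # current weighted mean
        scores = (x - mu) ** 2               # gross outliers get the largest scores
        if scores.max() == 0:
            break                            # all mass already agrees
        w = w * (1 - scores / scores.max())  # soft multiplicative downweighting
    return np.dot(w, x) / w.sum()

# 95 inliers near 0, 5 gross outliers at 100: the soft filter drives the
# outliers' weights toward zero and recovers a mean near 0.
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(0, 1, 95), np.full(5, 100.0)])
print(soft_filter_mean(data))
```

The design point soft filtering illustrates is that downweighting by score degrades gracefully: a mistakenly suspected inlier loses only a little weight per round, whereas hard filtering would remove it permanently.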
Related papers
- A Sample Efficient Alternating Minimization-based Algorithm For Robust Phase Retrieval [56.67706781191521]
In this work, we study a robust phase retrieval problem where the task is to recover an unknown signal.
Our proposed algorithm avoids the need for computationally expensive spectral initialization, using a simple gradient step while remaining robust to outliers.
arXiv Detail & Related papers (2024-09-07T06:37:23Z)
- Forster Decomposition and Learning Halfspaces with Noise [60.691817861402676]
A Forster transform is an operation that turns a distribution into one with good anti-concentration properties.
We show that any distribution can be decomposed efficiently as a disjoint mixture of few distributions for which a Forster transform exists and can be computed efficiently.
arXiv Detail & Related papers (2021-07-12T17:00:59Z)
- Rejection sampling from shape-constrained distributions in sublinear time [14.18847457501901]
We study the query complexity of rejection sampling in a minimax framework for various classes of discrete distributions.
Our results provide new algorithms for sampling whose complexity scales sublinearly with the alphabet size.
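As a minimal illustration of the rejection sampling primitive that paper studies (not its sublinear-time algorithm), the sketch below samples from a discrete target `p` through a uniform proposal `q`; the names `p`, `q`, and the bound `M` are assumptions made for this example.

```python
import random

def rejection_sample(p, q, M, sample_q):
    """Classic rejection sampler over a discrete alphabet.

    Draws from the target p using proposal q, assuming p[i] <= M * q[i]
    for every symbol i; the expected number of proposal queries is M.
    """
    while True:
        i = sample_q()                           # query the proposal
        if random.random() < p[i] / (M * q[i]):  # accept w.p. p[i] / (M * q[i])
            return i

# target over a 4-symbol alphabet, uniform proposal
p = [0.5, 0.25, 0.125, 0.125]
q = [0.25] * 4
M = 2.0  # max_i p[i] / q[i]
draws = [rejection_sample(p, q, M, lambda: random.randrange(4))
         for _ in range(20000)]
print(draws.count(0) / len(draws))  # empirical frequency of symbol 0, near 0.5
```

The per-sample query cost of this baseline scales with `M`, which is exactly the quantity the referenced paper's minimax framework seeks to control for shape-constrained classes.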
arXiv Detail & Related papers (2021-05-29T01:00:42Z)
- Robust Learning of Fixed-Structure Bayesian Networks in Nearly-Linear Time [13.617081331738056]
We study the problem of learning Bayesian networks where an $\epsilon$-fraction of the samples are adversarially corrupted.
We present the first nearly-linear time algorithm for this problem with a dimension-independent error guarantee.
arXiv Detail & Related papers (2021-05-12T10:11:32Z)
- Faster Differentially Private Samplers via R\'enyi Divergence Analysis of Discretized Langevin MCMC [35.050135428062795]
Langevin dynamics-based algorithms offer much faster alternatives under some distance measures such as statistical distance.
Our techniques are simple and generic and apply to underdamped Langevin dynamics.
arXiv Detail & Related papers (2020-10-27T22:52:45Z)
- A Polynomial Time Algorithm for Learning Halfspaces with Tsybakov Noise [55.45544638410858]
We study the problem of PAC learning homogeneous halfspaces in the presence of Tsybakov noise.
Our algorithm learns the true halfspace within any accuracy $\epsilon$.
arXiv Detail & Related papers (2020-10-04T22:19:06Z)
- Adaptive Sampling for Best Policy Identification in Markov Decision Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision Processes (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z)
- Quasi-Newton Solver for Robust Non-Rigid Registration [35.66014845211251]
We propose a formulation for robust non-rigid registration based on a globally smooth robust estimator for data fitting and regularization.
We apply the majorization-minimization algorithm to the problem, which reduces each iteration to solving a simple least-squares problem with L-BFGS.
arXiv Detail & Related papers (2020-04-09T01:45:05Z)
- SURF: A Simple, Universal, Robust, Fast Distribution Learning Algorithm [64.13217062232874]
SURF is an algorithm for approximating distributions by piecewise polynomials.
It outperforms state-of-the-art algorithms in experiments.
arXiv Detail & Related papers (2020-02-22T01:03:33Z)
- The Simulator: Understanding Adaptive Sampling in the Moderate-Confidence Regime [52.38455827779212]
We propose a novel technique for analyzing adaptive sampling called the Simulator.
We prove the first instance-based lower bounds for the top-k problem which incorporate the appropriate log-factors.
Our new analysis inspires a simple and near-optimal algorithm for best-arm and top-k identification, the first practical algorithm of its kind for the latter problem.
arXiv Detail & Related papers (2017-02-16T23:42:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.