Lifting uniform learners via distributional decomposition
- URL: http://arxiv.org/abs/2303.16208v2
- Date: Thu, 30 Mar 2023 00:50:26 GMT
- Title: Lifting uniform learners via distributional decomposition
- Authors: Guy Blanc, Jane Lange, Ali Malik, Li-Yang Tan
- Abstract summary: We show how any PAC learning algorithm that works under the uniform distribution can be transformed, in a blackbox fashion, into one that works under an arbitrary and unknown distribution $\mathcal{D}$.
A key technical ingredient is an algorithm which, given the aforementioned access to $\mathcal{D}$, produces an optimal decision tree decomposition of $\mathcal{D}$.
- Score: 17.775217381568478
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: We show how any PAC learning algorithm that works under the uniform
distribution can be transformed, in a blackbox fashion, into one that works
under an arbitrary and unknown distribution $\mathcal{D}$. The efficiency of
our transformation scales with the inherent complexity of $\mathcal{D}$,
running in $\mathrm{poly}(n, (md)^d)$ time for distributions over $\{\pm 1\}^n$
whose pmfs are computed by depth-$d$ decision trees, where $m$ is the sample
complexity of the original algorithm. For monotone distributions our
transformation uses only samples from $\mathcal{D}$, and for general ones it
uses subcube conditioning samples.
A key technical ingredient is an algorithm which, given the aforementioned
access to $\mathcal{D}$, produces an optimal decision tree decomposition of
$\mathcal{D}$: an approximation of $\mathcal{D}$ as a mixture of uniform
distributions over disjoint subcubes. With this decomposition in hand, we run
the uniform-distribution learner on each subcube and combine the hypotheses
using the decision tree. This algorithmic decomposition lemma also yields new
algorithms for learning decision tree distributions with runtimes that
exponentially improve on the prior state of the art -- results of independent
interest in distribution learning.
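The lifting procedure described in the abstract — decompose $\mathcal{D}$ into uniform distributions over disjoint subcubes, run the uniform-distribution learner on each subcube, and combine the hypotheses with the decision tree — can be illustrated with a toy sketch. Everything here is hypothetical scaffolding: the depth-1 decomposition tree is hard-coded rather than learned as in the paper, and `uniform_learner` is a miniature stand-in (it just scores a few candidate hypotheses) for a real uniform-distribution PAC learner.

```python
import random

random.seed(0)
n = 6  # dimension of the cube {-1,+1}^n

# Depth-1 "decomposition tree": split on coordinate 0.
# Leaf 0 is the subcube x[0] = -1, leaf 1 is the subcube x[0] = +1.
def route(x):
    return 0 if x[0] == -1 else 1

def sample_D():
    # Toy D: an equal-weight mixture of uniform distributions over the
    # two disjoint subcubes above (remaining coordinates are uniform).
    leaf = random.randrange(2)
    return [(-1 if leaf == 0 else +1)] + [random.choice([-1, +1]) for _ in range(n - 1)]

# Target concept: constant on one subcube, a dictator on the other.
def f(x):
    return +1 if x[0] == -1 else x[1]

def uniform_learner(samples):
    # Stand-in uniform-distribution learner: pick the best hypothesis
    # among the dictators and the two constant functions.
    candidates = [lambda x, i=i: x[i] for i in range(n)]
    candidates += [lambda x: +1, lambda x: -1]
    return max(candidates, key=lambda h: sum(h(x) == y for x, y in samples))

def lift_learner(samples):
    # Route each labeled sample to its subcube, learn per subcube,
    # and combine the per-leaf hypotheses via the tree.
    buckets = {0: [], 1: []}
    for x, y in samples:
        buckets[route(x)].append((x, y))
    hyps = {leaf: uniform_learner(b) for leaf, b in buckets.items()}
    return lambda x: hyps[route(x)](x)

train = [(x, f(x)) for x in (sample_D() for _ in range(400))]
h = lift_learner(train)
test = [(x, f(x)) for x in (sample_D() for _ in range(400))]
acc = sum(h(x) == y for x, y in test) / len(test)
print(round(acc, 2))  # → 1.0
```

The key point the sketch mirrors is that, conditioned on a leaf, $\mathcal{D}$ is uniform over a subcube, so the uniform-distribution learner applies off the shelf within each leaf.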
Related papers
- Projection by Convolution: Optimal Sample Complexity for Reinforcement Learning in Continuous-Space MDPs [56.237917407785545]
We consider the problem of learning an $\varepsilon$-optimal policy in a general class of continuous-space Markov decision processes (MDPs) having smooth Bellman operators.
Key to our solution is a novel projection technique based on ideas from harmonic analysis.
Our result bridges the gap between two popular but conflicting perspectives on continuous-space MDPs.
arXiv Detail & Related papers (2024-05-10T09:58:47Z) - Learning general Gaussian mixtures with efficient score matching [16.06356123715737]
We study the problem of learning mixtures of $k$ Gaussians in $d$ dimensions.
We make no separation assumptions on the underlying mixture components.
We give an algorithm that draws $d^{\mathrm{poly}(k/\varepsilon)}$ samples from the target mixture, runs in sample-polynomial time, and constructs a sampler.
arXiv Detail & Related papers (2024-04-29T17:30:36Z) - Efficiently Learning One-Hidden-Layer ReLU Networks via Schur
Polynomials [50.90125395570797]
We study the problem of PAC learning a linear combination of $k$ ReLU activations under the standard Gaussian distribution on $\mathbb{R}^d$ with respect to the square loss.
Our main result is an efficient algorithm for this learning task with sample and computational complexity $(dk/\epsilon)^{O(k)}$, where $\epsilon > 0$ is the target accuracy.
arXiv Detail & Related papers (2023-07-24T14:37:22Z) - Replicable Clustering [57.19013971737493]
We propose algorithms for the statistical $k$-medians, statistical $k$-means, and statistical $k$-centers problems by utilizing approximation routines for their counterparts in a black-box manner.
We also provide experiments on synthetic distributions in 2D using the $k$-means++ implementation from sklearn as a black-box that validate our theoretical results.
arXiv Detail & Related papers (2023-02-20T23:29:43Z) - Stochastic Approximation Approaches to Group Distributionally Robust
Optimization [96.26317627118912]
We study group distributionally robust optimization (GDRO) via stochastic approximation.
We apply online learning techniques to reduce the number of samples required in each round from $m$ to $1$, keeping the same sample complexity.
We also give a novel formulation of weighted GDRO, which allows us to derive distribution-dependent convergence rates.
arXiv Detail & Related papers (2023-02-18T09:24:15Z) - Perfect Sampling from Pairwise Comparisons [26.396901523831534]
We study how to efficiently obtain perfect samples from a discrete distribution $\mathcal{D}$ given access only to pairwise comparisons of elements of its support.
We design a Markov chain whose stationary distribution coincides with $\mathcal{D}$ and give an algorithm to obtain exact samples using the technique of Coupling from the Past.
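The Coupling from the Past (CFTP) technique mentioned above can be shown in a minimal self-contained form. This sketch is not the paper's pairwise-comparison setting: here we assume full knowledge of a small pmf `p` and use a birth-death Metropolis chain (whose $\pm 1$ moves make the sandwich coupling monotone); all names are illustrative.

```python
import random

random.seed(1)

# Target pmf on {0, 1, 2, 3}; the Metropolis chain below has it as
# its stationary distribution by detailed balance.
p = [0.1, 0.2, 0.3, 0.4]
m = len(p)

def step(i, rnd):
    # One Metropolis birth-death move driven by shared randomness rnd = (d, u):
    # propose i+1 or i-1, accept with probability min(1, p[j]/p[i]).
    d, u = rnd
    j = i + (1 if d else -1)
    if j < 0 or j >= m:
        return i
    return j if u < min(1.0, p[j] / p[i]) else i

def cftp():
    # Coupling from the Past with sandwiching from the minimal (0) and
    # maximal (m-1) states. Randomness for time -t is cached in rnds so
    # the same moves are reused as we look further into the past.
    rnds = []
    T = 1
    while True:
        while len(rnds) < T:
            rnds.append((random.random() < 0.5, random.random()))
        lo, hi = 0, m - 1
        for t in range(T - 1, -1, -1):  # apply times -T, ..., -1 in order
            lo = step(lo, rnds[t])
            hi = step(hi, rnds[t])
        if lo == hi:      # chains have coalesced: exact sample from p
            return lo
        T *= 2            # otherwise look twice as far into the past

counts = [0] * m
N = 20000
for _ in range(N):
    counts[cftp()] += 1
est = [c / N for c in counts]
print(all(abs(e - q) < 0.02 for e, q in zip(est, p)))
```

Because the moves are $\pm 1$, a lower chain can never overtake an upper chain under shared randomness, so coalescence of the two extreme states implies coalescence of all states — the property CFTP needs for the output to be an exact (not merely approximate) sample.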
arXiv Detail & Related papers (2022-11-23T11:20:30Z) - Optimal Gradient Sliding and its Application to Distributed Optimization
Under Similarity [121.83085611327654]
We study structured convex optimization problems with additive objective $r := p + q$, where $r$ is $\mu$-strongly convex, under similarity.
We propose a method to solve such problems, measuring complexity in terms of master-to-agents communication and local calls.
The proposed method is much sharper than the prior $\mathcal{O}(\sqrt{L_q/\mu})$-rate methods.
arXiv Detail & Related papers (2022-05-30T14:28:02Z) - Robustly Learning any Clusterable Mixture of Gaussians [55.41573600814391]
We study the efficient learnability of high-dimensional Gaussian mixtures in the adversarial-robust setting.
We provide an algorithm that learns the components of an $\epsilon$-corrupted $k$-mixture within information-theoretically near-optimal error of $\tilde{O}(\epsilon)$.
Our main technical contribution is a new robust identifiability proof for clusters from a Gaussian mixture, which can be captured by the constant-degree Sum of Squares proof system.
arXiv Detail & Related papers (2020-05-13T16:44:12Z) - Learning and Testing Junta Distributions with Subcube Conditioning [11.464622619555222]
We study the problems of learning and testing junta distributions on $\{-1,1\}^n$ with respect to the uniform distribution.
The main contribution is an algorithm for finding relevant coordinates in a $k$-junta distribution with subcube conditioning.
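A subcube-conditioning oracle, and a crude test for relevant coordinates, can be sketched for a toy product distribution. The rejection-sampling oracle and the marginal-mean threshold `thresh` are illustrative simplifications (sufficient for this product-distribution toy), not the paper's algorithm for general junta distributions.

```python
import random

random.seed(2)
n = 5

def sample_D():
    # Toy 1-junta distribution: coordinate 0 is biased toward +1,
    # the remaining coordinates are uniform and irrelevant.
    x = [1 if random.random() < 0.8 else -1]
    return x + [random.choice([-1, 1]) for _ in range(n - 1)]

def subcube_sample(rho):
    # Subcube-conditioning oracle via rejection sampling: rho maps a subset
    # of coordinates to fixed values; draw x ~ D conditioned on agreeing
    # with rho on those coordinates.
    while True:
        x = sample_D()
        if all(x[i] == v for i, v in rho.items()):
            return x

def relevant_coords(trials=4000, thresh=0.2):
    # For this toy product distribution, a coordinate is relevant iff its
    # marginal mean is bounded away from 0, so we estimate the means.
    means = [0.0] * n
    for _ in range(trials):
        x = subcube_sample({})
        for i in range(n):
            means[i] += x[i] / trials
    return [i for i in range(n) if abs(means[i]) > thresh]

rel = relevant_coords()
print(rel)  # → [0]
```

For example, `subcube_sample({1: -1})` draws from $\mathcal{D}$ restricted to the subcube $x_1 = -1$; the paper's contribution is exploiting exactly this kind of conditional access to find the relevant coordinates of a general $k$-junta distribution.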
arXiv Detail & Related papers (2020-04-26T22:52:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.