Learning and Testing Junta Distributions with Subcube Conditioning
- URL: http://arxiv.org/abs/2004.12496v1
- Date: Sun, 26 Apr 2020 22:52:53 GMT
- Title: Learning and Testing Junta Distributions with Subcube Conditioning
- Authors: Xi Chen, Rajesh Jayaram, Amit Levi, Erik Waingarten
- Abstract summary: We study the problems of learning and testing junta distributions on $\{-1,1\}^n$ with respect to the uniform distribution.
The main contribution is an algorithm for finding relevant coordinates in a $k$-junta distribution with subcube conditioning.
- Score: 11.464622619555222
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the problems of learning and testing junta distributions on
$\{-1,1\}^n$ with respect to the uniform distribution, where a distribution $p$
is a $k$-junta if its probability mass function $p(x)$ depends on a subset of
at most $k$ variables. The main contribution is an algorithm for finding
relevant coordinates in a $k$-junta distribution with subcube conditioning
[BC18, CCKLW20]. We give two applications:
1. An algorithm for learning $k$-junta distributions with
$\tilde{O}(k/\epsilon^2) \log n + O(2^k/\epsilon^2)$ subcube conditioning
queries, and
2. An algorithm for testing $k$-junta distributions with $\tilde{O}((k +
\sqrt{n})/\epsilon^2)$ subcube conditioning queries.
All our algorithms are optimal up to poly-logarithmic factors.
Our results show that subcube conditioning, as a natural model for accessing
high-dimensional distributions, enables significant savings in learning and
testing junta distributions compared to the standard sampling model. This
addresses an open question posed by Aliakbarpour, Blais, and Rubinfeld [ABR17].
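As a rough illustration of the access model, the sketch below simulates a subcube-conditioning oracle for a $k$-junta distribution on $\{-1,1\}^n$: the algorithm fixes a subset of coordinates and receives a sample conditioned on the corresponding subcube. This is only an illustrative toy simulator under our own assumptions (the class name `JuntaDistribution`, the explicit junta marginal `q`, and the rejection-sampling strategy are ours), not the paper's algorithm for finding relevant coordinates.

```python
import random


class JuntaDistribution:
    """Toy k-junta distribution p on {-1,1}^n: p(x) depends only on the
    coordinates in `junta`; conditioned on those, the rest of x is uniform."""

    def __init__(self, n, junta, q):
        # junta: indices of the (at most k) relevant coordinates
        # q: dict mapping each tuple in {-1,1}^{|junta|} to its probability
        self.n, self.junta, self.q = n, list(junta), dict(q)

    def sample(self):
        """Ordinary sampling query: draw the junta block from q, rest uniform."""
        block = random.choices(list(self.q), weights=list(self.q.values()))[0]
        x = [random.choice((-1, 1)) for _ in range(self.n)]
        for idx, val in zip(self.junta, block):
            x[idx] = val
        return x

    def subcube_conditional_sample(self, rho, max_tries=100_000):
        """Subcube-conditioning query: rho has length n with entries in
        {-1, 1, '*'}; returns a sample of p conditioned on x_i = rho_i for
        every i with rho_i != '*'.  Implemented here by naive rejection
        sampling, which is enough for illustration."""
        for _ in range(max_tries):
            x = self.sample()
            if all(r == '*' or x[i] == r for i, r in enumerate(rho)):
                return x
        raise ValueError("the requested subcube has (near-)zero mass under p")


if __name__ == "__main__":
    # A 2-junta on {-1,1}^5 with relevant coordinates {0, 3}.
    q = {(1, 1): 0.7, (1, -1): 0.1, (-1, 1): 0.1, (-1, -1): 0.1}
    p = JuntaDistribution(n=5, junta=[0, 3], q=q)
    print(p.sample())                                            # plain sample
    print(p.subcube_conditional_sample([1, '*', -1, '*', '*']))  # fix x_0 = 1, x_2 = -1
```

In the actual query model the tester does not know the relevant coordinates; the simulator above is only meant to make the oracle's input/output behavior concrete.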
Related papers
- Learning Intersections of Halfspaces with Distribution Shift: Improved Algorithms and SQ Lower Bounds [9.036777309376697]
Klivans, Stavropoulos, and Vasilyan initiated the study of testable learning with distribution shift.
Their model deviates from all prior work in that no assumptions are made on $\mathcal{D}'$.
arXiv Detail & Related papers (2024-04-02T23:34:39Z)
- Near-Optimal Bounds for Learning Gaussian Halfspaces with Random Classification Noise [50.64137465792738]
We show that any efficient SQ algorithm for the problem requires sample complexity at least $\Omega(d^{1/2}/(\max\{p, \epsilon\})^2)$.
Our lower bound suggests that this quadratic dependence on $1/epsilon$ is inherent for efficient algorithms.
arXiv Detail & Related papers (2023-07-13T18:59:28Z)
- Near-Optimal Degree Testing for Bayes Nets [6.814860712547177]
We show that the sample complexity of the problem is $\tilde{\Theta}(2^{n/2}/\varepsilon^2)$.
Our algorithm relies on a testing-by-learning framework, previously used to obtain sample-optimal testers.
arXiv Detail & Related papers (2023-04-13T03:48:31Z)
- Lifting uniform learners via distributional decomposition [17.775217381568478]
We show how any PAC learning algorithm that works under the uniform distribution can be transformed, in a black-box fashion, into one that works under an arbitrary and unknown distribution $\mathcal{D}$.
A key technical ingredient is an algorithm which, given the aforementioned access to $\mathcal{D}$, produces an optimal decision tree decomposition of $\mathcal{D}$.
arXiv Detail & Related papers (2023-03-27T23:55:25Z)
- Stochastic Approximation Approaches to Group Distributionally Robust Optimization [96.26317627118912]
The paper studies group distributionally robust optimization (GDRO).
It applies online learning techniques to reduce the number of samples required in each round from $m$ to $1$, keeping the same sample complexity.
It also gives a novel formulation of weighted GDRO, which allows distribution-dependent convergence rates to be derived.
arXiv Detail & Related papers (2023-02-18T09:24:15Z)
- TURF: A Two-factor, Universal, Robust, Fast Distribution Learning Algorithm [64.13217062232874]
One of its most powerful and successful modalities approximates every distribution to an $\ell_1$ distance essentially at most a constant times larger than its closest $t$-piece degree-$d$ approximation.
We provide a method that estimates this number near-optimally, hence helps approach the best possible approximation.
arXiv Detail & Related papers (2022-02-15T03:49:28Z)
- Statistically Near-Optimal Hypothesis Selection [33.83129262033921]
We derive an optimal $2$-approximation learning strategy for the Hypothesis Selection problem.
This is the first algorithm that simultaneously achieves the best approximation factor and sample complexity.
arXiv Detail & Related papers (2021-08-17T21:11:20Z)
- The Sample Complexity of Robust Covariance Testing [56.98280399449707]
We are given i.i.d. samples from a distribution of the form $Z = (1-\epsilon) X + \epsilon B$, where $X$ is a zero-mean Gaussian $\mathcal{N}(0, \Sigma)$ with unknown covariance $\Sigma$.
In the absence of contamination, prior work gave a simple tester for this hypothesis testing task that uses $O(d)$ samples.
We prove a sample complexity lower bound of $\Omega(d^2)$ for $\epsilon$ an arbitrarily small constant and $\gamma$
arXiv Detail & Related papers (2020-12-31T18:24:41Z)
- Locally Private Hypothesis Selection [96.06118559817057]
We output a distribution from $\mathcal{Q}$ whose total variation distance to $p$ is comparable to the best such distribution.
We show that the constraint of local differential privacy incurs an exponential increase in cost.
Our algorithms result in exponential improvements on the round complexity of previous methods.
arXiv Detail & Related papers (2020-02-21T18:30:48Z)
- Curse of Dimensionality on Randomized Smoothing for Certifiable Robustness [151.67113334248464]
We show that extending the smoothing technique to defend against other attack models can be challenging.
We present experimental results on CIFAR to validate our theory.
arXiv Detail & Related papers (2020-02-08T22:02:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.