Learning multivariate Gaussians with imperfect advice
- URL: http://arxiv.org/abs/2411.12700v3
- Date: Fri, 31 Jan 2025 18:41:29 GMT
- Title: Learning multivariate Gaussians with imperfect advice
- Authors: Arnab Bhattacharyya, Davin Choo, Philips George John, Themis Gouleakis
- Abstract summary: We revisit the problem of distribution learning within the framework of learning-augmented algorithms. Our objective is to develop learning algorithms whose sample complexity decreases as the quality of the advice improves.
- Score: 14.459107410136486
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We revisit the problem of distribution learning within the framework of learning-augmented algorithms. In this setting, we explore the scenario where a probability distribution is provided as potentially inaccurate advice on the true, unknown distribution. Our objective is to develop learning algorithms whose sample complexity decreases as the quality of the advice improves, thereby surpassing standard learning lower bounds when the advice is sufficiently accurate. Specifically, we demonstrate that this outcome is achievable for the problem of learning a multivariate Gaussian distribution $N(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ in the PAC learning setting. Classically, in the advice-free setting, $\tilde{\Theta}(d^2/\varepsilon^2)$ samples are sufficient and worst-case necessary to learn $d$-dimensional Gaussians up to TV distance $\varepsilon$ with constant probability. When we are additionally given a parameter $\tilde{\boldsymbol{\Sigma}}$ as advice, we show that $\tilde{O}(d^{2-\beta}/\varepsilon^2)$ samples suffice whenever $\| \tilde{\boldsymbol{\Sigma}}^{-1/2} \boldsymbol{\Sigma} \tilde{\boldsymbol{\Sigma}}^{-1/2} - \boldsymbol{I_d} \|_1 \leq \varepsilon d^{1-\beta}$ (where $\|\cdot\|_1$ denotes the entrywise $\ell_1$ norm) for any $\beta > 0$, yielding a polynomial improvement over the advice-free setting.
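To make the advice condition concrete, here is a minimal NumPy sketch (an illustration, not code from the paper): it computes the entrywise $\ell_1$ deviation $\|\tilde{\boldsymbol{\Sigma}}^{-1/2} \boldsymbol{\Sigma} \tilde{\boldsymbol{\Sigma}}^{-1/2} - \boldsymbol{I_d}\|_1$ for a hypothetical covariance pair and rearranges the threshold $\varepsilon d^{1-\beta}$ to read off the largest $\beta$ the advice supports; the toy matrices are assumptions, and the constants and log factors hidden in $\tilde{O}(\cdot)$ are ignored.

```python
import numpy as np

def advice_quality(sigma: np.ndarray, sigma_advice: np.ndarray) -> float:
    """Entrywise l1 norm of Sigma_adv^{-1/2} @ Sigma @ Sigma_adv^{-1/2} - I_d."""
    vals, vecs = np.linalg.eigh(sigma_advice)         # advice must be positive definite
    inv_sqrt = vecs @ np.diag(vals ** -0.5) @ vecs.T  # symmetric inverse square root
    d = sigma.shape[0]
    return float(np.abs(inv_sqrt @ sigma @ inv_sqrt - np.eye(d)).sum())

# Toy instance (hypothetical): the advice is a small perturbation of the truth.
rng = np.random.default_rng(0)
d, eps = 100, 0.1
A = rng.standard_normal((d, d))
sigma = A @ A.T / d + np.eye(d)           # true covariance, positive definite
sigma_advice = sigma + 0.01 * np.eye(d)   # near-perfect advice

a = advice_quality(sigma, sigma_advice)
# Rearranging  a <= eps * d^(1 - beta)  gives  beta <= 1 - log(a / eps) / log(d).
beta = 1.0 - np.log(a / eps) / np.log(d)
if beta > 0:
    print(f"advice supports beta ~= {beta:.2f}, i.e. ~d^{2 - beta:.2f}/eps^2 samples")
else:
    print("advice too coarse: fall back to the advice-free ~d^2/eps^2 rate")
```

Plugging the resulting $\beta$ into $\tilde{O}(d^{2-\beta}/\varepsilon^2)$ illustrates the claimed polynomial saving over the advice-free $\tilde{\Theta}(d^2/\varepsilon^2)$ rate: the smaller the deviation $a$, the larger the supported $\beta$.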
Related papers
- Product distribution learning with imperfect advice [16.179400847403446]
Given i.i.d. samples from an unknown distribution $P$, the goal of distribution learning is to recover the parameters of a distribution that is close to $P$. We show that there is an efficient algorithm to learn $P$ within TV distance $\varepsilon$ that has sample complexity $\tilde{O}(d^{1-\beta}/\varepsilon^2)$.
arXiv Detail & Related papers (2025-11-13T14:44:46Z) - Information-Computation Tradeoffs for Noiseless Linear Regression with Oblivious Contamination [65.37519531362157]
We show that any efficient Statistical Query algorithm for this task requires VSTAT complexity at least $\tilde{\Omega}(d^{1/2}/\alpha^2)$.
arXiv Detail & Related papers (2025-10-12T15:42:44Z) - Fundamental Limits of Learning High-dimensional Simplices in Noisy Regimes [0.5892638927736114]
We prove an algorithm exists that, with high probability, outputs a simplex within $\ell$ or total variation (TV) distance at most $\varepsilon$ from the true simplex. In the noiseless scenario, our lower bound $n \ge \Omega(K/\varepsilon)$ matches known upper bounds up to constant factors.
arXiv Detail & Related papers (2025-06-11T18:35:38Z) - Mean and Variance Estimation Complexity in Arbitrary Distributions via Wasserstein Minimization [0.0]
This paper focuses on the complexity of estimating translation $\boldsymbol{\mu} \in \mathbb{R}^l$ and shrinkage $\sigma \in \mathbb{R}_{++}$ parameters.
We highlight that while the problem is NP-hard for Maximum Likelihood Estimation (MLE), it is possible to obtain $\varepsilon$-approximations for arbitrary $\varepsilon > 0$ in $\mathrm{poly}\left(\frac{1}{\varepsilon}\right)$ time via Wasserstein minimization.
arXiv Detail & Related papers (2025-01-17T13:07:52Z) - Sample and Computationally Efficient Robust Learning of Gaussian Single-Index Models [37.42736399673992]
A single-index model (SIM) is a function of the form $\sigma(\mathbf{w}^{\ast} \cdot \mathbf{x})$, where $\sigma: \mathbb{R} \to \mathbb{R}$ is a known link function and $\mathbf{w}^{\ast}$ is a hidden unit vector.
We show that a proper learner attains $L^2$-error of $O(\mathrm{OPT})+\epsilon$, where $\mathrm{OPT}$ denotes the optimal $L^2$-error.
arXiv Detail & Related papers (2024-11-08T17:10:38Z) - Dimension-free Private Mean Estimation for Anisotropic Distributions [55.86374912608193]
Previous private estimators for distributions over $\mathbb{R}^d$ suffer from a curse of dimensionality.
We present an algorithm whose sample complexity has improved dependence on dimension.
arXiv Detail & Related papers (2024-11-01T17:59:53Z) - Efficient Statistics With Unknown Truncation, Polynomial Time Algorithms, Beyond Gaussians [7.04316974339151]
We study the estimation of distributional parameters when samples are shown only if they fall in some unknown set.
We develop tools that may be of independent interest, including a reduction from PAC learning with positive and unlabeled samples to PAC learning with positive and negative samples.
arXiv Detail & Related papers (2024-10-02T15:21:07Z) - Neural network learns low-dimensional polynomials with SGD near the information-theoretic limit [75.4661041626338]
We study the problem of gradient descent learning of a single-index target function $f_*(\boldsymbol{x}) = \sigma_*\left(\langle \boldsymbol{x}, \boldsymbol{\theta} \rangle\right)$. We prove that a two-layer neural network optimized by an SGD-based algorithm learns $f_*$ with a complexity that is not governed by information exponents.
arXiv Detail & Related papers (2024-06-03T17:56:58Z) - Near-Optimal Bounds for Learning Gaussian Halfspaces with Random Classification Noise [50.64137465792738]
We show that any efficient SQ algorithm for the problem requires sample complexity at least $\Omega(d^{1/2}/(\max\{p, \epsilon\})^2)$.
Our lower bound suggests that this quadratic dependence on $1/\epsilon$ is inherent for efficient algorithms.
arXiv Detail & Related papers (2023-07-13T18:59:28Z) - Statistically Optimal Robust Mean and Covariance Estimation for Anisotropic Gaussians [3.5788754401889014]
In the strong $\varepsilon$-contamination model, we assume that the adversary replaced an $\varepsilon$ fraction of vectors in the original Gaussian sample by any other vectors.
We construct an estimator $\widehat{\Sigma}$ of the covariance matrix $\Sigma$ that satisfies, with probability at least $1 - \delta$, a statistically optimal error bound.
arXiv Detail & Related papers (2023-01-21T23:28:55Z) - Memory-Sample Lower Bounds for Learning Parity with Noise [2.724141845301679]
We show, for the well-studied problem of learning parity under noise, that any learning algorithm requires either a memory of size $\Omega(n^2/\varepsilon)$ or an exponential number of samples.
Our proof is based on adapting the arguments in [Raz'17,GRT'18] to the noisy case.
arXiv Detail & Related papers (2021-07-05T23:34:39Z) - Self-training Converts Weak Learners to Strong Learners in Mixture Models [86.7137362125503]
We show that a pseudolabeler $\boldsymbol{\beta}_{\mathrm{pl}}$ can achieve classification error at most $C_{\mathrm{err}}$.
We additionally show that by running gradient descent on the logistic loss one can obtain a pseudolabeler $\boldsymbol{\beta}_{\mathrm{pl}}$ with classification error $C_{\mathrm{err}}$ using only $O(d)$ labeled examples.
arXiv Detail & Related papers (2021-06-25T17:59:16Z) - Robust Gaussian Covariance Estimation in Nearly-Matrix Multiplication Time [14.990725929840892]
We show an algorithm that runs in time $\widetilde{O}(T(N, d) \log \kappa / \mathrm{poly}(\varepsilon))$, where $T(N, d)$ is the time it takes to multiply a $d \times N$ matrix by its transpose.
Our runtime matches that of the fastest algorithm for covariance estimation without outliers, up to poly-logarithmic factors.
arXiv Detail & Related papers (2020-06-23T20:21:27Z) - Model-Free Reinforcement Learning: from Clipped Pseudo-Regret to Sample Complexity [59.34067736545355]
Given an MDP with $S$ states, $A$ actions, the discount factor $\gamma \in (0,1)$, and an approximation threshold $\epsilon > 0$, we provide a model-free algorithm to learn an $\epsilon$-optimal policy.
For small enough $\epsilon$, we give an algorithm with improved sample complexity.
arXiv Detail & Related papers (2020-06-06T13:34:41Z) - Agnostic Learning of a Single Neuron with Gradient Descent [92.7662890047311]
We consider the problem of learning the best-fitting single neuron as measured by the expected square loss.
For the ReLU activation, our population risk guarantee is $O(\mathsf{OPT}^{1/2})+\epsilon$.
arXiv Detail & Related papers (2020-05-29T07:20:35Z) - Taking a hint: How to leverage loss predictors in contextual bandits? [63.546913998407405]
We study learning in contextual bandits with the help of loss predictors.
We show that the optimal regret is $\mathcal{O}(\min\{\sqrt{T}, \sqrt{\mathcal{E}}\, T^{\frac{1}{3}}\})$ when $\mathcal{E}$ is known.
arXiv Detail & Related papers (2020-03-04T07:36:38Z) - Curse of Dimensionality on Randomized Smoothing for Certifiable Robustness [151.67113334248464]
We show that extending the smoothing technique to defend against other attack models can be challenging.
We present experimental results on CIFAR to validate our theory.
arXiv Detail & Related papers (2020-02-08T22:02:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.