Robust density estimation over star-shaped density classes
- URL: http://arxiv.org/abs/2501.10025v1
- Date: Fri, 17 Jan 2025 08:24:30 GMT
- Title: Robust density estimation over star-shaped density classes
- Authors: Xiaolong Liu, Matey Neykov,
- Abstract summary: We construct an algorithm to construct a density estimator within a star-shaped density class, $mathcalF$.
We assume that a fraction $epsilon leq frac13$ of the $N$ observations are arbitrarily corrupted.
Under certain conditions, we obtain the minimax risk, up to proportionality to constants, as $$ maxleft tau*2 wedge d2, epsilon wedge d2 right, $$ where $tau*
- Score: 9.694065209325412
- License:
- Abstract: We establish a novel criterion for comparing the performance of two densities, $g_1$ and $g_2$, within the context of corrupted data. Utilizing this criterion, we propose an algorithm to construct a density estimator within a star-shaped density class, $\mathcal{F}$, under conditions of data corruption. We proceed to derive the minimax upper and lower bounds for density estimation across this star-shaped density class, characterized by densities that are uniformly bounded above and below (in the sup norm), in the presence of adversarially corrupted data. Specifically, we assume that a fraction $\epsilon \leq \frac{1}{3}$ of the $N$ observations are arbitrarily corrupted. We obtain the minimax upper bound $\max\{ \tau_{\overline{J}}^2, \epsilon \} \wedge d^2$. Under certain conditions, we obtain the minimax risk, up to proportionality constants, under the squared $L_2$ loss as $$ \max\left\{ \tau^{*2} \wedge d^2, \epsilon \wedge d^2 \right\}, $$ where $\tau^* := \sup\left\{ \tau : N\tau^2 \leq \log \mathcal{M}_{\mathcal{F}}^{\text{loc}}(\tau, c) \right\}$ for a sufficiently large constant $c$. Here, $\mathcal{M}_{\mathcal{F}}^{\text{loc}}(\tau, c)$ denotes the local entropy of the set $\mathcal{F}$, and $d$ is the $L_2$ diameter of $\mathcal{F}$.
Related papers
- Dimension-free Private Mean Estimation for Anisotropic Distributions [55.86374912608193]
Previous private estimators on distributions over $mathRd suffer from a curse of dimensionality.
We present an algorithm whose sample complexity has improved dependence on dimension.
arXiv Detail & Related papers (2024-11-01T17:59:53Z) - Efficient Continual Finite-Sum Minimization [52.5238287567572]
We propose a key twist into the finite-sum minimization, dubbed as continual finite-sum minimization.
Our approach significantly improves upon the $mathcalO(n/epsilon)$ FOs that $mathrmStochasticGradientDescent$ requires.
We also prove that there is no natural first-order method with $mathcalOleft(n/epsilonalpharight)$ complexity gradient for $alpha 1/4$, establishing that the first-order complexity of our method is nearly tight.
arXiv Detail & Related papers (2024-06-07T08:26:31Z) - Estimating the Mixing Coefficients of Geometrically Ergodic Markov
Processes [5.00389879175348]
We estimate the individual $beta$-mixing coefficients of a real-valued geometrically ergodic Markov process from a single sample-path.
Naturally no density assumptions are required in this setting; the expected error rate is shown to be of order $mathcal O(log(n) n-1/2)$.
arXiv Detail & Related papers (2024-02-11T20:17:10Z) - On the $O(\frac{\sqrt{d}}{T^{1/4}})$ Convergence Rate of RMSProp and Its Momentum Extension Measured by $\ell_1$ Norm [59.65871549878937]
This paper considers the RMSProp and its momentum extension and establishes the convergence rate of $frac1Tsum_k=1T.
Our convergence rate matches the lower bound with respect to all the coefficients except the dimension $d$.
Our convergence rate can be considered to be analogous to the $frac1Tsum_k=1T.
arXiv Detail & Related papers (2024-02-01T07:21:32Z) - The Local Landscape of Phase Retrieval Under Limited Samples [12.366532279123359]
We aim to ascertain the minimal sample size required to guarantee a benign local landscape surrounding global minima in high dimensions.
We first explore the local convexity of whenn=odlog d).
arXiv Detail & Related papers (2023-11-26T07:22:35Z) - A Unified Framework for Uniform Signal Recovery in Nonlinear Generative
Compressed Sensing [68.80803866919123]
Under nonlinear measurements, most prior results are non-uniform, i.e., they hold with high probability for a fixed $mathbfx*$ rather than for all $mathbfx*$ simultaneously.
Our framework accommodates GCS with 1-bit/uniformly quantized observations and single index models as canonical examples.
We also develop a concentration inequality that produces tighter bounds for product processes whose index sets have low metric entropy.
arXiv Detail & Related papers (2023-09-25T17:54:19Z) - Nonasymptotic one-and two-sample tests in high dimension with unknown
covariance structure [0.0]
We consider the problem of testing if $mu is $eta-close to zero, i.e. $|mu| leq eta against $|mu| geq (eta + delta)$.
The aim of this paper is to obtain nonasymptotic upper and lower bounds on the minimal separation distancedelta$ such that we can control both the Type I and Type II errors at a given level.
arXiv Detail & Related papers (2021-09-01T06:22:53Z) - Non-Parametric Estimation of Manifolds from Noisy Data [1.0152838128195467]
We consider the problem of estimating a $d$ dimensional sub-manifold of $mathbbRD$ from a finite set of noisy samples.
We show that the estimation yields rates of convergence of $n-frack2k + d$ for the point estimation and $n-frack-12k + d$ for the estimation of tangent space.
arXiv Detail & Related papers (2021-05-11T02:29:33Z) - Optimal Mean Estimation without a Variance [103.26777953032537]
We study the problem of heavy-tailed mean estimation in settings where the variance of the data-generating distribution does not exist.
We design an estimator which attains the smallest possible confidence interval as a function of $n,d,delta$.
arXiv Detail & Related papers (2020-11-24T22:39:21Z) - Analysis of KNN Density Estimation [56.29748742084386]
kNN density estimation is minimax optimal under both $ell_infty$ and $ell_infty$ criteria, if the support set is known.
The $ell_infty$ error does not reach the minimax lower bound, but is better than kernel density estimation.
arXiv Detail & Related papers (2020-09-30T03:33:17Z) - Optimal Coreset for Gaussian Kernel Density Estimation [0.8376091455761259]
Given a point set $Psubset mathbbRd$, the kernel density estimate of $P$ is defined as [ overlinemathcalG_P(x) = frac1left|Pright|sum_pin Pe-leftlVert x-p rightrVert2 ] for any $xinmathbbRd$.
We study how to construct a small subset $Q$ of $P
arXiv Detail & Related papers (2020-07-15T22:58:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.