Related papers: Ratio Covers of Convex Sets and Optimal Mixture Density Estimation

Ratio Covers of Convex Sets and Optimal Mixture Density Estimation

URL: http://arxiv.org/abs/2602.16142v1
Date: Wed, 18 Feb 2026 02:25:18 GMT
Title: Ratio Covers of Convex Sets and Optimal Mixture Density Estimation
Authors: Spencer Compton, Gábor Lugosi, Jaouad Mourtada, Jian Qian, Nikita Zhivotovskiy,
Abstract summary: We study density estimation in Kullback-Leibler divergence.<n>We identify the best possible high-probability guarantees in terms of the dictionary size, sample size, and confidence level.
Score: 16.547739992791197
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We study density estimation in Kullback-Leibler divergence: given an i.i.d. sample from an unknown density $p$, the goal is to construct an estimator $\widehat p$ such that $\mathrm{KL}(p,\widehat p)$ is small with high probability. We consider two settings involving a finite dictionary of $M$ densities: (i) model aggregation, where $p$ belongs to the dictionary, and (ii) convex aggregation (mixture density estimation), where $p$ is a mixture of densities from the dictionary. Crucially, we make no assumption on the base densities: their ratios may be unbounded and their supports may differ. For both problems, we identify the best possible high-probability guarantees in terms of the dictionary size, sample size, and confidence level. These optimal rates are higher than those achievable when density ratios are bounded by absolute constants; for mixture density estimation, they match existing lower bounds in the special case of discrete distributions. Our analysis of the mixture case hinges on two new covering results. First, we provide a sharp, distribution-free upper bound on the local Hellinger entropy of the class of mixtures of $M$ distributions. Second, we prove an optimal ratio covering theorem for convex sets: for every convex compact set $K\subset \mathbb{R}_+^d$, there exists a subset $A\subset K$ with at most $2^{8d}$ elements such that each element of $K$ is coordinate-wise dominated by an element of $A$ up to a universal constant factor. This geometric result is of independent interest; notably, it yields new cardinality estimates for $\varepsilon$-approximate Pareto sets in multi-objective optimization when the attainable set of objective vectors is convex.

Related papers

SoS Certifiability of Subgaussian Distributions and its Algorithmic Applications [37.208622097149714]
We prove that there is a universal constant $C>0$ so that for every $d inmathbb N$, every centered subgaussian distribution $mathcal D$ on $mathbb Rd$, and every even $p inmathbb N$, the $d-optimal inmathbb N$, and the $d-optimal inmathbb N$. This establishes that every subgaussian distribution is emphS-certifiably subgaussian -- a condition that yields efficient learning algorithms for a wide variety of
arXiv Detail & Related papers (2024-10-28T16:36:58Z)
Semidefinite programming relaxations and debiasing for MAXCUT-based clustering [1.9761774213809036]
We consider the problem of partitioning a small data sample of size $n$ drawn from a mixture of 2 sub-gaussian distributions in $mathbbRp$.<n>We use semidefinite programming relaxations of an integer quadratic program that is formulated as finding the maximum cut on a graph.
arXiv Detail & Related papers (2024-01-16T03:14:24Z)
Clustering Mixtures of Bounded Covariance Distributions Under Optimal Separation [44.25945344950543]
We study the clustering problem for mixtures of bounded covariance distributions. We give the first poly-time algorithm for this clustering task. Our algorithm is robust to $Omega(alpha)$-fraction of adversarial outliers.
arXiv Detail & Related papers (2023-12-19T01:01:53Z)
$L^1$ Estimation: On the Optimality of Linear Estimators [64.76492306585168]
This work shows that the only prior distribution on $X$ that induces linearity in the conditional median is Gaussian. In particular, it is demonstrated that if the conditional distribution $P_X|Y=y$ is symmetric for all $y$, then $X$ must follow a Gaussian distribution.
arXiv Detail & Related papers (2023-09-17T01:45:13Z)
Stochastic Approximation Approaches to Group Distributionally Robust Optimization and Beyond [89.72693227960274]
This paper investigates group distributionally robust optimization (GDRO) with the goal of learning a model that performs well over $m$ different distributions. To reduce the number of samples in each round from $m$ to 1, we cast GDRO as a two-player game, where one player conducts and the other executes an online algorithm for non-oblivious multi-armed bandits. In the second scenario, we propose to optimize the average top-$k$ risk instead of the maximum risk, thereby mitigating the impact of distributions.
arXiv Detail & Related papers (2023-02-18T09:24:15Z)
Near-optimal fitting of ellipsoids to random points [68.12685213894112]
A basic problem of fitting an ellipsoid to random points has connections to low-rank matrix decompositions, independent component analysis, and principal component analysis. We resolve this conjecture up to logarithmic factors by constructing a fitting ellipsoid for some $n = Omega(, d2/mathrmpolylog(d),)$. Our proof demonstrates feasibility of the least squares construction of Saunderson et al. using a convenient decomposition of a certain non-standard random matrix.
arXiv Detail & Related papers (2022-08-19T18:00:34Z)
Consistent Density Estimation Under Discrete Mixture Models [20.935152220339056]
This work considers a problem of estimating a mixing probability density $f$ in the setting of discrete mixture models. In particular, it is shown that there exists an estimator $f_n$ such that for every density $f$ $lim_nto infty mathbbE left[ int |f_n -f | right]=0$.
arXiv Detail & Related papers (2021-05-03T18:30:02Z)
Analysis of KNN Density Estimation [56.29748742084386]
kNN density estimation is minimax optimal under both $ell_infty$ and $ell_infty$ criteria, if the support set is known. The $ell_infty$ error does not reach the minimax lower bound, but is better than kernel density estimation.
arXiv Detail & Related papers (2020-09-30T03:33:17Z)
Optimal Robust Linear Regression in Nearly Linear Time [97.11565882347772]
We study the problem of high-dimensional robust linear regression where a learner is given access to $n$ samples from the generative model $Y = langle X,w* rangle + epsilon$ We propose estimators for this problem under two settings: (i) $X$ is L4-L2 hypercontractive, $mathbbE [XXtop]$ has bounded condition number and $epsilon$ has bounded variance and (ii) $X$ is sub-Gaussian with identity second moment and $epsilon$ is
arXiv Detail & Related papers (2020-07-16T06:44:44Z)
Robustly Learning any Clusterable Mixture of Gaussians [55.41573600814391]
We study the efficient learnability of high-dimensional Gaussian mixtures in the adversarial-robust setting. We provide an algorithm that learns the components of an $epsilon$-corrupted $k$-mixture within information theoretically near-optimal error proofs of $tildeO(epsilon)$. Our main technical contribution is a new robust identifiability proof clusters from a Gaussian mixture, which can be captured by the constant-degree Sum of Squares proof system.
arXiv Detail & Related papers (2020-05-13T16:44:12Z)
Optimal estimation of high-dimensional location Gaussian mixtures [6.947889260024788]
We show that the minimax rate of estimating the mixing distribution in Wasserstein distance is $Theta((d/n)1/4 + n-1/(4k-2))$. We also show that the mixture density can be estimated at the optimal parametric rate $Theta(sqrtd/n)$ in Hellinger distance.
arXiv Detail & Related papers (2020-02-14T00:11:54Z)
Curse of Dimensionality on Randomized Smoothing for Certifiable Robustness [151.67113334248464]
We show that extending the smoothing technique to defend against other attack models can be challenging. We present experimental results on CIFAR to validate our theory.
arXiv Detail & Related papers (2020-02-08T22:02:14Z)

This list is automatically generated from the titles and abstracts of the papers in this site.