Debiasing and a local analysis for population clustering using
semidefinite programming
- URL: http://arxiv.org/abs/2401.10927v1
- Date: Tue, 16 Jan 2024 03:14:24 GMT
- Title: Debiasing and a local analysis for population clustering using
semidefinite programming
- Authors: Shuheng Zhou
- Abstract summary: We consider the problem of partitioning a small data sample of size $n$ drawn from a mixture of $2$ sub-gaussian distributions.
This work is motivated by the application of clustering individuals according to their population of origin.
- Score: 1.9761774213809036
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In this paper, we consider the problem of partitioning a small data sample of
size $n$ drawn from a mixture of $2$ sub-gaussian distributions. In particular,
we analyze computationally efficient algorithms, proposed by the same author, to
partition data into two groups approximately according to their population of
origin given a small sample. This work is motivated by the application of
clustering individuals according to their population of origin using $p$
markers, when the divergence between any two of the populations is small. We
build upon the semidefinite relaxation of an integer quadratic program that is
formulated essentially as finding the maximum cut on a graph, where edge
weights in the cut represent dissimilarity scores between two nodes based on
their $p$ features. Here we use $\Delta^2 :=p \gamma$ to denote the $\ell_2^2$
distance between two centers (mean vectors), namely, $\mu^{(1)}$, $\mu^{(2)}$
$\in$ $\mathbb{R}^p$. The goal is to allow a full range of tradeoffs between
$n, p, \gamma$ in the sense that partial recovery (success rate $< 100\%$) is
feasible once the signal-to-noise ratio $s^2 := \min\{np \gamma^2, \Delta^2\}$
is lower bounded by a constant. Importantly, we prove that the
misclassification error decays exponentially with respect to the SNR $s^2$.
This result was introduced earlier without a full proof. We therefore present
the full proof in the present work. Finally, for balanced partitions, we
consider a variant of SDP1 and show that the new estimator has a superb
debiasing property, which is, to the best of our knowledge, novel.
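
To make the max-cut formulation concrete, the sketch below sets up a generic semidefinite relaxation of two-way partitioning in Python with numpy and cvxpy. It is a minimal illustration under assumed choices, not the paper's SDP1: the squared-Euclidean dissimilarity weights, the function name `sdp_two_way_partition`, and the eigenvector-sign rounding are placeholders rather than the estimator analyzed in the paper.

```python
# Minimal sketch of a max-cut-style SDP relaxation for two-group clustering.
# Assumptions (not from the paper): numpy + cvxpy are available, edge weights
# are squared Euclidean distances, and rounding uses the sign of the leading
# eigenvector of the SDP solution.
import numpy as np
import cvxpy as cp


def sdp_two_way_partition(X):
    """X: (n, p) data matrix; returns a length-n vector of +/-1 labels."""
    n = X.shape[0]
    # Dissimilarity (edge-weight) matrix: pairwise squared Euclidean distances.
    sq_norms = (X ** 2).sum(axis=1)
    W = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    np.fill_diagonal(W, 0.0)

    # SDP relaxation of the integer quadratic program: Z_ij relaxes x_i * x_j
    # for labels x_i in {-1, +1}; the cut value is sum_ij W_ij (1 - x_i x_j)/4.
    Z = cp.Variable((n, n), symmetric=True)
    objective = cp.Maximize(cp.sum(cp.multiply(W, 1 - Z)) / 4.0)
    constraints = [Z >> 0, cp.diag(Z) == 1]
    cp.Problem(objective, constraints).solve()

    # Simple rounding: sign of the leading eigenvector of the solution matrix.
    _, vecs = np.linalg.eigh(Z.value)
    labels = np.sign(vecs[:, -1])
    labels[labels == 0] = 1.0
    return labels


# Toy usage in the paper's notation: two populations in R^p whose centers
# differ by Delta^2 = p * gamma, so s^2 = min{n p gamma^2, Delta^2}.
rng = np.random.default_rng(0)
n_half, p, gamma = 20, 100, 0.04
mu = np.full(p, np.sqrt(gamma) / 2.0)          # ||2 mu||^2 = p * gamma
X = np.vstack([mu + rng.standard_normal((n_half, p)),
               -mu + rng.standard_normal((n_half, p))])
print(sdp_two_way_partition(X))
```

The constraints $\mathrm{diag}(Z) = 1$, $Z \succeq 0$ are the standard Goemans-Williamson relaxation of the $\pm 1$ label constraint; the paper's SDP1 and its balanced, debiased variant use their own constraint sets and rounding, so this sketch only conveys the general shape of the approach.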
Related papers
- Dimension-free Private Mean Estimation for Anisotropic Distributions [55.86374912608193]
Previous private estimators on distributions over $\mathbb{R}^d$ suffer from a curse of dimensionality.
We present an algorithm whose sample complexity has improved dependence on dimension.
arXiv Detail & Related papers (2024-11-01T17:59:53Z) - Robust Distribution Learning with Local and Global Adversarial Corruptions [17.22168727622332]
We develop an efficient finite-sample algorithm with error bounded by $\sqrt{\varepsilon k} + \rho + \tilde{O}(d\sqrt{k}\,n^{-1/(k \lor 2)})$ when $P$ has bounded covariance.
Our efficient procedure relies on a novel trace norm approximation of an ideal yet intractable 2-Wasserstein projection estimator.
arXiv Detail & Related papers (2024-06-10T17:48:36Z) - Multiple-policy Evaluation via Density Estimation [30.914344538340412]
We propose an algorithm named $\mathrm{CAESAR}$ for this problem.
Up to low-order and logarithmic terms, $\mathrm{CAESAR}$ achieves a sample complexity of $\tilde{O}\left(\frac{H^4}{\epsilon^2}\sum_{h=1}^{H}\max_{k\in[K]}\sum_{s,a}\frac{(d_h^{\pi^k}(s,a))^2}{\mu^*_h(s,a)}\right)$, where $d^{\pi}$ ...
arXiv Detail & Related papers (2024-03-29T23:55:25Z) - Minimax Optimality of Score-based Diffusion Models: Beyond the Density Lower Bound Assumptions [11.222970035173372]
A kernel-based score estimator achieves an optimal mean square error of $\widetilde{O}\left(n^{-1} t^{-\frac{d+2}{2}}(t^{\frac{d}{2}} \vee 1)\right)$.
This yields a $\widetilde{O}\left(n^{-1/2} t^{-\frac{d}{4}}\right)$ upper bound on the total variation error of the distribution of the sample generated by the diffusion model, under a merely sub-Gaussian assumption.
arXiv Detail & Related papers (2024-02-23T20:51:31Z) - A Unified Framework for Uniform Signal Recovery in Nonlinear Generative
Compressed Sensing [68.80803866919123]
Under nonlinear measurements, most prior results are non-uniform, i.e., they hold with high probability for a fixed $\mathbf{x}^*$ rather than for all $\mathbf{x}^*$ simultaneously.
Our framework accommodates GCS with 1-bit/uniformly quantized observations and single index models as canonical examples.
We also develop a concentration inequality that produces tighter bounds for product processes whose index sets have low metric entropy.
arXiv Detail & Related papers (2023-09-25T17:54:19Z) - Semidefinite programming on population clustering: a global analysis [0.6472434306724609]
We consider the problem of partitioning a small data sample of size $n$ drawn from a mixture of $2$ sub-gaussian distributions.
We are interested in the case that individual features are of low average quality $\gamma$, and we want to use as few of them as possible to correctly partition the sample.
arXiv Detail & Related papers (2023-01-01T04:52:25Z) - Near Sample-Optimal Reduction-based Policy Learning for Average Reward
MDP [58.13930707612128]
This work considers the sample complexity of obtaining an $\varepsilon$-optimal policy in an average reward Markov Decision Process (AMDP).
We prove an upper bound of $\widetilde{O}(H \varepsilon^{-3} \ln \frac{1}{\delta})$ samples per state-action pair, where $H := \mathrm{sp}(h^*)$ is the span of bias of any optimal policy, $\varepsilon$ is the accuracy and $\delta$ is the failure probability.
arXiv Detail & Related papers (2022-12-01T15:57:58Z) - Learning a Single Neuron with Adversarial Label Noise via Gradient
Descent [50.659479930171585]
We study a function of the form $\mathbf{x} \mapsto \sigma(\mathbf{w}\cdot\mathbf{x})$ for monotone activations.
The goal of the learner is to output a hypothesis vector $\mathbf{w}$ such that $F(\mathbf{w}) = C\,\epsilon$ with high probability.
arXiv Detail & Related papers (2022-06-17T17:55:43Z) - Overparametrized linear dimensionality reductions: From projection
pursuit to two-layer neural networks [10.368585938419619]
Given a cloud of $n$ data points in $\mathbb{R}^d$, consider all projections onto $m$-dimensional subspaces of $\mathbb{R}^d$.
What does this collection of probability distributions look like when $n,d$ grow large?
Denoting by $\mathscr{F}_{m,\alpha}$ the set of probability distributions in $\mathbb{R}^m$ that arise as low-dimensional projections in this limit, we establish new inner and outer bounds on $\mathscr{F}_{m,\alpha}$.
arXiv Detail & Related papers (2022-06-14T00:07:33Z) - Agnostic Learning of a Single Neuron with Gradient Descent [92.7662890047311]
We consider the problem of learning the best-fitting single neuron as measured by the expected square loss.
For the ReLU activation, our population risk guarantee is $O(\mathsf{OPT}^{1/2})+\epsilon$.
arXiv Detail & Related papers (2020-05-29T07:20:35Z) - Curse of Dimensionality on Randomized Smoothing for Certifiable
Robustness [151.67113334248464]
We show that extending the smoothing technique to defend against other attack models can be challenging.
We present experimental results on CIFAR to validate our theory.
arXiv Detail & Related papers (2020-02-08T22:02:14Z)