Optimal Bound for PCA with Outliers using Higher-Degree Voronoi Diagrams
- URL: http://arxiv.org/abs/2408.06867v2
- Date: Mon, 19 Aug 2024 06:11:40 GMT
- Title: Optimal Bound for PCA with Outliers using Higher-Degree Voronoi Diagrams
- Authors: Sajjad Hashemian, Mohammad Saeed Arvenaghi, Ebrahim Ardeshir-Larijani
- Abstract summary: We introduce new algorithms for Principal Component Analysis (PCA) with outliers.
We navigate to the optimal subspace for PCA even in the presence of outliers.
This approach achieves an optimal solution with a time complexity of $n^{d+\mathcal{O}(1)}\text{poly}(n,d)$.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we introduce new algorithms for Principal Component Analysis (PCA) with outliers. Utilizing techniques from computational geometry, specifically higher-degree Voronoi diagrams, we navigate to the optimal subspace for PCA even in the presence of outliers. This approach achieves an optimal solution with a time complexity of $n^{d+\mathcal{O}(1)}\text{poly}(n,d)$. Additionally, we present a randomized algorithm with a complexity of $2^{\mathcal{O}(r(d-r))} \times \text{poly}(n, d)$. This algorithm samples subspaces characterized in terms of a Grassmannian manifold. By employing such a sampling method, we ensure a high likelihood of capturing the optimal subspace, with success probability $(1 - \delta)^T$, where $\delta$ represents the probability that a sampled subspace does not contain the optimal solution, and $T$ is the number of subspaces sampled, proportional to $2^{r(d-r)}$. Our use of higher-degree Voronoi diagrams and Grassmannian-based sampling offers a clearer conceptual pathway and practical advantages, particularly in handling large datasets or higher-dimensional settings.
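The abstract does not spell out the underlying optimization problem; a standard formulation of PCA with $m$ outliers, stated here as a hedged reconstruction, is to find an $r$-dimensional subspace and a set of $n-m$ inliers minimizing the reconstruction error:

$$\min_{\substack{U \in \mathrm{Gr}(r,d) \\ S \subseteq [n],\ |S| = n-m}} \; \sum_{i \in S} \big\| x_i - \Pi_U x_i \big\|_2^2,$$

where $\Pi_U$ is the orthogonal projector onto $U$ and $\mathrm{Gr}(r,d)$ is the Grassmannian of $r$-dimensional subspaces of $\mathbb{R}^d$.

Under that assumed formulation, the randomized algorithm can be pictured as a Monte-Carlo search over $\mathrm{Gr}(r,d)$. The sketch below is a minimal illustration only; the function names, the QR-based uniform sampler, and the plain candidate loop are assumptions for exposition, not the authors' implementation (the paper characterizes its $T \propto 2^{r(d-r)}$ sampled subspaces more precisely and uses higher-degree Voronoi diagrams for the exact variant).

```python
import numpy as np

def sample_subspace(d, r, rng):
    # A Haar-uniform point on Gr(r, d): orthonormalize a Gaussian d x r matrix.
    q, _ = np.linalg.qr(rng.standard_normal((d, r)))
    return q  # columns form an orthonormal basis of the sampled subspace

def robust_cost(X, basis, m):
    # Squared residuals to the subspace, discarding the m largest (the
    # presumed outliers) before summing.
    residuals = np.sum((X - X @ basis @ basis.T) ** 2, axis=1)
    return np.sort(residuals)[: len(X) - m].sum()

def pca_with_outliers(X, r, m, T, seed=0):
    # Sample T subspaces and keep the one with the smallest robust cost.
    rng = np.random.default_rng(seed)
    best_basis, best_cost = None, np.inf
    for _ in range(T):
        basis = sample_subspace(X.shape[1], r, rng)
        cost = robust_cost(X, basis, m)
        if cost < best_cost:
            best_basis, best_cost = basis, cost
    return best_basis, best_cost
```

For example, `pca_with_outliers(X, r=2, m=10, T=500)` returns the best sampled basis and its robust cost for an $n \times d$ data matrix `X`.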
Related papers
- Projection by Convolution: Optimal Sample Complexity for Reinforcement Learning in Continuous-Space MDPs [56.237917407785545]
We consider the problem of learning an $\varepsilon$-optimal policy in a general class of continuous-space Markov decision processes (MDPs) having smooth Bellman operators.
Key to our solution is a novel projection technique based on ideas from harmonic analysis.
Our result bridges the gap between two popular but conflicting perspectives on continuous-space MDPs.
arXiv Detail & Related papers (2024-05-10T09:58:47Z) - Near-Optimal Bounds for Learning Gaussian Halfspaces with Random Classification Noise [50.64137465792738]
We show that any efficient SQ algorithm for the problem requires sample complexity at least $\Omega(d^{1/2}/(\max\{p, \epsilon\})^2)$.
Our lower bound suggests that this quadratic dependence on $1/\epsilon$ is inherent for efficient algorithms.
arXiv Detail & Related papers (2023-07-13T18:59:28Z) - Sharper Model-free Reinforcement Learning for Average-reward Markov Decision Processes [21.77276136591518]
We develop provably efficient model-free reinforcement learning (RL) algorithms for average-reward Markov Decision Processes (MDPs).
In the simulator setting, we propose a model-free RL algorithm that finds an $\epsilon$-optimal policy using $\widetilde{O}\left(\frac{SA\,\mathrm{sp}(h^*)}{\epsilon^2}+\frac{S^2A\,\mathrm{sp}(h^*)}{\epsilon^2}\right)$ samples.
arXiv Detail & Related papers (2023-06-28T17:43:19Z) - Stochastic Approximation Approaches to Group Distributionally Robust Optimization [96.26317627118912]
We study group distributionally robust optimization (GDRO).
We apply online learning techniques to reduce the number of samples required in each round from $m$ to $1$, keeping the same sample complexity.
We also propose a novel formulation of weighted GDRO, which allows us to derive distribution-dependent convergence rates.
arXiv Detail & Related papers (2023-02-18T09:24:15Z) - Best Policy Identification in Linear MDPs [70.57916977441262]
We investigate the problem of best policy identification in discounted linear Markov Decision Processes in the fixed confidence setting under a generative model.
The lower bound, obtained as the solution of an intricate non-convex optimization program, can be used as the starting point to devise such algorithms.
arXiv Detail & Related papers (2022-08-11T04:12:50Z) - Reward-Free RL is No Harder Than Reward-Aware RL in Linear Markov Decision Processes [61.11090361892306]
Reward-free reinforcement learning (RL) considers the setting where the agent does not have access to a reward function during exploration.
We show that this separation does not exist in the setting of linear MDPs.
We develop a computationally efficient algorithm for reward-free RL in a $d$-dimensional linear MDP.
arXiv Detail & Related papers (2022-01-26T22:09:59Z) - Robust Sub-Gaussian Principal Component Analysis and Width-Independent Schatten Packing [22.337756118270217]
We develop two methods for a fundamental statistical task: given an $\epsilon$-corrupted set of $n$ samples from a $d$-dimensional sub-Gaussian distribution, return an approximate top eigenvector of the covariance matrix.
Our first robust algorithm performs iterative filtering, returns an approximate eigenvector, and is based on a simple filtering approach.
Our second, which attains a slightly worse approximation factor, runs in nearly-trivial time and iterations under a mild spectral gap assumption.
arXiv Detail & Related papers (2020-06-12T07:45:38Z) - Learning Halfspaces with Tsybakov Noise [50.659479930171585]
We study the learnability of halfspaces in the presence of Tsybakov noise.
We give an algorithm that achieves misclassification error $\epsilon$ with respect to the true halfspace.
arXiv Detail & Related papers (2020-06-11T14:25:02Z) - Proper Learning, Helly Number, and an Optimal SVM Bound [29.35254938542589]
We characterize classes for which the optimal sample complexity can be achieved by a proper learning algorithm.
We show that the dual Helly number is bounded if and only if there is a proper learner that obtains the optimal joint dependence on $\epsilon$ and $\delta$.
arXiv Detail & Related papers (2020-05-24T18:11:57Z)