A Combinatorial Approach to Robust PCA
- URL: http://arxiv.org/abs/2311.16416v1
- Date: Tue, 28 Nov 2023 01:49:51 GMT
- Title: A Combinatorial Approach to Robust PCA
- Authors: Weihao Kong, Mingda Qiao, Rajat Sen
- Abstract summary: We study the problem of recovering Gaussian data under adversarial corruptions.
We assume that the Gaussian noises lie in an unknown $k$-dimensional subspace $U \subseteq \mathbb{R}^d$, and $s$ randomly chosen coordinates of each data point fall into the control of an adversary.
Our main result is an efficient algorithm that, when $ks^2 = O(d)$, recovers every single data point up to a nearly-optimal error of $\tilde{O}(ks/d)$ in expectation.
- Score: 18.740048806623037
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the problem of recovering Gaussian data under adversarial
corruptions when the noises are low-rank and the corruptions are on the
coordinate level. Concretely, we assume that the Gaussian noises lie in an
unknown $k$-dimensional subspace $U \subseteq \mathbb{R}^d$, and $s$ randomly
chosen coordinates of each data point fall into the control of an adversary.
This setting models the scenario of learning from high-dimensional yet
structured data that are transmitted through a highly-noisy channel, so that
the data points are unlikely to be entirely clean.
Our main result is an efficient algorithm that, when $ks^2 = O(d)$, recovers
every single data point up to a nearly-optimal $\ell_1$ error of $\tilde
O(ks/d)$ in expectation. At the core of our proof is a new analysis of the
well-known Basis Pursuit (BP) method for recovering a sparse signal, which is
known to succeed under additional assumptions (e.g., incoherence or the
restricted isometry property) on the underlying subspace $U$. In contrast, we
present a novel approach via studying a natural combinatorial problem and show
that, over the randomness in the support of the sparse signal, a
high-probability error bound is possible even if the subspace $U$ is arbitrary.
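The Basis Pursuit step the abstract refers to is the standard $\ell_1$-minimization program $\min \|x\|_1$ subject to $Ax = b$, which can be cast as a linear program by splitting $x$ into positive and negative parts. A minimal sketch of this generic BP formulation (assuming NumPy/SciPy; this illustrates only the classical recovery step, not the paper's combinatorial analysis or its full algorithm):

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(A, b):
    """Solve min ||x||_1 subject to Ax = b as a linear program.

    Uses the standard split x = x_plus - x_minus with x_plus, x_minus >= 0,
    so the objective sum(x_plus + x_minus) equals ||x||_1 at the optimum.
    """
    m, n = A.shape
    c = np.ones(2 * n)                 # objective: sum(x+ + x-) = ||x||_1
    A_eq = np.hstack([A, -A])          # enforce A(x+ - x-) = b
    res = linprog(c, A_eq=A_eq, b_eq=b,
                  bounds=[(0, None)] * (2 * n), method="highs")
    return res.x[:n] - res.x[n:]

# Tiny demo: recover a 1-sparse signal from a few Gaussian measurements.
rng = np.random.default_rng(0)
A = rng.standard_normal((8, 16))
x_true = np.zeros(16)
x_true[3] = 2.0
x_hat = basis_pursuit(A, A @ x_true)
```

With Gaussian measurement matrices and sufficiently sparse signals, this LP recovers the signal exactly with high probability; the paper's contribution is showing that, over the randomness in the support of the sparse signal, BP succeeds even when the subspace $U$ satisfies no incoherence or RIP-type assumption.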
Related papers
- Adaptive $k$-nearest neighbor classifier based on the local estimation of the shape operator [49.87315310656657]
We introduce a new adaptive $k$-nearest neighbours ($kK$-NN) algorithm that uses the local curvature at a sample to adaptively define the neighborhood size.
Results on many real-world datasets indicate that the new $kK$-NN algorithm yields superior balanced accuracy compared to the established $k$-NN method.
arXiv Detail & Related papers (2024-09-08T13:08:45Z) - Robust Second-Order Nonconvex Optimization and Its Application to Low Rank Matrix Sensing [47.32366699399839]
Finding an approximate second-order stationary point (SOSP) is a well-studied and fundamental problem.
In this paper, we apply our framework to the problem of low-rank matrix sensing.
arXiv Detail & Related papers (2024-03-12T01:27:44Z) - Corruption-Robust Offline Reinforcement Learning with General Function
Approximation [60.91257031278004]
We investigate the problem of corruption in offline reinforcement learning (RL) with general function approximation.
Our goal is to find a policy that is robust to such corruption and minimizes the suboptimality gap with respect to the optimal policy for the uncorrupted Markov decision processes (MDPs).
arXiv Detail & Related papers (2023-10-23T04:07:26Z) - Deflated HeteroPCA: Overcoming the curse of ill-conditioning in heteroskedastic PCA [17.75853665586128]
This paper is concerned with estimating the column subspace of a low-rank matrix $\boldsymbol{X}^{\star} \in \mathbb{R}^{n_1 \times n_2}$ from contaminated data.
How to obtain optimal statistical accuracy while accommodating the widest range of signal-to-noise ratios (SNRs) becomes challenging.
arXiv Detail & Related papers (2023-03-10T20:22:18Z) - Subspace Recovery from Heterogeneous Data with Non-isotropic Noise [43.44371292901258]
We study a basic formulation of this problem: principal component analysis (PCA).
Our goal is to recover the linear subspace shared by $\mu_i$ using the data points from all users.
We design an efficiently-computable estimator under non-spherical and user-dependent noise.
arXiv Detail & Related papers (2022-10-24T18:00:08Z) - Best Policy Identification in Linear MDPs [70.57916977441262]
We investigate the problem of best policy identification in discounted linear Markov Decision Processes (MDPs) in the fixed-confidence setting under a generative model.
The lower bound, expressed as the solution of an intricate non-convex optimization program, can be used as the starting point to devise such algorithms.
arXiv Detail & Related papers (2022-08-11T04:12:50Z) - Minimax Rates for Robust Community Detection [19.229475414802213]
We study the problem of community detection in the block model with adversarial node corruptions.
Our main result is an efficient algorithm that can tolerate an $\epsilon$-fraction of corruptions and achieves error $O(\epsilon) + e^{-\frac{C}{2}(1 \pm o(1))}$, where $C = (\sqrt{a} - \sqrt{b})^2$ is the signal-to-noise ratio.
We show that our algorithms are doubly-robust in the sense that they work in an even more
arXiv Detail & Related papers (2022-07-25T04:45:16Z) - Approximate Function Evaluation via Multi-Armed Bandits [51.146684847667125]
We study the problem of estimating the value of a known smooth function $f$ at an unknown point $\boldsymbol{\mu} \in \mathbb{R}^n$, where each component $\mu_i$ can be sampled via a noisy oracle.
We design an instance-adaptive algorithm that learns to sample according to the importance of each coordinate, and with probability at least $1-\delta$ returns an $\epsilon$-accurate estimate of $f(\boldsymbol{\mu})$.
arXiv Detail & Related papers (2022-03-18T18:50:52Z) - Robust Mean Estimation on Highly Incomplete Data with Arbitrary Outliers [7.224832132296238]
We study the problem of robustly estimating the mean of a $d$-dimensional distribution given $N$ examples.
We show algorithms that estimate the mean of the distribution with information-theoretically optimal dimension-independent error guarantees in nearly-linear time.
arXiv Detail & Related papers (2020-08-18T17:53:34Z) - Optimal Robust Linear Regression in Nearly Linear Time [97.11565882347772]
We study the problem of high-dimensional robust linear regression where a learner is given access to $n$ samples from the generative model $Y = \langle X, w^* \rangle + \epsilon$.
We propose estimators for this problem under two settings: (i) $X$ is L4-L2 hypercontractive, $\mathbb{E}[XX^\top]$ has bounded condition number, and $\epsilon$ has bounded variance; and (ii) $X$ is sub-Gaussian with identity second moment and $\epsilon$ is
arXiv Detail & Related papers (2020-07-16T06:44:44Z) - Learning Halfspaces with Tsybakov Noise [50.659479930171585]
We study the learnability of halfspaces in the presence of Tsybakov noise.
We give an algorithm that achieves misclassification error $\epsilon$ with respect to the true halfspace.
arXiv Detail & Related papers (2020-06-11T14:25:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.