Efficient List-Decodable Regression using Batches
- URL: http://arxiv.org/abs/2211.12743v1
- Date: Wed, 23 Nov 2022 07:08:00 GMT
- Title: Efficient List-Decodable Regression using Batches
- Authors: Abhimanyu Das, Ayush Jain, Weihao Kong and Rajat Sen
- Abstract summary: We begin the study of list-decodable linear regression using batches.
In this setting only an $\alpha \in (0,1]$ fraction of the batches are genuine.
Each genuine batch contains $\ge n$ i.i.d. samples from a common unknown distribution, and the remaining batches may contain arbitrary or even adversarial samples.
- Score: 33.300073775123835
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We begin the study of list-decodable linear regression using batches. In this
setting only an $\alpha \in (0,1]$ fraction of the batches are genuine. Each
genuine batch contains $\ge n$ i.i.d. samples from a common unknown
distribution and the remaining batches may contain arbitrary or even
adversarial samples. We derive a polynomial time algorithm that for any $n\ge
\tilde \Omega(1/\alpha)$ returns a list of size $\mathcal O(1/\alpha^2)$ such
that one of the items in the list is close to the true regression parameter.
The algorithm requires only $\tilde{\mathcal{O}}(d/\alpha^2)$ genuine batches
and works under fairly general assumptions on the distribution.
The results demonstrate the utility of batch structure, which allows for the
first polynomial-time algorithm for list-decodable regression; this may be
impossible in the non-batch setting, as suggested by a recent SQ lower bound
\cite{diakonikolas2021statistical}.
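To make the batch setting concrete, here is a minimal Python sketch of the model and a naive baseline, assuming only numpy; it is not the paper's algorithm, and every name in it (`alpha`, `radius`, `candidate_list`) is illustrative. It simulates batches of which only an $\alpha$ fraction is genuine, solves per-batch least squares, and greedily merges nearby batch estimates into a short candidate list.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Simulate the batch model: an alpha fraction of batches is genuine ---
d, n, B, alpha = 5, 20, 200, 0.2          # dimension, batch size, #batches, inlier fraction
w_true = rng.normal(size=d)               # unknown regression parameter

batches = []
for b in range(B):
    X = rng.normal(size=(n, d))
    if rng.random() < alpha:              # genuine batch: y = X w* + noise
        y = X @ w_true + 0.1 * rng.normal(size=n)
    else:                                 # adversarial batch: arbitrary labels
        y = 10.0 * rng.normal(size=n)
    batches.append((X, y))

# --- Per-batch least squares, then greedy merging into a candidate list ---
estimates = [np.linalg.lstsq(X, y, rcond=None)[0] for X, y in batches]

candidate_list = []                       # list of (center, count) clusters
radius = 1.0                              # merging radius (tuning knob)
for w in estimates:
    for i, (c, cnt) in enumerate(candidate_list):
        if np.linalg.norm(w - c) < radius:
            candidate_list[i] = ((c * cnt + w) / (cnt + 1), cnt + 1)
            break
    else:
        candidate_list.append((w, 1))

# Keep the most popular clusters; genuine batches should pile up near w*.
candidate_list.sort(key=lambda t: -t[1])
top = [c for c, _ in candidate_list[: int(np.ceil(1 / alpha**2))]]
print("best candidate error:", min(np.linalg.norm(c - w_true) for c in top))
```

Even in this toy, the abstract's point survives: genuine batches yield per-batch estimates that concentrate near the true parameter, so batch structure supplies a signal that individual corrupted samples in the non-batch setting do not.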
Related papers
- Turnstile $\ell_p$ leverage score sampling with applications [56.403488578703865]
We develop a novel algorithm for sampling rows $a_i$ of a matrix $A \in \mathbb{R}^{n \times d}$, proportional to their $\ell_p$ norm, when $A$ is presented in a turnstile data stream.
Our algorithm not only returns the set of sampled row indexes, it also returns slightly perturbed rows $\tilde{a}_i \approx a_i$, and approximates their sampling probabilities up to $\varepsilon$ relative error.
For logistic regression, our framework yields the first algorithm that achieves a ...
arXiv Detail & Related papers (2024-06-01T07:33:41Z)
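As a point of reference for the technique, the sketch below shows plain offline $\ell_2$ leverage-score row sampling with numpy; the paper's contribution is doing this for general $\ell_p$ in a one-pass turnstile stream, which this toy does not attempt. All sizes and names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

# Offline l2 leverage-score row sampling (a simplification: the paper handles
# general l_p in a turnstile stream; this sketch works on the full matrix).
A = rng.normal(size=(1000, 10))

# Leverage score of row a_i is a_i^T (A^T A)^{-1} a_i, computed here via QR.
Q, _ = np.linalg.qr(A)                 # columns of Q span the column space of A
scores = np.sum(Q**2, axis=1)          # leverage scores, summing to d

probs = scores / scores.sum()
k = 50                                 # number of sampled rows (illustrative)
idx = rng.choice(len(A), size=k, replace=True, p=probs)

# Reweight sampled rows so the sketch is an unbiased estimator of A^T A.
S = A[idx] / np.sqrt(k * probs[idx, None])
err = np.linalg.norm(S.T @ S - A.T @ A, 2) / np.linalg.norm(A.T @ A, 2)
print("relative spectral error:", err)
```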
- Provably Efficient High-Dimensional Bandit Learning with Batched Feedbacks [93.00280593719513]
We study high-dimensional multi-armed contextual bandits with batched feedback, where the $T$ steps of online interactions are divided into $L$ batches.
Specifically, each batch collects data according to a policy that depends on previous batches, and the rewards are revealed only at the end of the batch.
Our algorithm achieves regret bounds comparable to those in the fully sequential setting with only $\mathcal{O}(\log T)$ batches.
arXiv Detail & Related papers (2023-11-22T06:06:54Z)
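The batching mechanism is easy to illustrate in isolation. The following toy (assumed names throughout) runs plain UCB on a multi-armed bandit with a doubling batch schedule: the policy is fixed for each batch, rewards arrive only at batch ends, and the geometric schedule uses only $O(\log T)$ batches, the same batch budget the summary cites. It is not the paper's high-dimensional contextual algorithm.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy batched UCB: rewards are revealed only at batch ends, and a doubling
# (geometric) schedule needs only O(log T) batches over the horizon T.
T, K = 10_000, 5
mu = rng.uniform(0.2, 0.8, size=K)        # unknown arm means (Bernoulli)

counts = np.zeros(K)
sums = np.zeros(K)
t, batch_len, n_batches = 0, K, 0

while t < T:
    batch_len = min(batch_len, T - t)
    # Choose one policy for the whole batch from data seen so far.
    ucb = np.where(counts > 0,
                   sums / np.maximum(counts, 1)
                   + np.sqrt(2 * np.log(T) / np.maximum(counts, 1)),
                   np.inf)
    arm = int(np.argmax(ucb))
    # Rewards for the batch are revealed only here, at the end of the batch.
    rewards = rng.binomial(1, mu[arm], size=batch_len)
    counts[arm] += batch_len
    sums[arm] += rewards.sum()
    t += batch_len
    batch_len *= 2                        # doubling schedule => O(log T) batches
    n_batches += 1

print("batches used:", n_batches, "for horizon T =", T)
```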
- List-Decodable Covariance Estimation [1.9290392443571387]
We give the first polynomial-time algorithm for list-decodable covariance estimation.
Our result implies the first polynomial-time exact algorithm for list-decodable linear regression and subspace recovery.
arXiv Detail & Related papers (2022-06-22T09:38:06Z)
- List-Decodable Sparse Mean Estimation via Difference-of-Pairs Filtering [42.526664955704746]
We develop a novel, conceptually simpler technique for list-decodable sparse mean estimation.
In particular, for distributions with "certifiably bounded" $t$-th moments in $k$-sparse directions, our algorithm achieves error of $(1/\alpha)^{O(1/t)}$ with sample complexity $m = (k\log(n))^{O(t)}/\alpha$ and running time $\mathrm{poly}(mn^t)$.
For the special case of Gaussian inliers, our algorithm achieves the optimal error guarantee of $\Theta(\sqrt{\log(1/\alpha)})$.
arXiv Detail & Related papers (2022-06-10T17:38:18Z)
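A naive reading of the difference-of-pairs idea, as a hedged toy (not the paper's filtering algorithm): differences of two inliers are small, so points with many near neighbors seed candidate means, which are then truncated to their largest $k$ coordinates. The thresholds and names below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy data: an alpha fraction of points shares a k-sparse mean mu; the rest
# are scattered far away (a stand-in for adversarial outliers).
n, d, k, alpha = 300, 50, 5, 0.2
mu = np.zeros(d)
mu[:k] = 3.0
inlier = rng.random(n) < alpha
X = np.where(inlier[:, None],
             rng.normal(size=(n, d)) + mu,
             5.0 * rng.normal(size=(n, d)))

# Difference-of-pairs heuristic: inlier-inlier differences are ~N(0, 2I),
# so inliers have many neighbors within a fixed radius.
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)   # pairwise distances
votes = (D < 2.0 * np.sqrt(d)).sum(axis=1)                  # near-neighbor counts

# Points with many neighbors seed candidate means; truncate to top-k coords.
seeds = np.argsort(-votes)[: int(np.ceil(1 / alpha))]
candidates = []
for s in seeds:
    est = X[D[s] < 2.0 * np.sqrt(d)].mean(axis=0)
    est[np.argsort(-np.abs(est))[k:]] = 0.0                 # keep k largest coords
    candidates.append(est)

print("best candidate error:", min(np.linalg.norm(c - mu) for c in candidates))
```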
- Statistical Query Lower Bounds for List-Decodable Linear Regression [55.06171096484622]
We study the problem of list-decodable linear regression, where an adversary can corrupt a majority of the examples.
Our main result is a Statistical Query (SQ) lower bound of $d^{\mathrm{poly}(1/\alpha)}$ for this problem.
arXiv Detail & Related papers (2021-06-17T17:45:21Z)
- Clustering Mixture Models in Almost-Linear Time via List-Decodable Mean Estimation [58.24280149662003]
We study the problem of list-decodable mean estimation, where an adversary can corrupt a majority of the dataset.
We develop new algorithms for list-decodable mean estimation, achieving nearly-optimal statistical guarantees.
arXiv Detail & Related papers (2021-06-16T03:34:14Z)
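The reduction in the title can be illustrated directly: in a uniform $k$-component mixture, each component is an "inlier" set of fraction $\alpha = 1/k$, so a candidate list that covers every such set clusters the mixture. Below is a minimal numpy sketch using k-means++-style seeding as a stand-in list-decodable mean estimator; it is illustrative only and far simpler than the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(4)

# Uniform mixture of K well-separated Gaussians: each component is an
# "inlier" set of fraction alpha = 1/K for list-decodable mean estimation.
K, d, n = 4, 20, 2000
true_means = 10.0 * rng.normal(size=(K, d))
labels = rng.integers(K, size=n)
X = true_means[labels] + rng.normal(size=(n, d))

# k-means++-style seeding returns a short list of candidate means that,
# under separation, covers every component (the "list-decoding" guarantee).
list_size = 2 * K                          # list slightly longer than K
centers = [X[rng.integers(n)]]
for _ in range(list_size - 1):
    d2 = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
    centers.append(X[rng.choice(n, p=d2 / d2.sum())])
centers = np.array(centers)

# Check that every true component mean is covered by some list item.
cover = [np.min(np.linalg.norm(centers - m, axis=1)) for m in true_means]
print("max distance from a true mean to the list:", max(cover))
```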
- List-Decodable Mean Estimation in Nearly-PCA Time [50.79691056481693]
We study the fundamental task of list-decodable mean estimation in high dimensions.
Our algorithm runs in time $\widetilde{O}(ndk)$ for all $k = O(\sqrt{d}) \cup \Omega(d)$, where $n$ is the size of the dataset.
A variant of our algorithm has runtime $\widetilde{O}(ndk)$ for all $k$, at the expense of an $O(\sqrt{\log k})$ factor in the recovery guarantee.
arXiv Detail & Related papers (2020-11-19T17:21:37Z)
- Conditional Uncorrelation and Efficient Non-approximate Subset Selection in Sparse Regression [72.84177488527398]
We consider sparse regression from the viewpoint of correlation, and propose the formula of conditional uncorrelation.
By the proposed method, the computational complexity is reduced from $O(\frac{1}{6}k^3+mk^2+mkd)$ to $O(\frac{1}{6}k^3+\frac{1}{2}mk^2)$ for each candidate subset in sparse regression.
arXiv Detail & Related papers (2020-09-08T20:32:26Z)
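The flavor of this complexity reduction is that per-subset work need not touch all $m$ samples: correlation-style quantities ($X^\top X$, $X^\top y$) are computed once, after which each candidate subset is scored by small-matrix algebra. A hedged sketch of that general idea follows; it is not necessarily the paper's exact conditional-uncorrelation formula, and all names are illustrative.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(5)

# Score candidate subsets for sparse regression without re-reading the data:
# precompute correlation-style quantities once, then each size-k subset costs
# only small-matrix work (the spirit of the complexity reduction above).
m, d, k = 500, 12, 3
X = rng.normal(size=(m, d))
y = X[:, :k] @ np.ones(k) + 0.1 * rng.normal(size=m)

G = X.T @ X                 # d x d Gram matrix, computed once: O(m d^2)
b = X.T @ y                 # d-vector, computed once: O(m d)
yy = y @ y

def rss(subset):
    """Residual sum of squares of least squares on the given subset,
    computed from cached G and b only: O(k^3) per subset, no O(mk) pass."""
    S = list(subset)
    coef = np.linalg.solve(G[np.ix_(S, S)], b[S])
    return yy - b[S] @ coef

best = min(combinations(range(d), k), key=rss)
print("selected subset:", best)            # should recover (0, 1, 2)
```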