Streaming Algorithms for High-Dimensional Robust Statistics
- URL: http://arxiv.org/abs/2204.12399v2
- Date: Wed, 3 May 2023 17:59:26 GMT
- Title: Streaming Algorithms for High-Dimensional Robust Statistics
- Authors: Ilias Diakonikolas, Daniel M. Kane, Ankit Pensia, Thanasis Pittas
- Abstract summary: We develop the first efficient streaming algorithms for high-dimensional robust statistics with near-optimal memory requirements.
Our main result is for the task of high-dimensional robust mean estimation in (a strengthening of) Huber's contamination model.
- Score: 43.106438224356175
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study high-dimensional robust statistics tasks in the streaming model. A
recent line of work obtained computationally efficient algorithms for a range
of high-dimensional robust estimation tasks. Unfortunately, all previous
algorithms require storing the entire dataset, incurring memory at least
quadratic in the dimension. In this work, we develop the first efficient
streaming algorithms for high-dimensional robust statistics with near-optimal
memory requirements (up to logarithmic factors). Our main result is for the
task of high-dimensional robust mean estimation in (a strengthening of) Huber's
contamination model. We give an efficient single-pass streaming algorithm for
this task with near-optimal error guarantees and space complexity nearly-linear
in the dimension. As a corollary, we obtain streaming algorithms with
near-optimal space complexity for several more complex tasks, including robust
covariance estimation, robust regression, and more generally robust stochastic
optimization.
Related papers
- Randomized Dimension Reduction with Statistical Guarantees [0.27195102129095]
This thesis explores some of such algorithms for fast execution and efficient data utilization.
We focus on learning algorithms with various incorporations of data augmentation that improve generalization and distributional provably.
Specifically, Chapter 4 presents a sample complexity analysis for data augmentation consistency regularization.
arXiv Detail & Related papers (2023-10-03T02:01:39Z) - An Efficient Algorithm for Clustered Multi-Task Compressive Sensing [60.70532293880842]
Clustered multi-task compressive sensing is a hierarchical model that solves multiple compressive sensing tasks.
The existing inference algorithm for this model is computationally expensive and does not scale well in high dimensions.
We propose a new algorithm that substantially accelerates model inference by avoiding the need to explicitly compute these covariance matrices.
arXiv Detail & Related papers (2023-09-30T15:57:14Z) - Learning the hub graphical Lasso model with the structured sparsity via
an efficient algorithm [1.0923877073891446]
We introduce a two-phase algorithm to estimate hub graphical models.
The proposed algorithm first generates a good initial point via a dual alternating direction method of multipliers.
It then warms a semismooth Newton (SSN) based augmented Lagrangian method (ALM) to compute a solution that is accurate enough for practical tasks.
arXiv Detail & Related papers (2023-08-17T08:24:28Z) - Best-Subset Selection in Generalized Linear Models: A Fast and
Consistent Algorithm via Splicing Technique [0.6338047104436422]
Best subset section has been widely regarded as the Holy Grail of problems of this type.
We proposed and illustrated an algorithm for best subset recovery in mild conditions.
Our implementation achieves approximately a fourfold speedup compared to popular variable selection toolkits.
arXiv Detail & Related papers (2023-08-01T03:11:31Z) - Representation Learning with Multi-Step Inverse Kinematics: An Efficient
and Optimal Approach to Rich-Observation RL [106.82295532402335]
Existing reinforcement learning algorithms suffer from computational intractability, strong statistical assumptions, and suboptimal sample complexity.
We provide the first computationally efficient algorithm that attains rate-optimal sample complexity with respect to the desired accuracy level.
Our algorithm, MusIK, combines systematic exploration with representation learning based on multi-step inverse kinematics.
arXiv Detail & Related papers (2023-04-12T14:51:47Z) - Outlier-Robust Sparse Estimation via Non-Convex Optimization [73.18654719887205]
We explore the connection between high-dimensional statistics and non-robust optimization in the presence of sparsity constraints.
We develop novel and simple optimization formulations for these problems.
As a corollary, we obtain that any first-order method that efficiently converges to station yields an efficient algorithm for these tasks.
arXiv Detail & Related papers (2021-09-23T17:38:24Z) - Learning to Hash Robustly, with Guarantees [79.68057056103014]
In this paper, we design an NNS algorithm for the Hamming space that has worst-case guarantees essentially matching that of theoretical algorithms.
We evaluate the algorithm's ability to optimize for a given dataset both theoretically and practically.
Our algorithm has a 1.8x and 2.1x better recall on the worst-performing queries to the MNIST and ImageNet datasets.
arXiv Detail & Related papers (2021-08-11T20:21:30Z) - Towards Optimally Efficient Tree Search with Deep Learning [76.64632985696237]
This paper investigates the classical integer least-squares problem which estimates signals integer from linear models.
The problem is NP-hard and often arises in diverse applications such as signal processing, bioinformatics, communications and machine learning.
We propose a general hyper-accelerated tree search (HATS) algorithm by employing a deep neural network to estimate the optimal estimation for the underlying simplified memory-bounded A* algorithm.
arXiv Detail & Related papers (2021-01-07T08:00:02Z) - Stochastic Optimization for Regularized Wasserstein Estimators [10.194798773447879]
We introduce an algorithm to solve a regularized version of the problem of Wasserstein estimators gradient, with a time per step which is sublinear in the natural dimensions.
We show that this algorithm can be extended to other tasks, including estimation of Wasserstein barycenters.
arXiv Detail & Related papers (2020-02-20T12:04:05Z) - How to Solve Fair $k$-Center in Massive Data Models [5.3283669037198615]
We design new streaming and distributed algorithms for the fair $k$-center problem.
Our main contributions are: (a) the first distributed algorithm; and (b) a two-pass streaming algorithm with a provable approximation guarantee.
arXiv Detail & Related papers (2020-02-18T16:11:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.