Spectral Methods for Data Science: A Statistical Perspective
- URL: http://arxiv.org/abs/2012.08496v1
- Date: Tue, 15 Dec 2020 18:40:56 GMT
- Title: Spectral Methods for Data Science: A Statistical Perspective
- Authors: Yuxin Chen, Yuejie Chi, Jianqing Fan, Cong Ma
- Abstract summary: Spectral methods have emerged as a simple yet surprisingly effective approach for extracting information from massive, noisy and incomplete data.
This book aims to present a systematic, comprehensive, yet accessible introduction to spectral methods from a modern statistical perspective.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Spectral methods have emerged as a simple yet surprisingly effective approach
for extracting information from massive, noisy and incomplete data. In a
nutshell, spectral methods refer to a collection of algorithms built upon the
eigenvalues (resp. singular values) and eigenvectors (resp. singular vectors)
of some properly designed matrices constructed from data. A diverse array of
applications have been found in machine learning, data science, and signal
processing. Due to their simplicity and effectiveness, spectral methods are not
only used as a stand-alone estimator, but also frequently employed to
initialize other more sophisticated algorithms to improve performance.
While the studies of spectral methods can be traced back to classical matrix
perturbation theory and methods of moments, the past decade has witnessed
tremendous theoretical advances in demystifying their efficacy through the lens
of statistical modeling, with the aid of non-asymptotic random matrix theory.
This monograph aims to present a systematic, comprehensive, yet accessible
introduction to spectral methods from a modern statistical perspective,
highlighting their algorithmic implications in diverse large-scale
applications. In particular, our exposition gravitates around several central
questions that span various applications: how to characterize the sample
efficiency of spectral methods in reaching a target level of statistical
accuracy, and how to assess their stability in the face of random noise,
missing data, and adversarial corruptions? In addition to conventional $\ell_2$
perturbation analysis, we present a systematic $\ell_{\infty}$ and
$\ell_{2,\infty}$ perturbation theory for eigenspace and singular subspaces,
which has only recently become available owing to a powerful "leave-one-out"
analysis framework.
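As a minimal illustration of the recipe described in the abstract (not code from the monograph), the sketch below estimates a rank-1 signal from a noisy matrix by taking the leading singular vector of the observed data, then reports both the $\ell_2$ and $\ell_{\infty}$ estimation errors. The signal strength, noise scale, and dimension are arbitrary choices for the example.

```python
import numpy as np

# Planted rank-1 signal: a strong spike along a random unit direction u.
rng = np.random.default_rng(0)
n = 500
u = rng.standard_normal(n)
u /= np.linalg.norm(u)                         # ground-truth unit vector
M = 10.0 * np.outer(u, u)                      # rank-1 signal, strength 10
E = rng.standard_normal((n, n)) / np.sqrt(n)   # noise, operator norm ~ 2
Y = M + E                                      # observed data matrix

# Spectral method: the estimator is the top singular vector of Y.
U, s, Vt = np.linalg.svd(Y)
u_hat = U[:, 0]
u_hat *= np.sign(u_hat @ u)                    # fix the global sign ambiguity

# l2 (aggregate) and l-infinity (entrywise) errors of the recovered direction.
err_l2 = np.linalg.norm(u_hat - u)
err_linf = np.max(np.abs(u_hat - u))
print(f"l2 error: {err_l2:.3f}, l_inf error: {err_linf:.3f}")
```

With a spike well above the noise level, classical ($\ell_2$) perturbation bounds guarantee the aggregate error is small, while the entrywise error is typically much smaller still, which is the regime the monograph's $\ell_{\infty}$ theory addresses.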
Related papers
- Matrix Denoising with Doubly Heteroscedastic Noise: Fundamental Limits and Optimal Spectral Methods [24.06775799553418]
We study the matrix denoising problem of estimating the singular vectors of a rank-$1$ signal corrupted by noise with both column and row correlations.
Our work establishes the information-theoretic and algorithmic limits of matrix denoising with doubly heteroscedastic noise.
arXiv Detail & Related papers (2024-05-22T18:38:10Z)
- Spectral Entry-wise Matrix Estimation for Low-Rank Reinforcement Learning [53.445068584013896]
We study matrix estimation problems arising in reinforcement learning (RL) with low-rank structure.
In low-rank bandits, the matrix to be recovered specifies the expected arm rewards, and for low-rank Markov Decision Processes (MDPs), it may, for example, characterize the transition kernel of the MDP.
We show that simple spectral-based matrix estimation approaches efficiently recover the singular subspaces of the matrix and exhibit nearly-minimal entry-wise error.
arXiv Detail & Related papers (2023-10-10T17:06:41Z)
- Spectral Estimators for Structured Generalized Linear Models via Approximate Message Passing [28.91482208876914]
We consider the problem of parameter estimation in a high-dimensional generalized linear model.
Despite their wide use, a rigorous performance characterization, as well as a principled way to preprocess the data, are available only for unstructured designs.
arXiv Detail & Related papers (2023-08-28T11:49:23Z)
- Precise Asymptotics for Spectral Methods in Mixed Generalized Linear Models [31.58736590532443]
We consider the problem of estimating two statistically independent signals in a mixed generalized linear model.
Our characterization exploits a mix of tools from random matrices, free probability and the theory of approximate message passing algorithms.
arXiv Detail & Related papers (2022-11-21T11:35:25Z)
- Geometry of EM and related iterative algorithms [8.228889210180268]
The Expectation-Maximization (EM) algorithm is a simple meta-algorithm that has been used for many years as a methodology for statistical inference.
In this paper, we introduce the $em$ algorithm, an information geometric formulation of the EM algorithm, and its extensions and applications to various problems.
arXiv Detail & Related papers (2022-09-03T00:23:23Z)
- Spectral Decomposition Representation for Reinforcement Learning [100.0424588013549]
We propose an alternative spectral method, Spectral Decomposition Representation (SPEDER), that extracts a state-action abstraction from the dynamics without inducing spurious dependence on the data collection policy.
A theoretical analysis establishes the sample efficiency of the proposed algorithm in both the online and offline settings.
An experimental investigation demonstrates superior performance over current state-of-the-art algorithms across several benchmarks.
arXiv Detail & Related papers (2022-08-19T19:01:30Z)
- Probabilistic Simplex Component Analysis [66.30587591100566]
PRISM is a probabilistic simplex component analysis approach to identifying the vertices of a data-circumscribing simplex from data.
The problem has a rich variety of applications, the most notable being hyperspectral unmixing in remote sensing and non-negative matrix factorization in machine learning.
arXiv Detail & Related papers (2021-03-18T05:39:00Z)
- Sparse PCA via $l_{2,p}$-Norm Regularization for Unsupervised Feature Selection [138.97647716793333]
We propose a simple and efficient unsupervised feature selection method by combining reconstruction error with $l_{2,p}$-norm regularization.
We present an efficient optimization algorithm to solve the proposed unsupervised model, and analyse the convergence and computational complexity of the algorithm theoretically.
arXiv Detail & Related papers (2020-12-29T04:08:38Z)
- Scalable Spectral Clustering with Nystrom Approximation: Practical and Theoretical Aspects [1.6752182911522515]
This work presents a principled spectral clustering algorithm that exploits spectral properties of the similarity matrix associated with sampled points to regulate accuracy-efficiency trade-offs.
The overarching goal of this work is to provide an improved baseline for future research directions to accelerate spectral clustering.
arXiv Detail & Related papers (2020-06-25T15:10:56Z)
- Spectral Learning on Matrices and Tensors [74.88243719463053]
We show that tensor decomposition can pick up latent effects that are missed by matrix methods.
We also outline computational techniques to design efficient tensor decomposition methods.
arXiv Detail & Related papers (2020-04-16T22:53:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.