Optimal Combination of Linear and Spectral Estimators for Generalized
Linear Models
- URL: http://arxiv.org/abs/2008.03326v3
- Date: Fri, 25 Jun 2021 07:15:43 GMT
- Title: Optimal Combination of Linear and Spectral Estimators for Generalized
Linear Models
- Authors: Marco Mondelli, Christos Thrampoulidis and Ramji Venkataramanan
- Abstract summary: We show how to optimally combine $\hat{\boldsymbol x}^{\rm L}$ and $\hat{\boldsymbol x}^{\rm s}$.
In order to establish the limiting distribution of $(\boldsymbol x, \hat{\boldsymbol x}^{\rm L}, \hat{\boldsymbol x}^{\rm s})$, we design and analyze an Approximate Message Passing (AMP) algorithm.
- Score: 59.015960528781115
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the problem of recovering an unknown signal $\boldsymbol x$ given
measurements obtained from a generalized linear model with a Gaussian sensing
matrix. Two popular solutions are based on a linear estimator $\hat{\boldsymbol
x}^{\rm L}$ and a spectral estimator $\hat{\boldsymbol x}^{\rm s}$. The former
is a data-dependent linear combination of the columns of the measurement
matrix, and its analysis is quite simple. The latter is the principal
eigenvector of a data-dependent matrix, and a recent line of work has studied
its performance. In this paper, we show how to optimally combine
$\hat{\boldsymbol x}^{\rm L}$ and $\hat{\boldsymbol x}^{\rm s}$. At the heart
of our analysis is the exact characterization of the joint empirical
distribution of $(\boldsymbol x, \hat{\boldsymbol x}^{\rm L}, \hat{\boldsymbol
x}^{\rm s})$ in the high-dimensional limit. This allows us to compute the
Bayes-optimal combination of $\hat{\boldsymbol x}^{\rm L}$ and
$\hat{\boldsymbol x}^{\rm s}$, given the limiting distribution of the signal
$\boldsymbol x$. When the distribution of the signal is Gaussian, the
Bayes-optimal combination has the form $\theta\hat{\boldsymbol x}^{\rm
L}+\hat{\boldsymbol x}^{\rm s}$, and we derive the optimal combination
coefficient. In order to establish the limiting distribution of $(\boldsymbol
x, \hat{\boldsymbol x}^{\rm L}, \hat{\boldsymbol x}^{\rm s})$, we design and
analyze an Approximate Message Passing (AMP) algorithm whose iterates give
$\hat{\boldsymbol x}^{\rm L}$ and approach $\hat{\boldsymbol x}^{\rm s}$.
Numerical simulations demonstrate the improvement of the proposed combination
with respect to the two methods considered separately.
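As a rough numerical illustration (not the paper's implementation, and without its AMP-based analysis), the following numpy sketch builds the two estimators for a ReLU GLM and combines them as $\theta\hat{\boldsymbol x}^{\rm L}+\hat{\boldsymbol x}^{\rm s}$. The ReLU link, the identity preprocessing in the spectral matrix, and the oracle grid search over $\theta$ are illustrative assumptions; the paper instead derives the optimal coefficient in closed form from the limiting joint distribution.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 500, 3000
x = rng.standard_normal(n)
x /= np.linalg.norm(x)                    # unit-norm planted signal
A = rng.standard_normal((m, n))           # Gaussian sensing matrix, rows a_i ~ N(0, I_n)
y = np.maximum(A @ x, 0.0)                # GLM observations: y_i = max(<a_i, x>, 0)

# Linear estimator: a data-weighted combination of the rows of A.
x_lin = A.T @ y / m
x_lin /= np.linalg.norm(x_lin)

# Spectral estimator: principal eigenvector of (1/m) * sum_i y_i a_i a_i^T.
D = (A.T * y) @ A / m
eigvals, eigvecs = np.linalg.eigh(D)
x_spec = eigvecs[:, -1]                   # eigenvector of the largest eigenvalue
if x_spec @ x_lin < 0:                    # resolve the eigenvector's sign ambiguity
    x_spec = -x_spec

def overlap(v):
    """Normalized correlation |<v, x>| / ||v|| with the true signal."""
    return abs(v @ x) / np.linalg.norm(v)

# Combined estimator theta * x_lin + x_spec. Here theta is picked by an
# oracle grid search for illustration only; the paper computes the optimal
# coefficient without access to the true signal.
thetas = np.linspace(-2.0, 2.0, 81)
theta_best = max(thetas, key=lambda t: overlap(t * x_lin + x_spec))

print(f"overlap, linear only:   {overlap(x_lin):.3f}")
print(f"overlap, spectral only: {overlap(x_spec):.3f}")
print(f"overlap, combined (theta = {theta_best:.2f}): "
      f"{overlap(theta_best * x_lin + x_spec):.3f}")
```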
Related papers
- Solving Quadratic Systems with Full-Rank Matrices Using Sparse or Generative Priors [33.0212223058894]
The problem of recovering a signal from a quadratic system $y_i = \boldsymbol x^\top \boldsymbol A_i \boldsymbol x$, $i = 1, \ldots, m$, with full-rank matrices $\boldsymbol A_i$ frequently arises in applications such as unassigned distance geometry and sub-wavelength imaging.
This paper addresses the high-dimensional case where $m \ll n$ by incorporating prior knowledge of $\boldsymbol x$.
arXiv Detail & Related papers (2023-09-16T16:00:07Z) - Max-Margin Token Selection in Attention Mechanism [34.11955962489916]
We prove that running gradient descent on $\boldsymbol p$, or equivalently on $\boldsymbol W$, converges to a max-margin solution.
Remarkably, our results are applicable to general data and precisely characterize the optimal token selection.
arXiv Detail & Related papers (2023-06-23T16:35:46Z) - Learning a Single Neuron with Adversarial Label Noise via Gradient
Descent [50.659479930171585]
We study a function of the form $\mathbf{x} \mapsto \sigma(\mathbf{w} \cdot \mathbf{x})$ for monotone activations.
The goal of the learner is to output a hypothesis vector $\mathbf{w}$ such that $F(\mathbf{w}) = C\,\epsilon$ with high probability.
arXiv Detail & Related papers (2022-06-17T17:55:43Z) - Adversarial Examples in Random Neural Networks with General Activations [14.12513604585194]
Adversarial examples are ubiquitous in two-layer networks with sub-exponential width and ReLU or smooth activations.
We show that an adversarial example $\boldsymbol x'$ can be found with high probability along the direction of the gradient.
arXiv Detail & Related papers (2022-03-31T17:36:15Z) - Approximate Function Evaluation via Multi-Armed Bandits [51.146684847667125]
We study the problem of estimating the value of a known smooth function $f$ at an unknown point $\boldsymbol\mu \in \mathbb{R}^n$, where each component $\mu_i$ can be sampled via a noisy oracle.
We design an instance-adaptive algorithm that learns to sample according to the importance of each coordinate, and with probability at least $1-\delta$ returns an $\epsilon$-accurate estimate of $f(\boldsymbol\mu)$.
arXiv Detail & Related papers (2022-03-18T18:50:52Z) - Fast Graph Sampling for Short Video Summarization using Gershgorin Disc
Alignment [52.577757919003844]
We study the problem of efficiently summarizing a short video into several paragraphs, leveraging recent progress in fast graph sampling.
Experimental results show that our algorithm achieves comparable video summarization as state-of-the-art methods, at a substantially reduced complexity.
arXiv Detail & Related papers (2021-10-21T18:43:00Z) - Spectral properties of sample covariance matrices arising from random
matrices with independent non identically distributed columns [50.053491972003656]
It was previously shown that the functionals $\mathrm{tr}(AR(z))$, for $R(z) = (\frac{1}{n}XX^T - zI_p)^{-1}$ and $A \in \mathcal{M}_p$ deterministic, have a standard deviation of order $O(\|A\|_* / \sqrt{n})$.
Here, we show that $\|\mathbb{E}[R(z)] - \tilde R(z)\|_F$
arXiv Detail & Related papers (2021-09-06T14:21:43Z) - Self-training Converts Weak Learners to Strong Learners in Mixture
Models [86.7137362125503]
We show that a pseudolabeler $\boldsymbol\beta_{\mathrm{pl}}$ can achieve classification error at most $C_{\mathrm{err}}$.
We additionally show that by running gradient descent on the logistic loss one can obtain a pseudolabeler $\boldsymbol\beta_{\mathrm{pl}}$ with classification error $C_{\mathrm{err}}$ using only $O(d)$ labeled examples.
arXiv Detail & Related papers (2021-06-25T17:59:16Z) - Extensions to the Proximal Distance Method of Constrained Optimization [7.813460653362097]
We study the problem of minimizing a loss $f(\boldsymbol x)$ subject to constraints of the form $\boldsymbol D \boldsymbol x \in S$.
Fusion constraints can capture smoothness, sparsity, or more general constraint patterns (a minimal sketch of this distance-penalty idea appears after this list).
arXiv Detail & Related papers (2020-09-02T03:32:41Z) - The generalization error of max-margin linear classifiers: Benign
overfitting and high dimensional asymptotics in the overparametrized regime [11.252856459394854]
Modern machine learning classifiers often exhibit vanishing classification error on the training set.
Motivated by these phenomena, we revisit high-dimensional maximum margin classification for linearly separable data.
arXiv Detail & Related papers (2019-11-05T00:15:27Z)
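As referenced in the proximal distance entry above, here is a minimal sketch of a distance-penalty scheme for minimizing $f(\boldsymbol x)$ subject to $\boldsymbol D \boldsymbol x \in S$, under illustrative assumptions (quadratic loss, $S$ the nonnegative orthant, a simple geometric penalty schedule); this is not the paper's method.

```python
import numpy as np

# Hypothetical problem instance: quadratic loss f(x) = 0.5 * ||Ax - b||^2
# with the constraint D x in S, where S is the nonnegative orthant.
rng = np.random.default_rng(1)
n_obs, n_var = 50, 20
A = rng.standard_normal((n_obs, n_var))
b = rng.standard_normal(n_obs)
D = np.eye(n_var)                    # here the constraint acts on x directly

def proj_S(z):
    """Euclidean projection onto S (the nonnegative orthant)."""
    return np.clip(z, 0.0, None)

x = np.zeros(n_var)
rho, step = 1.0, 5e-3
for _ in range(300):
    # Gradient step on f(x) + (rho/2) * dist(Dx, S)^2, with the projection
    # point held fixed at the current iterate.
    z = proj_S(D @ x)
    grad = A.T @ (A @ x - b) + rho * D.T @ (D @ x - z)
    x -= step * grad
    rho *= 1.01                      # slowly tighten the penalty

print("constraint violation:", np.linalg.norm(D @ x - proj_S(D @ x)))
```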
This list is automatically generated from the titles and abstracts of the papers on this site.