Solving Quadratic Systems with Full-Rank Matrices Using Sparse or
Generative Priors
- URL: http://arxiv.org/abs/2309.09032v1
- Date: Sat, 16 Sep 2023 16:00:07 GMT
- Title: Solving Quadratic Systems with Full-Rank Matrices Using Sparse or
Generative Priors
- Authors: Junren Chen, Shuai Huang, Michael K. Ng, Zhaoqiang Liu
- Abstract summary: This paper addresses the problem of recovering a signal from a quadratic system $\{y_i=\boldsymbol{x}^\top\boldsymbol{A}_i\boldsymbol{x}\}$.
We introduce the thresholded Wirtinger flow (TWF) algorithm that does not require the sparsity level $k$.
We show that our approach for the sparse case notably outperforms the existing provable algorithm sparse power factorization.
- Score: 44.94521933974231
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The problem of recovering a signal $\boldsymbol{x} \in \mathbb{R}^n$ from a
quadratic system $\{y_i=\boldsymbol{x}^\top\boldsymbol{A}_i\boldsymbol{x},\
i=1,\ldots,m\}$ with full-rank matrices $\boldsymbol{A}_i$ frequently arises in
applications such as unassigned distance geometry and sub-wavelength imaging.
With i.i.d. standard Gaussian matrices $\boldsymbol{A}_i$, this paper addresses
the high-dimensional case where $m\ll n$ by incorporating prior knowledge of
$\boldsymbol{x}$. First, we consider a $k$-sparse $\boldsymbol{x}$ and
introduce the thresholded Wirtinger flow (TWF) algorithm that does not require
the sparsity level $k$. TWF comprises two steps: the spectral initialization
that identifies a point sufficiently close to $\boldsymbol{x}$ (up to a sign
flip) when $m=O(k^2\log n)$, and the thresholded gradient descent (with a good
initialization) that produces a sequence linearly converging to
$\boldsymbol{x}$ with $m=O(k\log n)$ measurements. Second, we explore the
generative prior, assuming that $\boldsymbol{x}$ lies in the range of an
$L$-Lipschitz continuous generative model with $k$-dimensional inputs in an
$\ell_2$-ball of radius $r$. We develop the projected gradient descent (PGD)
algorithm that also comprises two steps: the projected power method that
provides an initial vector with $O\big(\sqrt{\frac{k \log L}{m}}\big)$
$\ell_2$-error given $m=O(k\log(Lnr))$ measurements, and the projected gradient
descent that refines the $\ell_2$-error to $O(\delta)$ at a geometric rate when
$m=O(k\log\frac{Lrn}{\delta^2})$. Experimental results corroborate our
theoretical findings and show that: (i) our approach for the sparse case
notably outperforms the existing provable algorithm sparse power factorization;
(ii) leveraging the generative prior allows for precise image recovery in the
MNIST dataset from a small number of quadratic measurements.
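The two TWF steps described above (spectral initialization, then thresholded gradient descent) can be sketched in NumPy. This is an illustrative simplification, not the paper's exact algorithm: the thresholding rule (zeroing entries below a fixed fraction of the largest one), the step size, and the iteration count are assumptions made here for a small demo.

```python
import numpy as np

def twf_sketch(A, y, n_iter=500, step=0.1, frac=0.1):
    """Sketch of a thresholded-Wirtinger-flow-style solver for the
    quadratic system y_i = x^T A_i x with i.i.d. standard Gaussian A_i.
    A: (m, n, n) array of measurement matrices; y: (m,) measurements.
    Returns an estimate of x (up to a global sign flip)."""
    m, n, _ = A.shape
    # Spectral initialization: (1/m) sum_i y_i A_i concentrates around
    # x x^T for i.i.d. standard Gaussian A_i, so its top eigenvector
    # (after symmetrization) is aligned with x.
    M = np.tensordot(y, A, axes=([0], [0])) / m
    M = (M + M.T) / 2.0
    _, vecs = np.linalg.eigh(M)          # eigenvalues in ascending order
    z = vecs[:, -1]
    # Norm estimate: E[y_i^2] = ||x||^4, so ||x|| ~ mean(y^2)^(1/4).
    z = (np.mean(y ** 2) ** 0.25) * z
    B = A + A.transpose(0, 2, 1)         # symmetrized A_i + A_i^T
    for _ in range(n_iter):
        # Residuals r_i = z^T A_i z - y_i and the gradient of the
        # least-squares loss (1/2m) sum_i r_i^2.
        r = np.einsum('kij,i,j->k', A, z, z) - y
        grad = np.einsum('k,kij,j->i', r, B, z) / m
        z = z - step * grad
        # Hard-threshold small entries (a stand-in for the paper's
        # sparsity-level-free thresholding rule).
        z[np.abs(z) < frac * np.abs(z).max()] = 0.0
    return z

# Usage: recover a 3-sparse unit-norm x in R^20 from m = 800 measurements.
rng = np.random.default_rng(0)
n, m = 20, 800
x = np.zeros(n)
x[rng.choice(n, 3, replace=False)] = [1.0, 0.7, 0.5]
x /= np.linalg.norm(x)
A = rng.standard_normal((m, n, n))
y = np.einsum('kij,i,j->k', A, x, x)
x_hat = twf_sketch(A, y)
# x is only identifiable up to sign, so compare against both signs.
err = min(np.linalg.norm(x_hat - x), np.linalg.norm(x_hat + x))
```

Note that the sign ambiguity in the final comparison is intrinsic to the problem: $\boldsymbol{x}$ and $-\boldsymbol{x}$ produce identical measurements $y_i$.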
Related papers
- A Unified Framework for Uniform Signal Recovery in Nonlinear Generative
Compressed Sensing [68.80803866919123]
Under nonlinear measurements, most prior results are non-uniform, i.e., they hold with high probability for a fixed $\mathbf{x}^*$ rather than for all $\mathbf{x}^*$ simultaneously.
Our framework accommodates GCS with 1-bit/uniformly quantized observations and single index models as canonical examples.
We also develop a concentration inequality that produces tighter bounds for product processes whose index sets have low metric entropy.
arXiv Detail & Related papers (2023-09-25T17:54:19Z) - Randomized and Deterministic Attention Sparsification Algorithms for
Over-parameterized Feature Dimension [18.57735939471469]
We consider the sparsification of the attention problem.
For any very large feature dimension, we can reduce it to a size nearly linear in the sentence length.
arXiv Detail & Related papers (2023-04-10T05:52:38Z) - A Nearly-Optimal Bound for Fast Regression with $\ell_\infty$ Guarantee [16.409210914237086]
Given a matrix $A\in\mathbb{R}^{n\times d}$ and a vector $b\in\mathbb{R}^n$, we consider the regression problem with $\ell_\infty$ guarantees.
We show that in order to obtain such an $\ell_\infty$ guarantee for $\ell_2$ regression, one has to use sketching matrices that are dense.
We also develop a novel analytical framework for $\ell_\infty$ guarantee regression that utilizes the Oblivious Coordinate-wise Embedding (OCE) property.
arXiv Detail & Related papers (2023-02-01T05:22:40Z) - Misspecified Phase Retrieval with Generative Priors [15.134280834597865]
We estimate an $n$-dimensional signal $\mathbf{x}$ from $m$ i.i.d. realizations of a single index model.
We show that both steps enjoy a statistical rate of order $\sqrt{(k\log L)\cdot(\log m)/m}$ under suitable conditions.
arXiv Detail & Related papers (2022-10-11T16:04:11Z) - Learning a Single Neuron with Adversarial Label Noise via Gradient
Descent [50.659479930171585]
We study a function of the form $\mathbf{x}\mapsto\sigma(\mathbf{w}\cdot\mathbf{x})$ for monotone activations.
The goal of the learner is to output a hypothesis vector $\mathbf{w}$ such that $F(\mathbf{w})=C\,\mathrm{OPT}+\epsilon$ with high probability.
arXiv Detail & Related papers (2022-06-17T17:55:43Z) - High-dimensional Asymptotics of Feature Learning: How One Gradient Step
Improves the Representation [89.21686761957383]
We study the first gradient descent step on the first-layer parameters $\boldsymbol{W}$ in a two-layer network.
Our results demonstrate that even one step can lead to a considerable advantage over random features.
arXiv Detail & Related papers (2022-05-03T12:09:59Z) - Sketching Algorithms and Lower Bounds for Ridge Regression [65.0720777731368]
We give a sketching-based iterative algorithm that computes $1+\varepsilon$ approximate solutions for the ridge regression problem.
We also show that this algorithm can be used to give faster algorithms for kernel ridge regression.
arXiv Detail & Related papers (2022-04-13T22:18:47Z) - Optimal Combination of Linear and Spectral Estimators for Generalized
Linear Models [59.015960528781115]
We show how to optimally combine $\hat{\boldsymbol{x}}^{\rm L}$ and $\hat{\boldsymbol{x}}^{\rm s}$.
In order to establish the limiting distribution of $(\boldsymbol{x}, \hat{\boldsymbol{x}}^{\rm L}, \hat{\boldsymbol{x}}^{\rm s})$, we design and analyze an Approximate Message Passing (AMP) algorithm.
arXiv Detail & Related papers (2020-08-07T18:20:05Z) - Tree-Projected Gradient Descent for Estimating Gradient-Sparse
Parameters on Graphs [10.846572437131872]
We study estimation of a gradient-sparse parameter vector $\boldsymbol{\theta}^* \in \mathbb{R}^p$.
We show that, under suitable restricted strong convexity and smoothness assumptions for the loss, the resulting estimator achieves the squared-error risk $\frac{s^*}{n}\log\big(1+\frac{p}{s^*}\big)$ up to a multiplicative constant that is independent of $G$.
arXiv Detail & Related papers (2020-05-31T20:08:13Z)