A statistical perspective on algorithm unrolling models for inverse
problems
- URL: http://arxiv.org/abs/2311.06395v1
- Date: Fri, 10 Nov 2023 20:52:20 GMT
- Title: A statistical perspective on algorithm unrolling models for inverse
problems
- Authors: Yves Atchade, Xinru Liu, Qiuyun Zhu
- Abstract summary: In inverse problems where the conditional distribution of the observation $bf y$ given the latent variable of interest $bf x$ is known, we consider algorithm unrolling.
We show that the unrolling depth needed for the optimal statistical performance of GDNs is of order $log(n)/log(varrho_n-1)$, where $n$ is the sample size.
We also show that when the negative log-density of the latent variable $bf x$ has a simple proximal operator, then a GDN unrolled at depth $
- Score: 2.7163621600184777
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider inverse problems where the conditional distribution of the
observation ${\bf y}$ given the latent variable of interest ${\bf x}$ (also
known as the forward model) is known, and we have access to a data set in which
multiple instances of ${\bf x}$ and ${\bf y}$ are both observed. In this
context, algorithm unrolling has become a very popular approach for designing
state-of-the-art deep neural network architectures that effectively exploit the
forward model. We analyze the statistical complexity of the gradient descent
network (GDN), an algorithm unrolling architecture driven by proximal gradient
descent. We show that the unrolling depth needed for the optimal statistical
performance of GDNs is of order $\log(n)/\log(\varrho_n^{-1})$, where $n$ is
the sample size, and $\varrho_n$ is the convergence rate of the corresponding
gradient descent algorithm. We also show that when the negative log-density of
the latent variable ${\bf x}$ has a simple proximal operator, then a GDN
unrolled at depth $D'$ can solve the inverse problem at the parametric rate
$O(D'/\sqrt{n})$. Our results thus also suggest that algorithm unrolling models
are prone to overfitting as the unrolling depth $D'$ increases. We provide
several examples to illustrate these results.
Related papers
- Inverting the Leverage Score Gradient: An Efficient Approximate Newton Method [10.742859956268655]
This paper aims to recover the intrinsic model parameters given the leverage scores gradient.
We specifically scrutinize the inversion of the leverage score gradient, denoted as $g(x)$.
arXiv Detail & Related papers (2024-08-21T01:39:42Z) - Transfer Learning for Latent Variable Network Models [18.31057192626801]
We study transfer learning for estimation in latent variable network models.
We show that if the latent variables are shared, then vanishing error is possible.
Our algorithm achieves $o(1)$ error and does not assume a parametric form on the source or target networks.
arXiv Detail & Related papers (2024-06-05T16:33:30Z) - Improved Analysis of Sparse Linear Regression in Local Differential
Privacy Model [38.66115499136791]
We revisit the problem of sparse linear regression in the local differential privacy (LDP) model.
We propose an innovative NLDP algorithm, the very first of its kind for the problem.
Our findings reveal fundamental differences between the non-private case, central DP model, and local DP model in the sparse linear regression problem.
arXiv Detail & Related papers (2023-10-11T10:34:52Z) - Efficiently Learning One-Hidden-Layer ReLU Networks via Schur
Polynomials [50.90125395570797]
We study the problem of PAC learning a linear combination of $k$ ReLU activations under the standard Gaussian distribution on $mathbbRd$ with respect to the square loss.
Our main result is an efficient algorithm for this learning task with sample and computational complexity $(dk/epsilon)O(k)$, whereepsilon>0$ is the target accuracy.
arXiv Detail & Related papers (2023-07-24T14:37:22Z) - Optimal Gradient Sliding and its Application to Distributed Optimization
Under Similarity [121.83085611327654]
We structured convex optimization problems with additive objective $r:=p + q$, where $r$ is $mu$-strong convex similarity.
We proposed a method to solve problems master to agents' communication and local calls.
The proposed method is much sharper than the $mathcalO(sqrtL_q/mu)$ method.
arXiv Detail & Related papers (2022-05-30T14:28:02Z) - Approximate Function Evaluation via Multi-Armed Bandits [51.146684847667125]
We study the problem of estimating the value of a known smooth function $f$ at an unknown point $boldsymbolmu in mathbbRn$, where each component $mu_i$ can be sampled via a noisy oracle.
We design an instance-adaptive algorithm that learns to sample according to the importance of each coordinate, and with probability at least $1-delta$ returns an $epsilon$ accurate estimate of $f(boldsymbolmu)$.
arXiv Detail & Related papers (2022-03-18T18:50:52Z) - Minimax Optimal Quantization of Linear Models: Information-Theoretic
Limits and Efficient Algorithms [59.724977092582535]
We consider the problem of quantizing a linear model learned from measurements.
We derive an information-theoretic lower bound for the minimax risk under this setting.
We show that our method and upper-bounds can be extended for two-layer ReLU neural networks.
arXiv Detail & Related papers (2022-02-23T02:39:04Z) - Robust Estimation for Random Graphs [47.07886511972046]
We study the problem of robustly estimating the parameter $p$ of an ErdHos-R'enyi random graph on $n$ nodes.
We give an inefficient algorithm with similar accuracy for all $gamma 1/2$, the information-theoretic limit.
arXiv Detail & Related papers (2021-11-09T18:43:25Z) - Hybrid Stochastic-Deterministic Minibatch Proximal Gradient:
Less-Than-Single-Pass Optimization with Nearly Optimal Generalization [83.80460802169999]
We show that HSDMPG can attain an $mathcalObig (1/sttnbig)$ which is at the order of excess error on a learning model.
For loss factors, we prove that HSDMPG can attain an $mathcalObig (1/sttnbig)$ which is at the order of excess error on a learning model.
arXiv Detail & Related papers (2020-09-18T02:18:44Z) - Convergence of Sparse Variational Inference in Gaussian Processes
Regression [29.636483122130027]
We show that a method with an overall computational cost of $mathcalO(log N)2D(loglog N)2)$ can be used to perform inference.
arXiv Detail & Related papers (2020-08-01T19:23:34Z) - Differentially Quantized Gradient Methods [53.3186247068836]
We show that Differentially Quantized Gradient Descent (DQ-GD) attains a linear contraction factor of $maxsigma_mathrmGD, rhon 2-R$.
No algorithm within a certain class can converge faster than $maxsigma_mathrmGD, 2-R$.
arXiv Detail & Related papers (2020-02-06T20:40:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.