Related papers: A statistical perspective on algorithm unrolling models for inverse problems

A statistical perspective on algorithm unrolling models for inverse problems

URL: http://arxiv.org/abs/2311.06395v1
Date: Fri, 10 Nov 2023 20:52:20 GMT
Title: A statistical perspective on algorithm unrolling models for inverse problems
Authors: Yves Atchade, Xinru Liu, Qiuyun Zhu
Abstract summary: In inverse problems where the conditional distribution of the observation $bf y$ given the latent variable of interest $bf x$ is known, we consider algorithm unrolling. We show that the unrolling depth needed for the optimal statistical performance of GDNs is of order $log(n)/log(varrho_n-1)$, where $n$ is the sample size. We also show that when the negative log-density of the latent variable $bf x$ has a simple proximal operator, then a GDN unrolled at depth $
Score: 2.7163621600184777
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We consider inverse problems where the conditional distribution of the observation ${\bf y}$ given the latent variable of interest ${\bf x}$ (also known as the forward model) is known, and we have access to a data set in which multiple instances of ${\bf x}$ and ${\bf y}$ are both observed. In this context, algorithm unrolling has become a very popular approach for designing state-of-the-art deep neural network architectures that effectively exploit the forward model. We analyze the statistical complexity of the gradient descent network (GDN), an algorithm unrolling architecture driven by proximal gradient descent. We show that the unrolling depth needed for the optimal statistical performance of GDNs is of order $\log(n)/\log(\varrho_n^{-1})$, where $n$ is the sample size, and $\varrho_n$ is the convergence rate of the corresponding gradient descent algorithm. We also show that when the negative log-density of the latent variable ${\bf x}$ has a simple proximal operator, then a GDN unrolled at depth $D'$ can solve the inverse problem at the parametric rate $O(D'/\sqrt{n})$. Our results thus also suggest that algorithm unrolling models are prone to overfitting as the unrolling depth $D'$ increases. We provide several examples to illustrate these results.

Related papers

Entangled Mean Estimation in High-Dimensions [36.97113089188035]
We study the task of high-dimensional entangled mean estimation in the subset-of-signals model. We show that the optimal error (up to polylogarithmic factors) is $f(alpha,N) + sqrtD/(alpha N)$, where the term $f(alpha,N)$ is the error of the one-dimensional problem and the second term is the sub-Gaussian error rate.
arXiv Detail & Related papers (2025-01-09T18:31:35Z)
Inverting the Leverage Score Gradient: An Efficient Approximate Newton Method [10.742859956268655]
This paper aims to recover the intrinsic model parameters given the leverage scores gradient. We specifically scrutinize the inversion of the leverage score gradient, denoted as $g(x)$.
arXiv Detail & Related papers (2024-08-21T01:39:42Z)
Transfer Learning for Latent Variable Network Models [18.31057192626801]
We study transfer learning for estimation in latent variable network models. We show that if the latent variables are shared, then vanishing error is possible. Our algorithm achieves $o(1)$ error and does not assume a parametric form on the source or target networks.
arXiv Detail & Related papers (2024-06-05T16:33:30Z)
Neural network learns low-dimensional polynomials with SGD near the information-theoretic limit [75.4661041626338]
We study the problem of gradient descent learning of a single-index target function $f_*(boldsymbolx) = textstylesigma_*left(langleboldsymbolx,boldsymbolthetarangleright)$ We prove that a two-layer neural network optimized by an SGD-based algorithm learns $f_*$ with a complexity that is not governed by information exponents.
arXiv Detail & Related papers (2024-06-03T17:56:58Z)
Matching the Statistical Query Lower Bound for $k$-Sparse Parity Problems with Sign Stochastic Gradient Descent [83.85536329832722]
We solve the $k$-sparse parity problem with sign gradient descent (SGD) on two-layer fully-connected neural networks. We show that this approach can efficiently solve the $k$-sparse parity problem on a $d$-dimensional hypercube. We then demonstrate how a trained neural network with sign SGD can effectively approximate this good network, solving the $k$-parity problem with small statistical errors.
arXiv Detail & Related papers (2024-04-18T17:57:53Z)
Improved Analysis of Sparse Linear Regression in Local Differential Privacy Model [38.66115499136791]
We revisit the problem of sparse linear regression in the local differential privacy (LDP) model. We propose an innovative NLDP algorithm, the very first of its kind for the problem. Our findings reveal fundamental differences between the non-private case, central DP model, and local DP model in the sparse linear regression problem.
arXiv Detail & Related papers (2023-10-11T10:34:52Z)
Efficiently Learning One-Hidden-Layer ReLU Networks via Schur Polynomials [50.90125395570797]
We study the problem of PAC learning a linear combination of $k$ ReLU activations under the standard Gaussian distribution on $mathbbRd$ with respect to the square loss. Our main result is an efficient algorithm for this learning task with sample and computational complexity $(dk/epsilon)O(k)$, whereepsilon>0$ is the target accuracy.
arXiv Detail & Related papers (2023-07-24T14:37:22Z)
Optimal Gradient Sliding and its Application to Distributed Optimization Under Similarity [121.83085611327654]
We structured convex optimization problems with additive objective $r:=p + q$, where $r$ is $mu$-strong convex similarity. We proposed a method to solve problems master to agents' communication and local calls. The proposed method is much sharper than the $mathcalO(sqrtL_q/mu)$ method.
arXiv Detail & Related papers (2022-05-30T14:28:02Z)
Approximate Function Evaluation via Multi-Armed Bandits [51.146684847667125]
We study the problem of estimating the value of a known smooth function $f$ at an unknown point $boldsymbolmu in mathbbRn$, where each component $mu_i$ can be sampled via a noisy oracle. We design an instance-adaptive algorithm that learns to sample according to the importance of each coordinate, and with probability at least $1-delta$ returns an $epsilon$ accurate estimate of $f(boldsymbolmu)$.
arXiv Detail & Related papers (2022-03-18T18:50:52Z)
Minimax Optimal Quantization of Linear Models: Information-Theoretic Limits and Efficient Algorithms [59.724977092582535]
We consider the problem of quantizing a linear model learned from measurements. We derive an information-theoretic lower bound for the minimax risk under this setting. We show that our method and upper-bounds can be extended for two-layer ReLU neural networks.
arXiv Detail & Related papers (2022-02-23T02:39:04Z)
Decentralized Sparse Linear Regression via Gradient-Tracking: Linear Convergence and Statistical Guarantees [23.256961881716595]
We study a sparse linear regression over a network of agents, modeled as an undirected graph and no server node. We analyze the convergence rate and statistical guarantees of a distributed projected gradient tracking-based algorithm.
arXiv Detail & Related papers (2022-01-21T01:26:08Z)
Robust Estimation for Random Graphs [47.07886511972046]
We study the problem of robustly estimating the parameter $p$ of an ErdHos-R'enyi random graph on $n$ nodes. We give an inefficient algorithm with similar accuracy for all $gamma 1/2$, the information-theoretic limit.
arXiv Detail & Related papers (2021-11-09T18:43:25Z)
Hybrid Stochastic-Deterministic Minibatch Proximal Gradient: Less-Than-Single-Pass Optimization with Nearly Optimal Generalization [83.80460802169999]
We show that HSDMPG can attain an $mathcalObig (1/sttnbig)$ which is at the order of excess error on a learning model. For loss factors, we prove that HSDMPG can attain an $mathcalObig (1/sttnbig)$ which is at the order of excess error on a learning model.
arXiv Detail & Related papers (2020-09-18T02:18:44Z)
Convergence of Sparse Variational Inference in Gaussian Processes Regression [29.636483122130027]
We show that a method with an overall computational cost of $mathcalO(log N)2D(loglog N)2)$ can be used to perform inference.
arXiv Detail & Related papers (2020-08-01T19:23:34Z)
Differentially Quantized Gradient Methods [53.3186247068836]
We show that Differentially Quantized Gradient Descent (DQ-GD) attains a linear contraction factor of $maxsigma_mathrmGD, rhon 2-R$. No algorithm within a certain class can converge faster than $maxsigma_mathrmGD, 2-R$.
arXiv Detail & Related papers (2020-02-06T20:40:53Z)

This list is automatically generated from the titles and abstracts of the papers in this site.