Restarted contractive operators to learn at equilibrium
- URL: http://arxiv.org/abs/2506.13239v1
- Date: Mon, 16 Jun 2025 08:38:56 GMT
- Title: Restarted contractive operators to learn at equilibrium
- Authors: Leo Davy, Luis M. Briceno-Arias, N. Pustelnik
- Abstract summary: We introduce an algorithm that combines a restart strategy with JFB computed by AD, and we show that the learned steps can be made arbitrarily close to those of the optimal DEQ framework. We show that this method is effective for training weights in weighted norms; stepsizes and regularization levels of Plug-and-Play schemes; and a DRUNet denoiser embedded in Forward-Backward iterates.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Bilevel optimization offers a methodology to learn hyperparameters in imaging inverse problems, yet its integration with automatic differentiation techniques remains challenging. On the one hand, inverse problems are typically solved by iterating, arbitrarily many times, an elementary scheme whose iterates converge, from any starting point, to the minimizer of an energy functional, known as the equilibrium point. On the other hand, introducing parameters to be learned in the energy functional yields architectures very reminiscent of Neural Networks (NN), known as Unrolled NNs, and thus suggests the use of Automatic Differentiation (AD) techniques. Yet applying AD requires the NN to be of relatively small depth, making it necessary to truncate an unrolled scheme to a finite number of iterations. First, we show that, at the minimizer, the optimal gradient descent step computed in the Deep Equilibrium (DEQ) framework admits an approximation, known as Jacobian-Free Backpropagation (JFB), that is much easier to compute and can be made arbitrarily good by controlling Lipschitz properties of the truncated unrolled scheme. Second, we introduce an algorithm that combines a restart strategy with JFB computed by AD, and we show that the learned steps can be made arbitrarily close to those of the optimal DEQ framework. Third, we complement the theoretical analysis by applying the proposed method to a variety of problems in imaging that progressively depart from the theoretical framework. In particular, we show that this method is effective for training weights in weighted norms; stepsizes and regularization levels of Plug-and-Play schemes; and a DRUNet denoiser embedded in Forward-Backward iterates.
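For intuition, here is a minimal sketch, in PyTorch, of how JFB combined with a restart (warm-start) of the truncated unrolled iterations might look. The contractive map, dimensions, and loss below are illustrative assumptions for exposition, not the authors' implementation.

```python
# A minimal sketch of JFB with a warm-restarted truncated unrolled scheme.
# The map `f`, dimensions, and optimizer are illustrative assumptions.
import torch

torch.manual_seed(0)
d = 8
W = torch.nn.Parameter(0.1 * torch.randn(d, d))   # learnable parameter
x = torch.randn(d)                                 # problem data
target = torch.randn(d)
opt = torch.optim.SGD([W], lr=1e-2)

def f(z, x):
    # Elementary scheme; contractive in z while ||W|| < 1 (tanh is
    # 1-Lipschitz), so iterating f converges to a unique equilibrium
    # z* = f(z*, x) from any starting point.
    return torch.tanh(W @ z + x)

z_warm = torch.zeros(d)   # restart point carried across outer iterations
for outer in range(100):
    # Truncated unrolling, restarted from the previous equilibrium estimate:
    # a few inner iterations suffice because z_warm is already close to z*.
    with torch.no_grad():          # no autodiff graph through the inner loop
        z = z_warm
        for _ in range(10):
            z = f(z, x)
        z_warm = z                 # warm restart for the next outer step

    # JFB: backpropagate through a single application of f at (approximately)
    # z*, dropping the (I - df/dz)^{-1} factor that exact implicit (DEQ)
    # differentiation would require.
    z_out = f(z_warm, x)
    loss = 0.5 * (z_out - target).pow(2).sum()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Exact DEQ differentiation would instead solve a linear system involving (I - df/dz) at the equilibrium; the point of JFB, per the abstract, is that for a sufficiently contractive scheme, skipping that solve still yields an arbitrarily good step.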
Related papers
- Learning based convex approximation for constrained parametric optimization [11.379408842026981]
We propose an input convex neural network (ICNN)-based self-supervised learning framework to solve constrained optimization problems. We provide rigorous convergence analysis, showing that the framework converges to an approximate Karush-Kuhn-Tucker (KKT) point of the original problem. Our approach achieves a superior balance among accuracy, feasibility, and computational efficiency.
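Since the blurb leaves the architecture implicit, here is a hedged sketch of a generic input convex neural network (the standard construction; names and sizes are illustrative, and this is not the paper's specific model):

```python
# Generic ICNN sketch: f(x) is convex in x because the z-path weights are
# kept nonnegative and the activation is convex and nondecreasing.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ICNN(nn.Module):
    def __init__(self, dim_x, dim_h, n_layers=3):
        super().__init__()
        self.Wx = nn.ModuleList(nn.Linear(dim_x, dim_h) for _ in range(n_layers))
        self.Wz = nn.ModuleList(nn.Linear(dim_h, dim_h, bias=False)
                                for _ in range(n_layers - 1))
        self.out = nn.Linear(dim_h, 1, bias=False)

    def forward(self, x):
        z = F.softplus(self.Wx[0](x))
        for lin_x, lin_z in zip(self.Wx[1:], self.Wz):
            # clamp enforces nonnegativity of the z-weights at every layer
            z = F.softplus(lin_x(x) + F.linear(z, lin_z.weight.clamp(min=0)))
        return F.linear(z, self.out.weight.clamp(min=0))

f = ICNN(dim_x=4, dim_h=16)
y = f(torch.randn(32, 4))  # shape (32, 1), convex as a function of each input
```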
arXiv Detail & Related papers (2025-05-07T00:33:14Z) - Training Deep Learning Models with Norm-Constrained LMOs [56.00317694850397]
We propose a new family of algorithms that uses the linear minimization oracle (LMO) to adapt to the geometry of the problem. We demonstrate significant speedups on nanoGPT training using our algorithm, Scion, without any reliance on Adam.
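For concreteness, a hedged sketch of the generic LMO-based update this family builds on (the norms, names, and step size are illustrative assumptions; this is not the Scion implementation):

```python
# Norm-constrained steepest descent via a linear minimization oracle (LMO).
import torch

def lmo_linf(g, radius=1.0):
    # argmin over {||s||_inf <= radius} of <g, s> is -radius * sign(g)
    return -radius * torch.sign(g)

def lmo_spectral(G, radius=1.0):
    # argmin over {||S||_2 <= radius} of <G, S> is -radius * U @ V^T,
    # with G = U diag(s) V^T (the dual of the spectral norm is the nuclear norm)
    U, _, Vh = torch.linalg.svd(G, full_matrices=False)
    return -radius * (U @ Vh)

W = torch.randn(8, 8)
G = torch.randn(8, 8)          # stand-in for a gradient
W = W + 0.1 * lmo_spectral(G)  # move along the LMO direction
```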
arXiv Detail & Related papers (2025-02-11T13:10:34Z) - Local Linear Convergence of Infeasible Optimization with Orthogonal Constraints [12.414718831844041]
An infeasible retraction-free approach, the landing algorithm, was proposed as an efficient alternative. This paper establishes local linear convergence of the landing algorithm for smooth optimization with orthogonal constraints, using only a local Riemannian Polyak-Lojasiewicz (PL) condition. Numerical experiments demonstrate that the landing algorithm performs on par with state-of-the-art retraction-based methods with substantially reduced computational overhead.
arXiv Detail & Related papers (2024-12-07T16:02:27Z) - A Sample Efficient Alternating Minimization-based Algorithm For Robust Phase Retrieval [56.67706781191521]
In this work, we study a robust phase retrieval problem where the task is to recover an unknown signal.
Our proposed oracle avoids the need for computationally expensive spectral initialization, using a simple gradient step that is robust to outliers.
arXiv Detail & Related papers (2024-09-07T06:37:23Z) - Alternating Minimization Schemes for Computing Rate-Distortion-Perception Functions with $f$-Divergence Perception Constraints [10.564071872770146]
We study the computation of the rate-distortion-perception function (RDPF) for discrete memoryless sources.
We characterize the optimal parametric solutions.
We provide sufficient conditions on the distortion and the perception constraints.
arXiv Detail & Related papers (2024-08-27T12:50:12Z) - Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear interpolation can help by leveraging the theory of nonexpansive operators.
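For intuition, a hedged sketch of the underlying averaging idea: interpolating linearly between the iterate and its image under a nonexpansive operator turns a non-convergent iteration into a convergent one. The rotation operator below is an illustrative stand-in for a training step, not the paper's setup.

```python
# Krasnosel'skii-Mann averaging: (1-lam)*I + lam*T converges to a fixed
# point of a nonexpansive T even when plain iteration of T does not.
import numpy as np

theta = 0.5
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])  # isometry: ||R z|| = ||z||
b = np.array([1.0, 0.0])
T = lambda z: R @ z + b       # nonexpansive; fixed point z* = (I - R)^{-1} b

z_plain = np.zeros(2)
z_km = np.zeros(2)
lam = 0.5
for _ in range(500):
    z_plain = T(z_plain)                       # error rotates, never shrinks
    z_km = (1 - lam) * z_km + lam * T(z_km)    # averaged step converges

z_star = np.linalg.solve(np.eye(2) - R, b)
print(np.linalg.norm(z_plain - z_star))  # stays bounded away from 0
print(np.linalg.norm(z_km - z_star))     # ~0: interpolation stabilizes
```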
arXiv Detail & Related papers (2023-10-20T12:45:12Z) - Iterative Reweighted Least Squares Networks With Convergence Guarantees
for Solving Inverse Imaging Problems [12.487990897680422]
We present a novel optimization strategy for image reconstruction tasks under analysis-based image regularization.
We parameterize such regularizers using potential functions that correspond to weighted extensions of the $\ell_p^p$-vector and $\mathcal{S}_p^p$ Schatten-matrix quasi-norms.
We show that thanks to the convergence guarantees of our proposed minimization strategy, such optimization can be successfully performed with a memory-efficient implicit back-propagation scheme.
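As a concrete, hedged illustration of the reweighting idea only (not the paper's network or its implicit back-propagation scheme), here is a generic IRLS loop for an $\ell_p^p$-regularized least-squares problem:

```python
# Generic IRLS for: minimize 0.5*||Ax - y||^2 + lam * sum_i (x_i^2 + eps)^(p/2)
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 10))
y = rng.standard_normal(20)
p, lam, eps = 0.8, 0.1, 1e-6   # quasi-norm power, reg. weight, smoothing

x = np.linalg.lstsq(A, y, rcond=None)[0]
for _ in range(100):
    # The quadratic majorizer of the penalty has weights
    # p*(x_i^2 + eps)^(p/2 - 1); each IRLS step therefore solves a
    # weighted ridge-regression system.
    w = p * (x**2 + eps) ** (p / 2 - 1)
    x = np.linalg.solve(A.T @ A + lam * np.diag(w), A.T @ y)
```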
arXiv Detail & Related papers (2023-08-10T17:59:46Z) - A Deep Unrolling Model with Hybrid Optimization Structure for Hyperspectral Image Deconvolution [50.13564338607482]
We propose a novel optimization framework for the hyperspectral deconvolution problem, called DeepMix. It consists of three distinct modules, namely, a data consistency module, a module that enforces the effect of the handcrafted regularizers, and a denoising module. This work proposes a context-aware denoising module designed to sustain the advancements achieved by the cooperative efforts of the other modules.
arXiv Detail & Related papers (2023-06-10T08:25:16Z) - Stochastic Unrolled Federated Learning [85.6993263983062]
We introduce Stochastic UnRolled Federated learning (SURF), a method that expands algorithm unrolling to federated learning.
Our proposed method tackles two challenges of this expansion, namely the need to feed whole datasets to the unrolled optimizers and the decentralized nature of federated learning.
arXiv Detail & Related papers (2023-05-24T17:26:22Z) - AskewSGD : An Annealed interval-constrained Optimisation method to train
Quantized Neural Networks [12.229154524476405]
We develop a new algorithm, Annealed Skewed SGD - AskewSGD - for training deep neural networks (DNNs) with quantized weights.
Unlike algorithms with active sets and feasible directions, AskewSGD avoids projections or optimization under the entire feasible set.
Experimental results show that the AskewSGD algorithm performs better than or on par with state-of-the-art methods in classical benchmarks.
arXiv Detail & Related papers (2022-11-07T18:13:44Z) - Logistic Q-Learning [87.00813469969167]
We propose a new reinforcement learning algorithm derived from a regularized linear-programming formulation of optimal control in MDPs.
The main feature of our algorithm is a convex loss function for policy evaluation that serves as a theoretically sound alternative to the widely used squared Bellman error.
arXiv Detail & Related papers (2020-10-21T17:14:31Z) - Activation Relaxation: A Local Dynamical Approximation to
Backpropagation in the Brain [62.997667081978825]
Activation Relaxation (AR) is motivated by constructing the backpropagation gradient as the equilibrium point of a dynamical system.
Our algorithm converges rapidly and robustly to the correct backpropagation gradients, requires only a single type of computational unit, and can operate on arbitrary computation graphs.
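To make the "gradient as equilibrium point" idea concrete, here is a hedged toy sketch (the two-layer network, step size, and loss are illustrative assumptions, not the paper's model): relaxing a simple dynamical system recovers the backpropagated gradient at its fixed point.

```python
# Backprop gradient as the equilibrium of a relaxation dynamics.
import numpy as np

rng = np.random.default_rng(0)
W1, W2 = rng.standard_normal((6, 4)), rng.standard_normal((2, 6))
x = rng.standard_normal(4)

# Forward pass: a1 = W1 x, y1 = tanh(a1), y2 = W2 y1, loss = 0.5*||y2||^2
y1 = np.tanh(W1 @ x)
y2 = W2 @ y1

# Relax an error unit g1 toward its equilibrium; at the fixed point,
# g1 = diag(1 - y1^2) @ W2^T @ g2, i.e. the backpropagated gradient dL/da1.
g2 = y2                     # dL/dy2 for the quadratic loss
g1 = np.zeros(6)
for _ in range(100):
    g1 = g1 + 0.2 * (-g1 + (1 - y1**2) * (W2.T @ g2))

print(np.allclose(g1, (1 - y1**2) * (W2.T @ g2)))  # True: at equilibrium
```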
arXiv Detail & Related papers (2020-09-11T11:56:34Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.