Training Recurrent Neural Networks by Sequential Least Squares and the
Alternating Direction Method of Multipliers
- URL: http://arxiv.org/abs/2112.15348v1
- Date: Fri, 31 Dec 2021 08:43:04 GMT
- Title: Training Recurrent Neural Networks by Sequential Least Squares and the
Alternating Direction Method of Multipliers
- Authors: Alberto Bemporad
- Abstract summary: We propose sequential least squares for determining the optimal network parameters and hidden states under rather arbitrary convex, twice-differentiable loss and regularization terms.
We combine sequential least squares with the alternating direction method of multipliers (ADMM) to handle non-smooth regularizers and non-convex constraints.
The performance of the algorithm is tested in a nonlinear system identification benchmark.
- Score: 0.20305676256390928
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: For training recurrent neural network models of nonlinear dynamical systems
from an input/output training dataset based on rather arbitrary convex and
twice-differentiable loss functions and regularization terms, we propose the
use of sequential least squares for determining the optimal network parameters
and hidden states. In addition, to handle non-smooth regularization terms such
as L1, L0, and group-Lasso regularizers, as well as to impose possibly
non-convex constraints such as integer and mixed-integer constraints, we
combine sequential least squares with the alternating direction method of
multipliers (ADMM). The performance of the resulting algorithm, which we call
NAILS (Nonconvex ADMM Iterations and Least Squares), is tested in a nonlinear
system identification benchmark.
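As a rough illustration of the splitting described in the abstract, the sketch below applies ADMM to an L1-regularized least-squares subproblem of the kind that arises once the smooth training loss has been replaced by a least-squares approximation. It is a minimal example with assumed names (soft_threshold, admm_l1_least_squares) and synthetic data, not the NAILS algorithm or the paper's benchmark.

```python
import numpy as np

def soft_threshold(v, kappa):
    # Proximal operator of kappa * ||.||_1 (the z-update for an L1 regularizer).
    return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)

def admm_l1_least_squares(A, b, lam, rho=1.0, n_iter=100):
    # Illustrative ADMM for: min_x 0.5*||A x - b||^2 + lam*||x||_1
    #   x-update: ridge-regularized least squares (closed form via Cholesky)
    #   z-update: soft-thresholding (prox of the non-smooth term)
    #   u-update: dual ascent on the consensus constraint x = z
    n = A.shape[1]
    AtA, Atb = A.T @ A, A.T @ b
    x = np.zeros(n)
    z = np.zeros(n)
    u = np.zeros(n)
    L = np.linalg.cholesky(AtA + rho * np.eye(n))  # factor once, reuse every iteration
    for _ in range(n_iter):
        rhs = Atb + rho * (z - u)
        x = np.linalg.solve(L.T, np.linalg.solve(L, rhs))  # least-squares step
        z = soft_threshold(x + u, lam / rho)               # non-smooth step
        u = u + x - z                                      # dual update
    return z

# Hypothetical usage on synthetic data (not the benchmark from the paper).
rng = np.random.default_rng(0)
A = rng.standard_normal((200, 50))
x_true = np.where(rng.random(50) < 0.1, rng.standard_normal(50), 0.0)
b = A @ x_true + 0.01 * rng.standard_normal(200)
x_hat = admm_l1_least_squares(A, b, lam=0.1)
```

In the paper's setting, the quadratic term would instead come from a sequential least-squares approximation of the training loss over the network parameters and hidden states, and the z-update would change with the chosen regularizer or constraint (e.g., hard-thresholding for L0, block soft-thresholding for group-Lasso, projection for integer constraints).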
Related papers
- Precise asymptotics of reweighted least-squares algorithms for linear diagonal networks [15.074950361970194]
We provide a unified analysis for a family of algorithms that encompasses IRLS, the recently proposed lin-RFM algorithm, and alternating minimization on linear diagonal neural networks.
We show that, with an appropriately chosen reweighting policy, these algorithms achieve favorable performance in recovering sparse structures.
We also show that leveraging group structure in the reweighting scheme provably improves test error compared to coordinate-wise reweighting.
arXiv Detail & Related papers (2024-06-04T20:37:17Z) - Efficient model predictive control for nonlinear systems modelled by deep neural networks [6.5268245109828005]
This paper presents a model predictive control (MPC) scheme for dynamic systems whose nonlinearity and uncertainty are modelled by deep neural networks (NNs).
Since the NN output contains a high-order complex nonlinearity of the system state and control input, the MPC problem is nonlinear and challenging to solve for real-time control.
arXiv Detail & Related papers (2024-05-16T18:05:18Z) - An L-BFGS-B approach for linear and nonlinear system identification under $\ell_1$- and group-Lasso regularization [0.0]
We propose a very efficient numerical method for identifying linear and nonlinear discrete-time state-space models.
A Python implementation of the proposed identification method is available in the package jax-sysid.
arXiv Detail & Related papers (2024-03-06T16:17:34Z) - The Convex Landscape of Neural Networks: Characterizing Global Optima
and Stationary Points via Lasso Models [75.33431791218302]
Deep Neural Network (DNN) models are trained by non-convex optimization heuristics.
In this paper we examine the use of convex neural network recovery models.
We show that all stationary points of the non-convex training objective can be characterized as the global optimum of a subsampled convex program.
arXiv Detail & Related papers (2023-12-19T23:04:56Z) - Optimization Guarantees of Unfolded ISTA and ADMM Networks With Smooth
Soft-Thresholding [57.71603937699949]
We study optimization guarantees, i.e., whether near-zero training loss is achieved as the number of learning epochs increases.
We show that the threshold on the number of training samples increases with the increase in the network width.
arXiv Detail & Related papers (2023-09-12T13:03:47Z) - Recurrent Neural Network Training with Convex Loss and Regularization
Functions by Extended Kalman Filtering [0.20305676256390928]
We show that the learning method outperforms gradient descent in a nonlinear system identification benchmark.
We also explore the use of the algorithm in data-driven nonlinear model predictive control and its relation with disturbance models for offset-free tracking.
arXiv Detail & Related papers (2021-11-04T07:49:15Z) - Unfolding Projection-free SDP Relaxation of Binary Graph Classifier via
GDPA Linearization [59.87663954467815]
Algorithm unfolding creates an interpretable and parsimonious neural network architecture by implementing each iteration of a model-based algorithm as a neural layer.
In this paper, leveraging a recent linear algebraic theorem called Gershgorin disc perfect alignment (GDPA), we unroll a projection-free algorithm for the semi-definite programming relaxation (SDR) of a binary graph classifier.
Experimental results show that our unrolled network outperformed pure model-based graph classifiers, and achieved comparable performance to pure data-driven networks but using far fewer parameters.
arXiv Detail & Related papers (2021-09-10T07:01:15Z) - Least-Squares ReLU Neural Network (LSNN) Method For Linear
Advection-Reaction Equation [3.6525914200522656]
This paper studies the least-squares ReLU neural network (LSNN) method for solving the linear advection-reaction problem with a discontinuous solution.
The method is capable of approximating the discontinuous interface of the underlying problem automatically through the free hyper-planes of the ReLU neural network.
arXiv Detail & Related papers (2021-05-25T03:13:15Z) - Provably Efficient Neural Estimation of Structural Equation Model: An
Adversarial Approach [144.21892195917758]
We study estimation in a class of generalized structural equation models (SEMs).
We formulate the linear operator equation as a min-max game, where both players are parameterized by neural networks (NNs), and learn the parameters of these networks using gradient descent.
For the first time we provide a tractable estimation procedure for SEMs based on NNs with provable convergence and without the need for sample splitting.
arXiv Detail & Related papers (2020-07-02T17:55:47Z) - Controllable Orthogonalization in Training DNNs [96.1365404059924]
Orthogonality is widely used for training deep neural networks (DNNs) due to its ability to maintain all singular values of the Jacobian close to 1.
This paper proposes a computationally efficient and numerically stable orthogonalization method using Newton's iteration (ONI).
We show that our method improves the performance of image classification networks by effectively controlling the orthogonality to provide an optimal tradeoff between optimization benefits and representational capacity reduction.
We also show that ONI stabilizes the training of generative adversarial networks (GANs) by maintaining the Lipschitz continuity of a network, similar to spectral normalization.
arXiv Detail & Related papers (2020-04-02T10:14:27Z) - Neural Networks are Convex Regularizers: Exact Polynomial-time Convex
Optimization Formulations for Two-layer Networks [70.15611146583068]
We develop exact representations of training two-layer neural networks with rectified linear units (ReLUs).
Our theory utilizes semi-infinite duality and minimum norm regularization.
arXiv Detail & Related papers (2020-02-24T21:32:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.