A Recursively Recurrent Neural Network (R2N2) Architecture for Learning
Iterative Algorithms
- URL: http://arxiv.org/abs/2211.12386v2
- Date: Thu, 6 Jul 2023 08:00:49 GMT
- Title: A Recursively Recurrent Neural Network (R2N2) Architecture for Learning
Iterative Algorithms
- Authors: Danimir T. Doncevic, Alexander Mitsos, Yue Guo, Qianxiao Li, Felix
Dietrich, Manuel Dahmen, Ioannis G. Kevrekidis
- Abstract summary: We generalize the Runge-Kutta neural network to a recursively recurrent neural network (R2N2) superstructure for the design of customized iterative algorithms.
We demonstrate that regular training of the weight parameters inside the proposed superstructure on input/output data of various computational problem classes yields iterations similar to those of Krylov solvers for linear equation systems, Newton-Krylov solvers for nonlinear equation systems, and Runge-Kutta integrators for ordinary differential equations.
- Score: 64.3064050603721
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Meta-learning of numerical algorithms for a given task consists of the
data-driven identification and adaptation of an algorithmic structure and the
associated hyperparameters. To limit the complexity of the meta-learning
problem, neural architectures with a certain inductive bias towards favorable
algorithmic structures can, and should, be used. We generalize our previously
introduced Runge-Kutta neural network to a recursively recurrent neural network
(R2N2) superstructure for the design of customized iterative algorithms. In
contrast to off-the-shelf deep learning approaches, it features a distinct
division into modules for generation of information and for the subsequent
assembly of this information towards a solution. Local information in the form
of a subspace is generated by subordinate, inner, iterations of recurrent
function evaluations starting at the current outer iterate. The update to the
next outer iterate is computed as a linear combination of these evaluations,
reducing the residual in this space, and constitutes the output of the network.
We demonstrate that regular training of the weight parameters inside the
proposed superstructure on input/output data of various computational problem
classes yields iterations similar to Krylov solvers for linear equation
systems, Newton-Krylov solvers for nonlinear equation systems, and Runge-Kutta
integrators for ordinary differential equations. Due to its modularity, the
superstructure can be readily extended with functionalities needed to represent
more general classes of iterative algorithms traditionally based on Taylor
series expansions.
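The outer/inner structure described in the abstract can be illustrated with a short sketch. The PyTorch code below is a minimal illustration of the idea, not the authors' implementation: an inner loop of recurrent evaluations of a residual function generates a local subspace, and the outer update is a trainable linear combination of those evaluations. The class and parameter names (R2N2Sketch, n_inner, n_outer, alpha, beta) and the sharing of weights across outer iterations are assumptions made for brevity.

```python
# Minimal sketch of an R2N2-style outer/inner iteration (illustrative only,
# not the authors' implementation).
import torch
import torch.nn as nn


class R2N2Sketch(nn.Module):
    """Outer update = trainable linear combination of inner recurrent
    evaluations of a residual function f, started at the current iterate."""

    def __init__(self, n_inner: int, n_outer: int):
        super().__init__()
        self.n_outer = n_outer
        # Trainable combination weights, one per inner evaluation; for
        # simplicity they are shared across outer iterations in this sketch.
        self.alpha = nn.Parameter(0.1 * torch.randn(n_inner))
        self.beta = nn.Parameter(torch.tensor(0.1))  # inner step scaling

    def forward(self, f, x0: torch.Tensor) -> torch.Tensor:
        x = x0
        for _ in range(self.n_outer):
            # Inner iterations: recurrent function evaluations starting at the
            # current outer iterate generate a local subspace V.
            v = f(x)
            basis = [v]
            for _ in range(len(self.alpha) - 1):
                v = f(x + self.beta * v)
                basis.append(v)
            V = torch.stack(basis, dim=-1)   # columns span the local subspace
            # Outer update: linear combination of the inner evaluations.
            x = x + V @ self.alpha
        return x


# Hypothetical usage: train alpha and beta so the iteration solves linear
# systems, i.e. drives the residual f(x) = b - A @ x towards zero.
```

In a Krylov or Newton-Krylov method the combination coefficients would be recomputed at every step to minimize the residual in the generated subspace; here they are ordinary network weights, trained on input/output data so that the resulting iteration imitates such solvers.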
Related papers
- NeuralEF: Deconstructing Kernels by Deep Neural Networks [47.54733625351363]
Traditional nonparametric solutions based on the Nyström formula suffer from scalability issues.
Recent work has resorted to a parametric approach, i.e., training neural networks to approximate the eigenfunctions.
We show that these problems can be fixed by using a new series of objective functions that generalizes to both supervised and unsupervised learning problems.
arXiv Detail & Related papers (2022-04-30T05:31:07Z)
- Fractal Structure and Generalization Properties of Stochastic Optimization Algorithms [71.62575565990502]
We prove that the generalization error of an optimization algorithm can be bounded by the 'complexity' of the fractal structure that underlies its invariant measure.
We further specialize our results to specific problems (e.g., linear/logistic regression, one-hidden-layer neural networks) and algorithms.
arXiv Detail & Related papers (2021-06-09T08:05:36Z)
- Connecting Weighted Automata, Tensor Networks and Recurrent Neural Networks through Spectral Learning [58.14930566993063]
We present connections between three models used in different research fields: weighted finite automata (WFA) from formal languages and linguistics, recurrent neural networks used in machine learning, and tensor networks.
We introduce the first provable learning algorithm for linear 2-RNNs defined over sequences of continuous input vectors.
arXiv Detail & Related papers (2020-10-19T15:28:00Z)
- Provably Efficient Neural Estimation of Structural Equation Model: An Adversarial Approach [144.21892195917758]
We study estimation in a class of generalized structural equation models (SEMs).
We formulate the linear operator equation as a min-max game, where both players are parameterized by neural networks (NNs), and learn the parameters of these neural networks using gradient descent.
For the first time, we provide a tractable estimation procedure for SEMs based on NNs with provable convergence and without the need for sample splitting.
arXiv Detail & Related papers (2020-07-02T17:55:47Z)
- Multipole Graph Neural Operator for Parametric Partial Differential Equations [57.90284928158383]
One of the main challenges in using deep learning-based methods for simulating physical systems is formulating physics-based data.
We propose a novel multi-level graph neural network framework that captures interaction at all ranges with only linear complexity.
Experiments confirm our multi-graph network learns discretization-invariant solution operators to PDEs and can be evaluated in linear time.
arXiv Detail & Related papers (2020-06-16T21:56:22Z)
- Iterative Algorithm Induced Deep-Unfolding Neural Networks: Precoding Design for Multiuser MIMO Systems [59.804810122136345]
We propose a framework for deep-unfolding, where a general form of iterative algorithm induced deep-unfolding neural network (IAIDNN) is developed.
An efficient IAIDNN based on the structure of the classic weighted minimum mean-square error (WMMSE) iterative algorithm is developed.
We show that the proposed IAIDNN efficiently achieves the performance of the iterative WMMSE algorithm with reduced computational complexity; a generic deep-unfolding sketch is given after this list.
arXiv Detail & Related papers (2020-06-15T02:57:57Z)
- A Lagrangian Approach to Information Propagation in Graph Neural Networks [21.077268852378385]
In this paper, we propose a novel approach to the state computation and the learning algorithm for Graph Neural Network (GNN) models.
The state convergence procedure is implicitly expressed by the constraint satisfaction mechanism and does not require a separate iterative phase for each epoch of the learning procedure.
In fact, the computational structure is based on the search for saddle points of the Lagrangian in the adjoint space composed of weights, neural outputs (node states) and Lagrange multipliers.
arXiv Detail & Related papers (2020-02-18T16:13:24Z)
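As referenced in the IAIDNN entry above, deep unfolding maps a fixed number of iterations of a classical algorithm onto network layers with trainable parameters. The sketch below is a generic illustration of that principle, not the IAIDNN/WMMSE structure of that paper: a few gradient-descent iterations on a least-squares problem are unrolled, with one learnable step size per layer. The class and parameter names are assumptions.

```python
# Generic deep-unfolding sketch (illustrative; not the IAIDNN/WMMSE network):
# unroll K gradient-descent iterations for min_x ||A x - b||^2 into K layers,
# each with its own trainable step size.
import torch
import torch.nn as nn


class UnrolledGradientDescent(nn.Module):
    def __init__(self, n_layers: int):
        super().__init__()
        # One learnable step size per unrolled iteration ("layer").
        self.steps = nn.Parameter(torch.full((n_layers,), 0.1))

    def forward(self, A: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        x = torch.zeros(A.shape[-1], dtype=A.dtype)
        for step in self.steps:
            grad = A.T @ (A @ x - b)   # gradient of the least-squares loss
            x = x - step * grad        # one unrolled iteration = one layer
        return x
```

Training the per-layer step sizes end-to-end on sample problems tunes the unrolled iteration to the problem class; the IAIDNN applies the same principle with the WMMSE precoding iteration in place of gradient descent.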