Hamiltonian Deep Neural Networks Guaranteeing Non-vanishing Gradients by
Design
- URL: http://arxiv.org/abs/2105.13205v1
- Date: Thu, 27 May 2021 14:52:22 GMT
- Title: Hamiltonian Deep Neural Networks Guaranteeing Non-vanishing Gradients by
Design
- Authors: Clara Lucía Galimberti, Luca Furieri, Liang Xu, Giancarlo Ferrari-Trecate
- Abstract summary: Deep neural networks can be difficult to train due to vanishing and exploding gradients during weight optimization through backpropagation.
We propose a general class of Hamiltonian DNNs (H-DNNs) that stem from the discretization of continuous-time Hamiltonian systems.
Our main result is that a broad set of H-DNNs ensures non-vanishing gradients by design for an arbitrary network depth.
The good performance of H-DNNs is demonstrated on benchmark classification problems, including image classification with the MNIST dataset.
- Score: 2.752441514346229
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Training Deep Neural Networks (DNNs) can be difficult due to vanishing and
exploding gradients during weight optimization through backpropagation. To
address this problem, we propose a general class of Hamiltonian DNNs (H-DNNs)
that stem from the discretization of continuous-time Hamiltonian systems and
include several existing architectures based on ordinary differential
equations. Our main result is that a broad set of H-DNNs ensures non-vanishing
gradients by design for an arbitrary network depth. This is obtained by proving
that, using a semi-implicit Euler discretization scheme, the backward
sensitivity matrices involved in gradient computations are symplectic. We also
provide an upper bound to the magnitude of sensitivity matrices, and show that
exploding gradients can be either controlled through regularization or avoided
for special architectures. Finally, we enable distributed implementations of
backward and forward propagation algorithms in H-DNNs by characterizing
appropriate sparsity constraints on the weight matrices. The good performance
of H-DNNs is demonstrated on benchmark classification problems, including image
classification with the MNIST dataset.
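For intuition, the sketch below shows one way such a layer might look in code: the state is split into two halves (p, q), and each layer applies a single semi-implicit (symplectic) Euler step of a separable Hamiltonian system, updating p from the current q and q from the freshly updated p. This is only a minimal illustration consistent with the abstract; the Verlet-style splitting, the tanh activation, the step size h, and all names below are assumptions, not the paper's exact parametrization. The relevant intuition is that symplectic layer maps have Jacobians with spectral norm at least one, which is what rules out vanishing backward sensitivities.

```python
import numpy as np

def h_dnn_forward(p, q, weights, h=0.05, act=np.tanh):
    """Propagate the split state (p, q) through Hamiltonian-inspired layers.

    Illustrative sketch (not the authors' code): each layer is one
    semi-implicit (symplectic) Euler step of a separable Hamiltonian system.
    The staggered update below keeps each layer map symplectic, the property
    the abstract links to non-vanishing gradients.
    """
    for K1, b1, K2, b2 in weights:
        p = p - h * K1.T @ act(K1 @ q + b1)   # p-step uses the old q
        q = q + h * K2.T @ act(K2 @ p + b2)   # q-step uses the updated p
    return p, q

# Illustrative usage: a 4-dimensional input split into p, q in R^2,
# propagated through 8 layers with random (untrained) weights.
rng = np.random.default_rng(0)
n, depth = 2, 8
weights = [(rng.standard_normal((n, n)), rng.standard_normal(n),
            rng.standard_normal((n, n)), rng.standard_normal(n))
           for _ in range(depth)]
p0, q0 = rng.standard_normal(n), rng.standard_normal(n)
pL, qL = h_dnn_forward(p0, q0, weights)
print(pL, qL)
```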
Related papers
- Matrix Completion via Nonsmooth Regularization of Fully Connected Neural Networks [7.349727826230864]
It has been shown that enhanced performance can be attained by using nonlinear estimators such as deep neural networks.
In this paper, we control over-fitting by regularizing the FCNN model in terms of the norm of its intermediate representations.
Our simulations indicate the superiority of the proposed algorithm in comparison with existing linear and nonlinear algorithms.
arXiv Detail & Related papers (2024-03-15T12:00:37Z)
- Wide Neural Networks as Gaussian Processes: Lessons from Deep Equilibrium Models [16.07760622196666]
We study the deep equilibrium model (DEQ), an infinite-depth neural network with shared weight matrices across layers.
Our analysis reveals that as the width of DEQ layers approaches infinity, it converges to a Gaussian process.
Remarkably, this convergence holds even when the limits of depth and width are interchanged.
arXiv Detail & Related papers (2023-10-16T19:00:43Z)
- Implicit Stochastic Gradient Descent for Training Physics-informed Neural Networks [51.92362217307946]
Physics-informed neural networks (PINNs) have been demonstrated to be effective in solving forward and inverse differential equation problems.
However, PINNs can become trapped in training failures when the target functions to be approximated exhibit high-frequency or multi-scale features.
In this paper, we propose to employ the implicit stochastic gradient descent (ISGD) method to train PINNs and improve the stability of the training process.
arXiv Detail & Related papers (2023-03-03T08:17:47Z)
- Physics-Informed Machine Learning of Dynamical Systems for Efficient Bayesian Inference [0.0]
The No-U-Turn Sampler (NUTS) is a widely adopted method for performing Bayesian inference.
Hamiltonian neural networks (HNNs) are a noteworthy architecture in this setting.
We propose the use of HNNs for performing Bayesian inference efficiently without requiring numerous posterior gradients.
arXiv Detail & Related papers (2022-09-19T21:17:23Z)
- Equivariant Hypergraph Diffusion Neural Operators [81.32770440890303]
Hypergraph neural networks (HNNs), which use neural networks to encode hypergraphs, provide a promising way to model higher-order relations in data.
This work proposes a new HNN architecture named ED-HNN, which provably represents any continuous equivariant hypergraph diffusion operator.
We evaluate ED-HNN for node classification on nine real-world hypergraph datasets.
arXiv Detail & Related papers (2022-07-14T06:17:00Z)
- Deep Architecture Connectivity Matters for Its Convergence: A Fine-Grained Analysis [94.64007376939735]
We theoretically characterize the impact of connectivity patterns on the convergence of deep neural networks (DNNs) under gradient descent training.
We show that by a simple filtration on "unpromising" connectivity patterns, we can trim down the number of models to evaluate.
arXiv Detail & Related papers (2022-05-11T17:43:54Z)
- Free Hyperbolic Neural Networks with Limited Radii [32.42488915688723]
Hyperbolic Neural Networks (HNNs) that operate directly in hyperbolic space have been proposed recently to further exploit the potential of hyperbolic representations.
While HNNs have achieved better performance than Euclidean neural networks (ENNs) on datasets with implicit hierarchical structure, they still perform poorly on standard classification benchmarks such as CIFAR and ImageNet.
In this paper, we first conduct an empirical study showing that the inferior performance of HNNs on standard recognition datasets can be attributed to the notorious vanishing gradient problem.
Our analysis leads to a simple yet effective solution called Feature Clipping, which regularizes the hyperbolic embedding whenever its norm exceeds a given threshold.
arXiv Detail & Related papers (2021-07-23T22:10:16Z)
- A unified framework for Hamiltonian deep neural networks [3.0934684265555052]
Training deep neural networks (DNNs) can be difficult due to vanishing/exploding gradients during weight optimization.
We propose a class of DNNs stemming from the time discretization of Hamiltonian systems.
The proposed Hamiltonian framework, besides encompassing existing networks inspired by marginally stable ODEs, allows one to derive new and more expressive architectures.
arXiv Detail & Related papers (2021-04-27T13:20:24Z)
- Online Limited Memory Neural-Linear Bandits with Likelihood Matching [53.18698496031658]
We study neural-linear bandits for solving problems where both exploration and representation learning play an important role.
We propose a likelihood matching algorithm that is resilient to catastrophic forgetting and is completely online.
arXiv Detail & Related papers (2021-02-07T14:19:07Z)
- Multipole Graph Neural Operator for Parametric Partial Differential Equations [57.90284928158383]
One of the main challenges in using deep learning-based methods for simulating physical systems is formulating physics-based data in a structure suitable for neural networks.
We propose a novel multi-level graph neural network framework that captures interactions at all ranges with only linear complexity.
Experiments confirm our multi-graph network learns discretization-invariant solution operators to PDEs and can be evaluated in linear time.
arXiv Detail & Related papers (2020-06-16T21:56:22Z)
- Binarized Graph Neural Network [65.20589262811677]
We develop a binarized graph neural network to learn the binary representations of the nodes with binary network parameters.
Our proposed method can be seamlessly integrated into the existing GNN-based embedding approaches.
Experiments indicate that the proposed binarized graph neural network, namely BGN, is orders of magnitude more efficient in terms of both time and space.
arXiv Detail & Related papers (2020-04-19T09:43:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.