Momentum Residual Neural Networks
- URL: http://arxiv.org/abs/2102.07870v1
- Date: Mon, 15 Feb 2021 22:24:52 GMT
- Title: Momentum Residual Neural Networks
- Authors: Michael E. Sander, Pierre Ablin, Mathieu Blondel, Gabriel Peyré
- Abstract summary: We propose to change the forward rule of a ResNet by adding a momentum term.
MomentumNets can be used as a drop-in replacement for any existing ResNet block.
We show that MomentumNets have the same accuracy as ResNets, while having a much smaller memory footprint.
- Score: 22.32840998053339
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The training of deep residual neural networks (ResNets) with backpropagation
has a memory cost that increases linearly with respect to the depth of the
network. A simple way to circumvent this issue is to use reversible
architectures. In this paper, we propose to change the forward rule of a ResNet
by adding a momentum term. The resulting networks, momentum residual neural
networks (MomentumNets), are invertible. Unlike previous invertible
architectures, they can be used as a drop-in replacement for any existing
ResNet block. We show that MomentumNets can be interpreted in the infinitesimal
step size regime as second-order ordinary differential equations (ODEs) and
exactly characterize how adding momentum progressively increases the
representation capabilities of MomentumNets. Our analysis reveals that
MomentumNets can learn any linear mapping up to a multiplicative factor, while
ResNets cannot. In a learning to optimize setting, where convergence to a fixed
point is required, we show theoretically and empirically that our method
succeeds while existing invertible architectures fail. We show on CIFAR and
ImageNet that MomentumNets have the same accuracy as ResNets, while having a
much smaller memory footprint, and show that pre-trained MomentumNets are
promising for fine-tuning models.
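Below is a minimal NumPy sketch, not the authors' code, of the momentum forward rule described in the abstract and of its exact inverse, which is the property that lets MomentumNets reconstruct activations during backpropagation instead of storing them. The update v <- gamma*v + (1 - gamma)*f(x), x <- x + v follows the paper's description of adding a momentum term to the ResNet forward rule; the function names, the toy tanh residual, and the value gamma=0.9 are illustrative assumptions.

```python
import numpy as np

def momentum_forward(x, v, residual_fn, gamma=0.9):
    """One MomentumNet step: v <- gamma * v + (1 - gamma) * f(x); x <- x + v.

    `residual_fn` stands in for the learned residual block f; `gamma` is the
    momentum term added to the ResNet forward rule.
    """
    v = gamma * v + (1.0 - gamma) * residual_fn(x)
    x = x + v
    return x, v

def momentum_inverse(x, v, residual_fn, gamma=0.9):
    """Exact inversion of one step: recover (x_prev, v_prev) from (x, v),
    so intermediate activations need not be kept in memory.
    """
    x_prev = x - v                                              # undo x <- x + v
    v_prev = (v - (1.0 - gamma) * residual_fn(x_prev)) / gamma  # undo the momentum update
    return x_prev, v_prev

# Round-trip check with a toy residual function.
if __name__ == "__main__":
    f = lambda z: np.tanh(z)          # toy stand-in for a residual block
    x0 = np.random.randn(4)
    v0 = np.zeros(4)
    x1, v1 = momentum_forward(x0, v0, f)
    x0_rec, v0_rec = momentum_inverse(x1, v1, f)
    assert np.allclose(x0, x0_rec) and np.allclose(v0, v0_rec)
```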
Related papers
- Dr$^2$Net: Dynamic Reversible Dual-Residual Networks for Memory-Efficient Finetuning [81.0108753452546]
We propose Dynamic Reversible Dual-Residual Networks, or Dr$2$Net, to finetune a pretrained model with substantially reduced memory consumption.
Dr$2$Net contains two types of residual connections, one maintaining the residual structure in the pretrained models, and the other making the network reversible.
We show that Dr$2$Net can reach comparable performance to conventional finetuning but with significantly less memory usage.
arXiv Detail & Related papers (2024-01-08T18:59:31Z) - Deep Learning without Shortcuts: Shaping the Kernel with Tailored Rectifiers [83.74380713308605]
We develop a new type of transformation that is fully compatible with a variant of ReLUs -- Leaky ReLUs.
We show in experiments that our method, which introduces negligible extra computational cost, achieves validation accuracies with deep vanilla networks that are competitive with ResNets.
arXiv Detail & Related papers (2022-03-15T17:49:08Z) - Singular Value Perturbation and Deep Network Optimization [29.204852309828006]
We develop new theoretical results on matrix perturbation to shed light on the impact of architecture on the performance of a deep network.
In particular, we explain what deep learning practitioners have long observed empirically: the parameters of some deep architectures are easier to optimize than others.
A direct application of our perturbation results explains analytically why a ResNet is easier to optimize than a ConvNet.
arXiv Detail & Related papers (2022-03-07T02:09:39Z) - Momentum Capsule Networks [0.8594140167290097]
We propose a new network architecture, called Momentum Capsule Network (MoCapsNet)
MoCapsNet is inspired by Momentum ResNets, a type of network whose residual building blocks are made reversible through a momentum term.
We show that MoCapsNet beats the accuracy of baseline capsule networks on MNIST, SVHN and CIFAR-10 while using considerably less memory.
arXiv Detail & Related papers (2022-01-26T17:53:18Z) - Hidden-Fold Networks: Random Recurrent Residuals Using Sparse Supermasks [1.0814638303152528]
Deep neural networks (DNNs) are so over-parametrized that recent research has found them to contain a subnetwork with high accuracy.
This paper proposes blending these lines of research into a highly compressed yet accurate model: Hidden-Fold Networks (HFNs)
It achieves equivalent performance to ResNet50 on CIFAR100 while occupying 38.5x less memory, and similar performance to ResNet34 on ImageNet with a memory size 26.8x smaller.
arXiv Detail & Related papers (2021-11-24T08:24:31Z) - Edge Rewiring Goes Neural: Boosting Network Resilience via Policy Gradient [62.660451283548724]
ResiNet is a reinforcement learning framework to discover resilient network topologies against various disasters and attacks.
We show that ResiNet achieves near-optimal resilience gains on multiple graphs while balancing utility, outperforming existing approaches by a large margin.
arXiv Detail & Related papers (2021-10-18T06:14:28Z) - m-RevNet: Deep Reversible Neural Networks with Momentum [25.609808975649624]
We propose a reversible neural network, termed m-RevNet, characterized by inserting a momentum update into residual blocks.
For certain learning scenarios, we analytically and empirically reveal that our m-RevNet succeeds while standard ResNet fails.
arXiv Detail & Related papers (2021-08-12T17:14:32Z) - Binary Graph Neural Networks [69.51765073772226]
Graph Neural Networks (GNNs) have emerged as a powerful and flexible framework for representation learning on irregular data.
In this paper, we present and evaluate different strategies for the binarization of graph neural networks.
We show that through careful design of the models, and control of the training process, binary graph neural networks can be trained at only a moderate cost in accuracy on challenging benchmarks.
arXiv Detail & Related papers (2020-12-31T18:48:58Z) - Dynamic Graph: Learning Instance-aware Connectivity for Neural Networks [78.65792427542672]
Dynamic Graph Network (DG-Net) is a complete directed acyclic graph, where the nodes represent convolutional blocks and the edges represent connection paths.
Instead of using a fixed path through the network, DG-Net aggregates features dynamically at each node, giving the network greater representational capacity.
arXiv Detail & Related papers (2020-10-02T16:50:26Z) - Kernel-Based Smoothness Analysis of Residual Networks [85.20737467304994]
Residual networks (ResNets) stand out among powerful modern architectures.
In this paper, we show another distinction between ResNets and non-residual networks, namely a tendency of ResNets to promote smoother interpolations.
arXiv Detail & Related papers (2020-09-21T16:32:04Z) - Do ideas have shape? Idea registration as the continuous limit of artificial neural networks [0.609170287691728]
We show that ResNets converge, in the infinite depth limit, to a generalization of image registration variational algorithms.
We present the first rigorous proof of convergence of ResNets with trained weights and biases towards a Hamiltonian dynamics driven flow.
arXiv Detail & Related papers (2020-08-10T06:46:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.