Deep Isometric Learning for Visual Recognition
- URL: http://arxiv.org/abs/2006.16992v2
- Date: Sat, 15 Aug 2020 04:39:34 GMT
- Title: Deep Isometric Learning for Visual Recognition
- Authors: Haozhi Qi, Chong You, Xiaolong Wang, Yi Ma, Jitendra Malik
- Abstract summary: We show that deep vanilla ConvNets can be trained to achieve surprisingly good performance on standard image recognition benchmarks.
Our code is available at https://github.com/HaozhiQi/ISONet.
- Score: 67.94199891354157
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Initialization, normalization, and skip connections are believed to be three
indispensable techniques for training very deep convolutional neural networks
and obtaining state-of-the-art performance. This paper shows that deep vanilla
ConvNets without normalization or skip connections can also be trained to
achieve surprisingly good performance on standard image recognition benchmarks.
This is achieved by enforcing the convolution kernels to be near isometric
during initialization and training, as well as by using a variant of ReLU that
is shifted towards being isometric. Further experiments show that if combined
with skip connections, such near isometric networks can achieve performances on
par with (for ImageNet) and better than (for COCO) the standard ResNet, even
without normalization at all. Our code is available at
https://github.com/HaozhiQi/ISONet.
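As a rough illustration of the two ingredients described in the abstract, the sketch below (not the official ISONet code; see the repository linked above for that) adds an orthogonality-style penalty that pushes a convolution kernel toward an isometry and uses a ReLU shifted by a learnable offset. The penalty formulation, the initial shift value, and the regularization weight here are illustrative assumptions and may differ from the paper's exact choices.

```python
# Minimal sketch (not the official ISONet code) of the two ingredients the
# abstract describes: (1) a penalty pushing each convolution kernel toward an
# isometry, and (2) a ReLU variant shifted by a learnable offset.
import torch
import torch.nn as nn
import torch.nn.functional as F


def isometry_penalty(weight: torch.Tensor) -> torch.Tensor:
    """Penalize deviation of a conv kernel (C_out, C_in, k, k) from an isometry.

    Convolving the kernel with itself should give a 'delta' response:
    identity across output channels at the center, zero elsewhere.
    """
    c_out, _, k, _ = weight.shape
    corr = F.conv2d(weight, weight, padding=k - 1)   # (C_out, C_out, 2k-1, 2k-1)
    target = torch.zeros_like(corr)
    target[torch.arange(c_out), torch.arange(c_out), k - 1, k - 1] = 1.0
    return ((corr - target) ** 2).sum()


class ShiftedReLU(nn.Module):
    """ReLU shifted by a learnable offset so it acts closer to an isometry."""

    def __init__(self, init_shift: float = -1.0):   # initial value is a placeholder
        super().__init__()
        self.shift = nn.Parameter(torch.tensor(init_shift))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.maximum(x, self.shift)


# Usage: near-identity (Dirac) init at the start, and add the penalty to the loss.
conv = nn.Conv2d(64, 64, kernel_size=3, padding=1, bias=False)
nn.init.dirac_(conv.weight)                          # isometric at initialization
x = torch.randn(2, 64, 32, 32)
y = ShiftedReLU()(conv(x))
loss = y.square().mean() + 1e-4 * isometry_penalty(conv.weight)  # 1e-4 is illustrative
loss.backward()
```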
Related papers
- Finding Lottery Tickets in Vision Models via Data-driven Spectral Foresight Pruning [14.792099973449794]
We propose an algorithm to align the training dynamics of the sparse network with that of the dense one.
We show how the usually neglected data-dependent component in the NTK's spectrum can be taken into account.
Path eXclusion (PX) is able to find lottery tickets even at high sparsity levels.
arXiv Detail & Related papers (2024-06-03T22:19:42Z)
- Training Your Sparse Neural Network Better with Any Mask [106.134361318518]
Pruning large neural networks to create high-quality, independently trainable sparse masks is desirable.
In this paper we demonstrate an alternative opportunity: one can customize the sparse training techniques to deviate from the default dense network training protocols.
Our new sparse training recipe is generally applicable to improving training from scratch with various sparse masks.
arXiv Detail & Related papers (2022-06-26T00:37:33Z)
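The entry above describes training a network from scratch under a fixed sparse mask. The sketch below is a generic illustration of that setup, not the paper's specific recipe: pruned weights are zeroed after every optimizer step so they stay inactive throughout training, and the mask itself is a random stand-in for whatever mask is used.

```python
# Generic illustration of training from scratch under a fixed binary sparsity
# mask (not the paper's recipe): pruned weights are re-zeroed after each step.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))

# Example mask: keep a random 20% of each weight tensor (stand-in for any mask).
masks = {name: (torch.rand_like(p) < 0.2).float()
         for name, p in model.named_parameters() if p.dim() > 1}

optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

def apply_masks():
    with torch.no_grad():
        for name, p in model.named_parameters():
            if name in masks:
                p.mul_(masks[name])

apply_masks()  # start from a sparse initialization
for step in range(100):
    x, y = torch.randn(32, 784), torch.randint(0, 10, (32,))
    loss = nn.functional.cross_entropy(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    apply_masks()  # keep pruned weights at zero
```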
- Deep Learning without Shortcuts: Shaping the Kernel with Tailored Rectifiers [83.74380713308605]
We develop a new type of transformation that is fully compatible with a variant of ReLU: the Leaky ReLU.
We show in experiments that our method, which introduces negligible extra computational cost, achieves validation accuracies with deep vanilla networks that are competitive with ResNets.
arXiv Detail & Related papers (2022-03-15T17:49:08Z)
- ZerO Initialization: Initializing Residual Networks with only Zeros and Ones [44.66636787050788]
Deep neural networks are usually initialized with random weights, with an adequately selected initial variance to ensure stable signal propagation during training.
There is no consensus on how to select the variance, and this becomes challenging as the number of layers grows.
In this work, we replace the widely used random weight initialization with a fully deterministic initialization scheme ZerO, which initializes residual networks with only zeros and ones.
Surprisingly, we find that ZerO achieves state-of-the-art performance on various image classification datasets, including ImageNet.
arXiv Detail & Related papers (2021-10-25T06:17:33Z)
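The ZerO entry above describes a deterministic initialization using only zeros and ones. Below is a hedged sketch of that idea for a residual MLP block: square weights start as the identity and the last layer of each branch starts at zero, so the block is the identity map at initialization. The actual ZerO scheme (notably its treatment of dimension-changing layers) differs in detail; see the paper.

```python
# Hedged sketch of a zeros-and-ones initialization for a residual block:
# identity for square weights, zero for the last layer of the residual branch.
import torch
import torch.nn as nn

class ResidualMLPBlock(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.fc1 = nn.Linear(dim, dim, bias=False)
        self.fc2 = nn.Linear(dim, dim, bias=False)
        with torch.no_grad():
            self.fc1.weight.copy_(torch.eye(dim))  # ones on the diagonal
            self.fc2.weight.zero_()                # branch output starts at zero

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.fc2(torch.relu(self.fc1(x)))

block = ResidualMLPBlock(128)
x = torch.randn(4, 128)
assert torch.allclose(block(x), x)  # identity map at initialization
```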
- Rapid training of deep neural networks without skip connections or normalization layers using Deep Kernel Shaping [46.083745557823164]
We identify the main pathologies present in deep networks that prevent them from training fast and generalizing to unseen data.
We show how these can be avoided by carefully controlling the "shape" of the network's kernel function.
arXiv Detail & Related papers (2021-10-05T00:49:36Z)
- Bag of Tricks for Training Deeper Graph Neural Networks: A Comprehensive Benchmark Study [100.27567794045045]
Training deep graph neural networks (GNNs) is notoriously hard.
We present the first fair and reproducible benchmark dedicated to assessing the "tricks" of training deep GNNs.
arXiv Detail & Related papers (2021-08-24T05:00:37Z)
- Keep the Gradients Flowing: Using Gradient Flow to Study Sparse Network Optimization [16.85167651136133]
We take a broader view of training sparse networks and consider the role of regularization, optimization and architecture choices on sparse models.
We show that gradient flow in sparse networks can be improved by reconsidering aspects of the architecture design and the training regime.
arXiv Detail & Related papers (2021-02-02T18:40:26Z)
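As a simple diagnostic related to the gradient-flow entry above (not the paper's method), one can track the gradient norm restricted to the unpruned weights of a sparse network; consistently tiny values suggest the architecture or training regime is impeding signal propagation.

```python
# Generic diagnostic for "gradient flow" in a sparse network: the gradient norm
# measured only over the unpruned weights (masks here are random stand-ins).
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
masks = {n: (torch.rand_like(p) < 0.1).float()
         for n, p in model.named_parameters() if p.dim() > 1}

x, y = torch.randn(32, 784), torch.randint(0, 10, (32,))
nn.functional.cross_entropy(model(x), y).backward()

sparse_grad_sq = sum(((p.grad * masks[n]) ** 2).sum()
                     for n, p in model.named_parameters() if n in masks)
print(f"gradient norm over unpruned weights: {sparse_grad_sq.sqrt().item():.4f}")
```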
- Go Wide, Then Narrow: Efficient Training of Deep Thin Networks [62.26044348366186]
We propose an efficient method to train a deep thin network with a theoretical guarantee.
By training with our method, ResNet50 can outperform ResNet101, and BERT Base can be comparable with BERT Large.
arXiv Detail & Related papers (2020-07-01T23:34:35Z)
- Robust Pruning at Initialization [61.30574156442608]
There is a growing need for smaller, energy-efficient neural networks so that machine learning applications can run on devices with limited computational resources.
For deep NNs, existing pruning procedures remain unsatisfactory: the resulting pruned networks can be difficult to train and, for instance, nothing prevents one layer from being fully pruned.
arXiv Detail & Related papers (2020-02-19T17:09:50Z)
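The last entry notes that naive pruning at initialization does not prevent a layer from being fully pruned. The sketch below illustrates that failure mode with a global magnitude criterion (used purely as an example) and a per-layer check that flags collapsed layers.

```python
# Illustration of the pitfall mentioned above: a naive global pruning criterion
# at initialization (here, weight magnitude) can remove every weight in some
# layer. The per-layer check below flags such "fully pruned" layers.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(),
                      nn.Linear(256, 64), nn.ReLU(),
                      nn.Linear(64, 10))

weights = {n: p for n, p in model.named_parameters() if p.dim() > 1}
scores = torch.cat([p.abs().flatten() for p in weights.values()])
threshold = torch.quantile(scores, 0.99)   # keep only the top 1% globally

for name, p in weights.items():
    kept = (p.abs() > threshold).float().mean().item()
    status = "FULLY PRUNED" if kept == 0.0 else f"{kept:.2%} kept"
    print(f"{name}: {status}")
```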
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.