Reparameterization through Spatial Gradient Scaling
- URL: http://arxiv.org/abs/2303.02733v2
- Date: Tue, 7 Mar 2023 02:07:01 GMT
- Title: Reparameterization through Spatial Gradient Scaling
- Authors: Alexander Detkov, Mohammad Salameh, Muhammad Fetrat Qharabagh, Jialin Zhang, Wei Lui, Shangling Jui, Di Niu
- Abstract summary: Reparameterization aims to improve the generalization of deep neural networks by transforming convolutional layers into equivalent multi-branched structures during training.
We present a novel spatial gradient scaling method to redistribute learning focus among weights in convolutional networks.
- Score: 69.27487006953852
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reparameterization aims to improve the generalization of deep neural networks
by transforming convolutional layers into equivalent multi-branched structures
during training. However, there exists a gap in understanding how
reparameterization may change and benefit the learning process of neural
networks. In this paper, we present a novel spatial gradient scaling method to
redistribute learning focus among weights in convolutional networks. We prove
that spatial gradient scaling achieves the same learning dynamics as a branched
reparameterization yet without introducing structural changes into the network.
We further propose an analytical approach that dynamically learns scalings for
each convolutional layer based on the spatial characteristics of its input
feature map gauged by mutual information. Experiments on CIFAR-10, CIFAR-100,
and ImageNet show that without searching for reparameterized structures, our
proposed scaling method outperforms the state-of-the-art reparameterization
strategies at a lower computational cost.
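To make the core idea concrete, below is a minimal sketch of spatial gradient scaling applied to a single convolutional layer in PyTorch. The helper name and the values in the scaling matrix `S` are illustrative assumptions; the paper derives the per-layer scalings analytically from the mutual information between spatial positions of the layer's input feature map, which is not reproduced here.

```python
# Minimal sketch (assumed, not the authors' implementation): rescale the weight
# gradient of a conv layer position-wise, leaving the forward pass untouched.
import torch
import torch.nn as nn

def attach_spatial_gradient_scaling(conv: nn.Conv2d, S: torch.Tensor) -> None:
    """Multiply each spatial position of conv's weight gradient by S (k x k)."""
    assert S.shape == conv.weight.shape[-2:], "S must match the kernel's spatial size"
    # The hook fires during backward and broadcasts S over the output- and
    # input-channel dimensions of the kernel gradient.
    conv.weight.register_hook(lambda grad: grad * S.to(grad.device))

# Hypothetical scaling that emphasizes the kernel centre over its border.
conv = nn.Conv2d(16, 32, kernel_size=3, padding=1)
S = torch.tensor([[0.50, 0.75, 0.50],
                  [0.75, 1.00, 0.75],
                  [0.50, 0.75, 0.50]])
attach_spatial_gradient_scaling(conv, S)

x = torch.randn(8, 16, 32, 32)
conv(x).sum().backward()        # the stored gradient is now spatially rescaled
print(conv.weight.grad[0, 0])   # shows the 3x3 scaling pattern
```

Because only gradients are rescaled, the network's structure and forward computation are unchanged, consistent with the abstract's claim that the method matches the learning dynamics of a branched reparameterization without introducing structural changes.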
Related papers
- Hallmarks of Optimization Trajectories in Neural Networks: Directional Exploration and Redundancy [75.15685966213832]
We analyze the rich directional structure of optimization trajectories represented by their pointwise parameters.
We show that training only the scalar batchnorm parameters from partway into training matches the performance of training the entire network.
arXiv Detail & Related papers (2024-03-12T07:32:47Z)
- Accelerated Training via Incrementally Growing Neural Networks using Variance Transfer and Learning Rate Adaptation [34.7523496790944]
We develop an approach to efficiently grow neural networks, within which parameterization and optimization strategies are designed by considering the training dynamics.
We show that our method achieves comparable or better accuracy than training large fixed-size models, while saving a substantial portion of the original budget for training.
arXiv Detail & Related papers (2023-06-22T07:06:45Z)
- Scaling Forward Gradient With Local Losses [117.22685584919756]
Forward learning is a biologically plausible alternative to backprop for learning deep neural networks.
We show that it is possible to substantially reduce the variance of the forward gradient by applying perturbations to activations rather than weights.
Our approach matches backprop on MNIST and CIFAR-10 and significantly outperforms previously proposed backprop-free algorithms on ImageNet (a minimal sketch of the activation-perturbation estimator appears after this list).
arXiv Detail & Related papers (2022-10-07T03:52:27Z)
- Learning to Learn with Generative Models of Neural Network Checkpoints [71.06722933442956]
We construct a dataset of neural network checkpoints and train a generative model on the parameters.
We find that our approach successfully generates parameters for a wide range of loss prompts.
We apply our method to different neural network architectures and tasks in supervised and reinforcement learning.
arXiv Detail & Related papers (2022-09-26T17:59:58Z)
- Quiver neural networks [5.076419064097734]
We develop a uniform theoretical approach towards the analysis of various neural network connectivity architectures.
Inspired by quiver representation theory in mathematics, this approach gives a compact way to capture elaborate data flows.
arXiv Detail & Related papers (2022-07-26T09:42:45Z)
- SIRe-Networks: Skip Connections over Interlaced Multi-Task Learning and Residual Connections for Structure Preserving Object Classification [28.02302915971059]
In this paper, we introduce an interlaced multi-task learning strategy, named SIRe, to reduce the vanishing gradient problem in the object classification task.
The presented methodology directly improves a convolutional neural network (CNN) by enforcing preservation of the input image structure through auto-encoders.
To validate the presented methodology, a simple CNN and various implementations of famous networks are extended via the SIRe strategy and extensively tested on the CIFAR100 dataset.
arXiv Detail & Related papers (2021-10-06T13:54:49Z)
- The Impact of Reinitialization on Generalization in Convolutional Neural Networks [3.462210753108297]
We study the impact of different reinitialization methods in several convolutional architectures across 12 benchmark image classification datasets.
We introduce a new layerwise reinitialization algorithm that outperforms previous methods.
Our takeaway message is that the accuracy of convolutional neural networks can be improved for small datasets using bottom-up layerwise reinitialization.
arXiv Detail & Related papers (2021-09-01T09:25:57Z)
- PredRNN: A Recurrent Neural Network for Spatiotemporal Predictive Learning [109.84770951839289]
We present PredRNN, a new recurrent network for learning visual dynamics from historical context.
We show that our approach obtains highly competitive results on three standard datasets.
arXiv Detail & Related papers (2021-03-17T08:28:30Z)
- Self-Reorganizing and Rejuvenating CNNs for Increasing Model Capacity Utilization [8.661269034961679]
We propose a biologically inspired method for improving the computational resource utilization of neural networks.
The proposed method utilizes the channel activations of a convolution layer in order to reorganize that layer's parameters.
The rejuvenated parameters learn different features to supplement those learned by the reorganized surviving parameters.
arXiv Detail & Related papers (2021-02-13T06:19:45Z)
- Modeling from Features: a Mean-field Framework for Over-parameterized Deep Neural Networks [54.27962244835622]
This paper proposes a new mean-field framework for over-parameterized deep neural networks (DNNs).
In this framework, a DNN is represented by probability measures and functions over its features in the continuous limit.
We illustrate the framework via the standard DNN and the Residual Network (Res-Net) architectures.
arXiv Detail & Related papers (2020-07-03T01:37:16Z)
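As noted in the Scaling Forward Gradient With Local Losses entry above, the following is a minimal sketch of forward-gradient estimation with perturbations applied to activations rather than weights. It is an assumed simplification for a single linear layer with a local squared-error loss, not the paper's implementation.

```python
# Minimal sketch (assumed): activity-perturbed forward gradient for one layer.
import torch
from torch.func import jvp

W = torch.randn(10, 5)              # weights of a single linear layer
x = torch.randn(32, 5)              # a batch of inputs
target = torch.randn(32, 10)        # targets for an illustrative local loss

def local_loss(z):
    return ((z - target) ** 2).mean()

z = x @ W.T                         # the layer's activation
u = torch.randn_like(z)             # random perturbation in activation space
# Forward-mode directional derivative of the loss along u (no backprop).
_, dLdu = jvp(local_loss, (z,), (u,))
g_z = dLdu * u                      # unbiased estimate of dL/dz
g_W = g_z.T @ x                     # exact local chain rule from z back to W
```

Perturbing the (lower-dimensional) activations instead of the weights is what that paper credits with the reduced variance of the forward-gradient estimator.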
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.