Differentially private training of residual networks with scale normalisation
- URL: http://arxiv.org/abs/2203.00324v1
- Date: Tue, 1 Mar 2022 09:56:55 GMT
- Authors: Helena Klause, Alexander Ziller, Daniel Rueckert, Kerstin Hammernik, Georgios Kaissis
- Abstract summary: We investigate the optimal choice of replacement layer for Batch Normalisation (BN) in residual networks (ResNets). We study the phenomenon of scale mixing in residual blocks, whereby the activations on the two branches are scaled differently.
- Score: 64.60453677988517
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We investigate the optimal choice of replacement layer for Batch
Normalisation (BN) in residual networks (ResNets) for training with
Differentially Private Stochastic Gradient Descent (DP-SGD) and study the
phenomenon of scale mixing in residual blocks, whereby the activations on the
two branches are scaled differently. Our experimental evaluation indicates that
a hyperparameter search over 1-64 Group Normalisation (GN) groups improves the
accuracy of ResNet-9 and ResNet-50 considerably in both benchmark (CIFAR-10)
and large-image (ImageNette) tasks. Moreover, Scale Normalisation, a simple
modification to the model architecture in which an additional normalisation
layer is introduced after the residual block's addition operation, further
improves the utility of ResNets, allowing us to achieve state-of-the-art results
on CIFAR-10.
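The two architectural ingredients named in the abstract can be illustrated with a minimal NumPy sketch. Group Normalisation computes its mean and variance per sample over groups of channels, so, unlike Batch Normalisation, no statistics are shared across the batch. This is the usual reason BN is replaced when DP-SGD needs per-example gradients. With a single group, GN normalises over all channels of a sample; with one group per channel, it reduces to Instance Normalisation. The group count is the hyperparameter searched over above. The `residual_block` helper, the use of `np.tanh` as a stand-in for the convolutional branch, the omission of learnable affine parameters, and the assumption that the extra Scale Normalisation layer is itself a GN layer are all illustrative choices, not the authors' implementation.

```python
import numpy as np


def group_norm(x, num_groups, eps=1e-5):
    """Group Normalisation of an (N, C, H, W) array, without learnable affine parameters.

    Mean and variance are computed separately for every sample and every
    group of C // num_groups channels, so no statistics cross the batch axis.
    """
    n, c, h, w = x.shape
    if c % num_groups != 0:
        raise ValueError("number of channels must be divisible by num_groups")
    grouped = x.reshape(n, num_groups, c // num_groups, h, w)
    mean = grouped.mean(axis=(2, 3, 4), keepdims=True)
    var = grouped.var(axis=(2, 3, 4), keepdims=True)
    normed = (grouped - mean) / np.sqrt(var + eps)
    return normed.reshape(n, c, h, w)


def residual_block(x, branch, num_groups, scale_norm=False):
    """Toy residual block: identity skip plus a GN-normalised branch.

    `branch` is a hypothetical stand-in for the block's convolutions.
    With scale_norm=True an extra GN layer is applied after the addition,
    which is the assumed reading of "Scale Normalisation" in this sketch.
    """
    out = x + group_norm(branch(x), num_groups)
    if scale_norm:
        out = group_norm(out, num_groups)
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Skip-branch activations at a much larger scale than the unit-scale
    # normalised residual branch: the "scale mixing" situation.
    x = 10.0 * rng.standard_normal((4, 8, 5, 5))

    plain = residual_block(x, np.tanh, num_groups=4)
    scaled = residual_block(x, np.tanh, num_groups=4, scale_norm=True)
    print("std without Scale Normalisation:", plain.std())
    print("std with Scale Normalisation:   ", scaled.std())

    # Per-sample statistics: normalising one sample alone gives the same
    # result as normalising it inside the batch (not true for BatchNorm).
    assert np.allclose(group_norm(x[:1], 4), group_norm(x, 4)[:1])
```

In this toy setting, the output of the plain block is dominated by the large-scale skip branch. With the extra layer after the addition, each sample's channel groups return to zero mean and unit variance.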
Related papers
- FGGP: Fixed-Rate Gradient-First Gradual Pruning [2.0940682212182975]
We introduce a gradient-first magnitude-next strategy for choosing the parameters to prune, and show that a fixed-rate subselection criterion between these steps works better.
Our proposed fixed-rate gradient-first gradual pruning (FGGP) approach outperforms its state-of-the-art alternatives in most of the above experimental settings.
arXiv Detail & Related papers (2024-11-08T12:02:25Z)
- Concurrent Training and Layer Pruning of Deep Neural Networks [0.0]
We propose an algorithm capable of identifying and eliminating irrelevant layers of a neural network during the early stages of training.
We employ a structure using residual connections around nonlinear network sections that allow the flow of information through the network once a nonlinear section is pruned.
arXiv Detail & Related papers (2024-06-06T23:19:57Z)
- Iterative Soft Shrinkage Learning for Efficient Image Super-Resolution [91.3781512926942]
Image super-resolution (SR) has witnessed extensive neural network designs from CNN to transformer architectures.
This work investigates the potential of network pruning for super-resolution iteration to take advantage of off-the-shelf network designs and reduce the underlying computational overhead.
We propose a novel Iterative Soft Shrinkage-Percentage (ISS-P) method that optimizes the sparse structure of a randomly initialized network at each iteration and tweaks unimportant weights by a small amount proportional to the magnitude scale on the fly.
arXiv Detail & Related papers (2023-03-16T21:06:13Z)
- WLD-Reg: A Data-dependent Within-layer Diversity Regularizer [98.78384185493624]
Neural networks are composed of multiple layers arranged in a hierarchical structure jointly trained with a gradient-based optimization.
We propose to complement this traditional 'between-layer' feedback with additional 'within-layer' feedback to encourage the diversity of the activations within the same layer.
We present an extensive empirical study confirming that the proposed approach enhances the performance of several state-of-the-art neural network models in multiple tasks.
arXiv Detail & Related papers (2023-01-03T20:57:22Z)
- Pruning Neural Networks with Interpolative Decompositions [5.377278489623063]
We introduce a principled approach to neural network pruning that casts the problem as a structured low-rank matrix approximation.
We demonstrate how to prune a neural network by first building a set of primitives to prune a single fully connected or convolution layer.
We achieve an accuracy of 93.62 ± 0.36% using VGG-16 on CIFAR-10, with a 51% FLOPS reduction.
arXiv Detail & Related papers (2021-07-30T20:13:49Z)
- Progressively Guided Alternate Refinement Network for RGB-D Salient Object Detection [63.18846475183332]
We aim to develop an efficient and compact deep network for RGB-D salient object detection.
We propose a progressively guided alternate refinement network to refine a coarse initial prediction.
Our model outperforms existing state-of-the-art approaches by a large margin.
arXiv Detail & Related papers (2020-08-17T02:55:06Z)
- Convolutional Neural Network Training with Distributed K-FAC [14.2773046188145]
Kronecker-factored Approximate Curvature (K-FAC) was recently proposed as an approximation of the Fisher Information Matrix.
We investigate here a scalable K-FAC design and its applicability in convolutional neural network (CNN) training at scale.
arXiv Detail & Related papers (2020-07-01T22:00:53Z)
- Optimization Theory for ReLU Neural Networks Trained with Normalization Layers [82.61117235807606]
The success of deep neural networks is in part due to the use of normalization layers.
Our analysis shows how the introduction of normalization changes the landscape and can enable faster activation.
arXiv Detail & Related papers (2020-06-11T23:55:54Z)
- Iterative Network for Image Super-Resolution [69.07361550998318]
Single image super-resolution (SISR) has been greatly revitalized by the recent development of convolutional neural networks (CNNs).
This paper provides a new insight into the conventional SISR algorithm and proposes a substantially different approach relying on iterative optimization.
A novel iterative super-resolution network (ISRN) is proposed on top of the iterative optimization.
arXiv Detail & Related papers (2020-05-20T11:11:47Z)