Adjoined Networks: A Training Paradigm with Applications to Network
Compression
- URL: http://arxiv.org/abs/2006.05624v5
- Date: Fri, 15 Apr 2022 00:15:28 GMT
- Title: Adjoined Networks: A Training Paradigm with Applications to Network
Compression
- Authors: Utkarsh Nath, Shrinu Kushagra, Yingzhen Yang
- Abstract summary: We introduce Adjoined Networks, or AN, a learning paradigm that trains both the original base network and the smaller compressed network together.
Using ResNet-50 as the base network, AN achieves 71.8% top-1 accuracy with only 1.8M parameters and 1.6 GFLOPs on the ImageNet dataset.
We propose Differentiable Adjoined Networks (DAN), a training paradigm that augments AN by using neural architecture search to jointly learn both the width and the weights for each layer of the smaller network.
- Score: 3.995047443480282
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Compressing deep neural networks while maintaining accuracy is important when
we want to deploy large, powerful models in production and/or edge devices. One
common technique used to achieve this goal is knowledge distillation.
Typically, the output of a static pre-defined teacher (a large base network) is
used as soft labels to train and transfer information to a student (or smaller)
network. In this paper, we introduce Adjoined Networks, or AN, a learning
paradigm that trains both the original base network and the smaller compressed
network together. In our training approach, the parameters of the smaller
network are shared across both the base and the compressed networks. Using our
training paradigm, we can simultaneously compress (the student network) and
regularize (the teacher network) any architecture. In this paper, we focus on
popular CNN-based architectures used for computer vision tasks. We conduct an
extensive experimental evaluation of our training paradigm on various
large-scale datasets. Using ResNet-50 as the base network, AN achieves 71.8%
top-1 accuracy with only 1.8M parameters and 1.6 GFLOPs on the ImageNet
dataset. We further propose Differentiable Adjoined Networks (DAN), a training
paradigm that augments AN by using neural architecture search to jointly learn
both the width and the weights for each layer of the smaller network. DAN
achieves ResNet-50 level accuracy on ImageNet with $3.8\times$ fewer parameters
and $2.2\times$ fewer FLOPs.
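To make the adjoined idea concrete, here is a minimal sketch (not the authors' released code) under two stated assumptions: the compressed network reuses the leading slice of each base layer's filters, so its parameters are literally a subset of the base network's, and the joint objective is cross-entropy on both outputs plus a distillation-style consistency term. The class name, loss form, and hyper-parameters (`alpha`, `T`) are illustrative.

```python
# Minimal sketch of adjoined training (illustrative; layer/loss names are
# assumptions, not the paper's exact formulation).
import torch.nn as nn
import torch.nn.functional as F

class AdjoinedConv(nn.Module):
    """A conv layer whose 'small' path reuses only the first `small_out` filters
    (and the matching input channels) of the full layer's weight tensor."""
    def __init__(self, in_ch, out_ch, small_out, k=3):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, k, padding=k // 2)
        self.small_out = small_out

    def forward(self, x_base, x_small):
        y_base = self.conv(x_base)                                   # full-width path
        w = self.conv.weight[: self.small_out, : x_small.shape[1]]   # shared slice
        b = self.conv.bias[: self.small_out]
        y_small = F.conv2d(x_small, w, b, padding=self.conv.padding)
        return y_base, y_small

def adjoined_loss(logits_base, logits_small, target, alpha=0.5, T=2.0):
    """Cross-entropy on both paths plus a distillation-style consistency term.
    `alpha` and `T` are illustrative hyper-parameters, not values from the paper."""
    ce = F.cross_entropy(logits_base, target) + F.cross_entropy(logits_small, target)
    kl = F.kl_div(
        F.log_softmax(logits_small / T, dim=1),
        F.softmax(logits_base.detach() / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    return ce + alpha * kl
```

Because the small path's weights are views into the base layer's tensor, a single optimizer step updates both networks at once, which is what allows AN to compress the student and regularize the teacher simultaneously.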
Related papers
- Convolutional Networks as Extremely Small Foundation Models: Visual Prompting and Theoretical Perspective [1.79487674052027]
In this paper, we design a prompting module which performs few-shot adaptation of generic deep networks to new tasks.
Driven by learning theory, we derive prompting modules that are as simple as possible, as they generalize better under the same training error.
In practice, SDForest has an extremely low computational cost and achieves real-time performance even on a CPU.
arXiv Detail & Related papers (2024-09-03T12:34:23Z) - Connection Reduction Is All You Need [0.10878040851637998]
Empirical research shows that simply stacking convolutional layers does not make the network train better.
We propose two new algorithms to connect layers.
ShortNet1 has a 5% lower test error rate and 25% faster inference time than Baseline.
arXiv Detail & Related papers (2022-08-02T13:00:35Z) - An Experimental Study of the Impact of Pre-training on the Pruning of a
Convolutional Neural Network [0.0]
In recent years, deep neural networks have achieved wide success in various application domains.
Deep neural networks usually involve a large number of parameters, which correspond to the weights of the network.
Pruning methods attempt to reduce the size of the parameter set by identifying and removing irrelevant weights.
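As a concrete illustration of that idea, a one-shot magnitude-pruning sketch is shown below; it zeroes out the smallest-magnitude weights of each layer. The `keep_ratio` parameter is illustrative, and the surveyed paper studies pruning more broadly (including the effect of pre-training).

```python
# One-shot magnitude pruning sketch (illustrative only): zero out the
# smallest-magnitude weights of each conv/linear layer.
import torch
import torch.nn as nn

@torch.no_grad()
def magnitude_prune(model: nn.Module, keep_ratio: float = 0.5):
    for module in model.modules():
        if isinstance(module, (nn.Conv2d, nn.Linear)):
            w = module.weight
            k = max(1, int(keep_ratio * w.numel()))          # weights to keep
            # k-th largest magnitude = (numel - k + 1)-th smallest
            threshold = w.abs().flatten().kthvalue(w.numel() - k + 1).values
            w.mul_((w.abs() >= threshold).float())           # drop "irrelevant" weights
```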
arXiv Detail & Related papers (2021-12-15T16:02:15Z) - DS-Net++: Dynamic Weight Slicing for Efficient Inference in CNNs and
Transformers [105.74546828182834]
We show a hardware-efficient dynamic inference regime, named dynamic weight slicing, which adaptively slices a part of the network parameters for inputs with diverse difficulty levels.
We present the dynamic slimmable network (DS-Net) and the dynamic slice-able network (DS-Net++), which input-dependently adjust the filter numbers of CNNs and multiple dimensions in both CNNs and transformers.
arXiv Detail & Related papers (2021-09-21T09:57:21Z) - Efficient Transfer Learning via Joint Adaptation of Network Architecture
and Weight [66.8543732597723]
Recent works in neural architecture search (NAS) can aid transfer learning by establishing a sufficient network search space.
We propose a novel framework consisting of two modules: a neural architecture search module for architecture transfer and a neural weight search module for weight transfer.
These two modules conduct the search on the target task based on a reduced super-network, so we only need to train once on the source task.
arXiv Detail & Related papers (2021-05-19T08:58:04Z) - Dynamic Graph: Learning Instance-aware Connectivity for Neural Networks [78.65792427542672]
Dynamic Graph Network (DG-Net) is a complete directed acyclic graph, where the nodes represent convolutional blocks and the edges represent connection paths.
Instead of using a single fixed path through the network, DG-Net aggregates features dynamically at each node, which gives the network greater representational ability.
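A minimal sketch of instance-aware aggregation at a single node follows (an assumption-laden illustration, not the DG-Net code): incoming features are mixed with input-dependent weights produced by a small gate.

```python
# Illustrative per-node aggregation with input-dependent edge weights (a sketch;
# the gate design here is an assumption, not DG-Net's actual router).
import torch
import torch.nn as nn

class GatedAggregate(nn.Module):
    def __init__(self, channels: int, num_inputs: int):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(channels * num_inputs, num_inputs),
        )

    def forward(self, feats):                         # feats: list of (B, C, H, W)
        weights = torch.softmax(self.gate(torch.cat(feats, dim=1)), dim=1)
        stacked = torch.stack(feats, dim=1)           # (B, num_inputs, C, H, W)
        return (weights[:, :, None, None, None] * stacked).sum(dim=1)
```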
arXiv Detail & Related papers (2020-10-02T16:50:26Z) - Principal Component Networks: Parameter Reduction Early in Training [10.14522349959932]
We show how to find small networks that exhibit the same performance as their over-parameterized counterparts.
We use PCA to find a basis of high variance for layer inputs and represent layer weights using these directions.
We also show that ResNet-20 PCNs outperform deep ResNet-110 networks while training faster.
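The basic mechanism can be sketched for a single linear layer (illustrative; `pca_reduce_linear` and the choice of `d` are assumptions, not the paper's exact PCN construction):

```python
# Sketch: represent a linear layer in the top-d principal directions of its inputs.
import torch

def pca_reduce_linear(weight: torch.Tensor, inputs: torch.Tensor, d: int):
    """weight: (out, in); inputs: (n_samples, in). Returns a (d, in) projection and
    an (out, d) reduced weight so that weight @ x ≈ reduced @ (proj @ x)."""
    x = inputs - inputs.mean(dim=0, keepdim=True)
    _, _, Vh = torch.linalg.svd(x, full_matrices=False)   # rows of Vh: principal dirs
    proj = Vh[:d]                                          # high-variance basis (d, in)
    reduced = weight @ proj.T                              # weights in that basis (out, d)
    return proj, reduced
```

At inference the layer computes reduced @ (proj @ x), which shrinks the parameter count from out·in to d·(out + in) when d is small.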
arXiv Detail & Related papers (2020-06-23T21:40:24Z) - Improved Residual Networks for Image and Video Recognition [98.10703825716142]
Residual networks (ResNets) represent a powerful type of convolutional neural network (CNN) architecture.
We show consistent improvements in accuracy and learning convergence over the baseline.
Our proposed approach allows us to train extremely deep networks, while the baseline shows severe optimization issues.
arXiv Detail & Related papers (2020-04-10T11:09:50Z) - Network Adjustment: Channel Search Guided by FLOPs Utilization Ratio [101.84651388520584]
This paper presents a new framework named network adjustment, which considers network accuracy as a function of FLOPs.
Experiments on standard image classification datasets and a wide range of base networks demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2020-04-06T15:51:00Z) - Large-Scale Gradient-Free Deep Learning with Recursive Local
Representation Alignment [84.57874289554839]
Training deep neural networks on large-scale datasets requires significant hardware resources.
Backpropagation, the workhorse for training these networks, is an inherently sequential process that is difficult to parallelize.
We propose a neuro-biologically-plausible alternative to backprop that can be used to train deep networks.
arXiv Detail & Related papers (2020-02-10T16:20:02Z)