Train your classifier first: Cascade Neural Networks Training from upper
layers to lower layers
- URL: http://arxiv.org/abs/2102.04697v1
- Date: Tue, 9 Feb 2021 08:19:49 GMT
- Title: Train your classifier first: Cascade Neural Networks Training from upper
layers to lower layers
- Authors: Shucong Zhang, Cong-Thanh Do, Rama Doddipatla, Erfan Loweimi, Peter
Bell and Steve Renals
- Abstract summary: We develop a novel top-down training method which can be viewed as an algorithm for searching for high-quality classifiers.
We tested this method on automatic speech recognition (ASR) tasks and language modelling tasks.
The proposed method consistently improves recurrent neural network ASR models on Wall Street Journal, self-attention ASR models on Switchboard, and AWD-LSTM language models on WikiText-2.
- Score: 54.47911829539919
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Although the lower layers of a deep neural network learn features which are
transferable across datasets, these layers are not transferable within the same
dataset. That is, in general, freezing the trained feature extractor (the lower
layers) and retraining the classifier (the upper layers) on the same dataset
leads to worse performance. In this paper, for the first time, we show that the
frozen classifier is transferable within the same dataset. We develop a novel
top-down training method which can be viewed as an algorithm for searching for
high-quality classifiers. We tested this method on automatic speech recognition
(ASR) tasks and language modelling tasks. The proposed method consistently
improves recurrent neural network ASR models on Wall Street Journal,
self-attention ASR models on Switchboard, and AWD-LSTM language models on
WikiText-2.
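The abstract's central claim, that a frozen classifier can be reused on the same dataset, can be illustrated with a toy experiment. What follows is a minimal sketch, not the authors' algorithm. It assumes a two-stage schedule: train all layers jointly, then keep the upper layer fixed, re-initialise the lower layer, and retrain only the lower layer. The two-layer network, the synthetic data, and the names `init_params`, `forward`, `loss` and `train` are all hypothetical and chosen only for illustration.

```python
import numpy as np

def init_params(rng, n_in, n_hidden, n_out):
    """Randomly initialise a two-layer network (hypothetical toy model)."""
    return {
        "W1": rng.normal(0.0, 0.5, size=(n_in, n_hidden)),   # lower layer (feature extractor)
        "W2": rng.normal(0.0, 0.5, size=(n_hidden, n_out)),  # upper layer (classifier)
    }

def forward(params, X):
    """Return hidden activations and softmax class probabilities."""
    H = np.tanh(X @ params["W1"])
    logits = H @ params["W2"]
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    return H, probs

def loss(params, X, y):
    """Mean cross-entropy of the network on (X, y)."""
    _, probs = forward(params, X)
    return float(-np.log(probs[np.arange(len(y)), y] + 1e-12).mean())

def train(params, X, y, trainable, steps=300, lr=0.1):
    """Full-batch gradient descent that updates only the layers named in `trainable`."""
    Y = np.eye(params["W2"].shape[1])[y]  # one-hot targets
    for _ in range(steps):
        H, probs = forward(params, X)
        d_logits = (probs - Y) / len(y)
        grads = {
            "W2": H.T @ d_logits,
            "W1": X.T @ ((d_logits @ params["W2"].T) * (1.0 - H ** 2)),
        }
        for name in trainable:  # layers not listed here stay frozen
            params[name] -= lr * grads[name]
    return params

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # synthetic two-class labels

# Stage 1: train the lower and upper layers jointly.
params = init_params(rng, n_in=4, n_hidden=16, n_out=2)
train(params, X, y, trainable=("W1", "W2"))
frozen_classifier = params["W2"].copy()

# Stage 2: keep the trained classifier fixed, re-initialise the lower layer,
# and retrain only the lower layer on the same data.
params["W1"] = rng.normal(0.0, 0.5, size=params["W1"].shape)
loss_before = loss(params, X, y)
train(params, X, y, trainable=("W1",))
loss_after = loss(params, X, y)

print("loss with fresh lower layer:", loss_before)
print("loss after retraining lower layer:", loss_after)
```

The `trainable` argument is the only mechanism that distinguishes the two stages, so the same loop could in principle be repeated layer by layer from the top of a deeper network downwards. Whether that matches the cascade schedule used in the paper is not stated in the abstract.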
Related papers
- NEAR: A Training-Free Pre-Estimator of Machine Learning Model Performance [0.0]
We propose a zero-cost proxy Network Expressivity by Activation Rank (NEAR) to identify the optimal neural network without training.
We demonstrate the cutting-edge correlation between this network score and the model accuracy on NAS-Bench-101 and NATS-Bench-SSS/TSS.
arXiv Detail & Related papers (2024-08-16T14:38:14Z)
- Improved Convergence Guarantees for Shallow Neural Networks [91.3755431537592]
We prove convergence of depth 2 neural networks, trained via gradient descent, to a global minimum.
Our model has the following features: regression with quadratic loss function, fully connected feedforward architecture, ReLU activations, Gaussian data instances, adversarial labels.
Our results strongly suggest that, at least in our model, the convergence phenomenon extends well beyond the NTK regime.
arXiv Detail & Related papers (2022-12-05T14:47:52Z)
- Do We Really Need a Learnable Classifier at the End of Deep Neural Network? [118.18554882199676]
We study the potential of learning a neural network for classification with the classifier randomly initialized as an ETF and fixed during training.
Our experimental results show that our method is able to achieve similar performances on image classification for balanced datasets.
arXiv Detail & Related papers (2022-03-17T04:34:28Z)
- Towards Disentangling Information Paths with Coded ResNeXt [11.884259630414515]
We take a novel approach to enhance the transparency of the function of the whole network.
We propose a neural network architecture for classification, in which the information that is relevant to each class flows through specific paths.
arXiv Detail & Related papers (2022-02-10T21:45:49Z)
- Recurrent Stacking of Layers in Neural Networks: An Application to Neural Machine Translation [18.782750537161615]
We propose to share parameters across all layers thereby leading to a recurrently stacked neural network model.
We empirically demonstrate that the translation quality of a model that recurrently stacks a single layer 6 times, despite having significantly fewer parameters, approaches that of a model that stacks 6 layers where each layer has different parameters.
arXiv Detail & Related papers (2021-06-18T08:48:01Z)
- No Fear of Heterogeneity: Classifier Calibration for Federated Learning with Non-IID Data [78.69828864672978]
A central challenge in training classification models in the real-world federated system is learning with non-IID data.
We propose a novel and simple algorithm called Classifier Calibration with Virtual Representations (CCVR), which adjusts the classifier using virtual representations sampled from an approximated Gaussian mixture model.
Experimental results demonstrate that CCVR achieves state-of-the-art performance on popular federated learning benchmarks including CIFAR-10, CIFAR-100, and CINIC-10.
arXiv Detail & Related papers (2021-06-09T12:02:29Z)
- Local Critic Training for Model-Parallel Learning of Deep Neural Networks [94.69202357137452]
We propose a novel model-parallel learning method, called local critic training.
We show that the proposed approach successfully decouples the update process of the layer groups for both convolutional neural networks (CNNs) and recurrent neural networks (RNNs).
We also show that trained networks by the proposed method can be used for structural optimization.
arXiv Detail & Related papers (2021-02-03T09:30:45Z)
- Collaborative Method for Incremental Learning on Classification and Generation [32.07222897378187]
We introduce a novel algorithm, Incremental Class Learning with Attribute Sharing (ICLAS), for incremental class learning with deep neural networks.
One of its components, incGAN, can generate images with increased variety compared with the training data.
Under the challenging condition of data deficiency, ICLAS incrementally trains the classification and generation networks.
arXiv Detail & Related papers (2020-10-29T06:34:53Z)
- Incremental Training of a Recurrent Neural Network Exploiting a Multi-Scale Dynamic Memory [79.42778415729475]
We propose a novel incrementally trained recurrent architecture targeting explicitly multi-scale learning.
We show how to extend the architecture of a simple RNN by separating its hidden state into different modules.
We discuss a training algorithm where new modules are iteratively added to the model to learn progressively longer dependencies.
arXiv Detail & Related papers (2020-06-29T08:35:49Z)
- An Effective and Efficient Initialization Scheme for Training Multi-layer Feedforward Neural Networks [5.161531917413708]
We propose a novel network initialization scheme based on the celebrated Stein's identity.
The proposed SteinGLM method is shown through extensive numerical results to be much faster and more accurate than other popular methods commonly used for training neural networks.
arXiv Detail & Related papers (2020-05-16T16:17:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.