Go Wide, Then Narrow: Efficient Training of Deep Thin Networks
- URL: http://arxiv.org/abs/2007.00811v2
- Date: Mon, 17 Aug 2020 17:43:30 GMT
- Title: Go Wide, Then Narrow: Efficient Training of Deep Thin Networks
- Authors: Denny Zhou, Mao Ye, Chen Chen, Tianjian Meng, Mingxing Tan, Xiaodan
Song, Quoc Le, Qiang Liu, and Dale Schuurmans
- Abstract summary: We propose an efficient method to train a deep thin network with a theoretical guarantee.
By training with our method, ResNet50 can outperform ResNet101, and BERT Base can be comparable with BERT Large.
- Score: 62.26044348366186
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To deploy a deep learning model into production, it must be both
accurate and compact to meet latency and memory constraints. This usually
results in a network that is deep (to ensure performance) and yet thin (to
improve computational efficiency). In this paper, we propose an efficient
method to train a deep thin network with a theoretical guarantee. Our method is
motivated by model compression. It consists of three stages. First, we
sufficiently widen the deep thin network and train it until convergence. Then,
we use this well-trained deep wide network to warm up (or initialize) the
original deep thin network. This is achieved by layerwise imitation, that is,
forcing the thin network to mimic the intermediate outputs of the wide network
from layer to layer. Finally, we further fine-tune this already
well-initialized deep thin network. The theoretical guarantee is established by
using the neural mean field analysis. It demonstrates the advantage of our
layerwise imitation approach over backpropagation. We also conduct large-scale
empirical experiments to validate the proposed method. By training with our
method, ResNet50 can outperform ResNet101, and BERT Base can be comparable to
BERT Large, when ResNet101 and BERT Large are trained with the standard
procedures from the literature.
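As an illustration of the three-stage recipe described in the abstract (train a widened network, warm up the thin network by layerwise imitation, then fine-tune), below is a minimal PyTorch sketch. The toy network builder, the per-layer linear projections that align thin and wide hidden sizes, and all hyperparameters are hypothetical; the paper's actual architectures (ResNet50, BERT Base) and its exact layerwise matching scheme are not reproduced here.

```python
import torch
import torch.nn as nn


def make_mlp(depth, width, in_dim=32, out_dim=10):
    """Build a toy deep network as a list of blocks plus a linear head."""
    dims = [in_dim] + [width] * depth
    blocks = nn.ModuleList(
        [nn.Sequential(nn.Linear(dims[i], dims[i + 1]), nn.ReLU()) for i in range(depth)]
    )
    head = nn.Linear(width, out_dim)
    return blocks, head


depth, thin_w, wide_w = 6, 16, 128

# Stage 1: the widened network; assume it has already been trained to convergence.
wide_blocks, wide_head = make_mlp(depth, wide_w)

# The deep thin network we actually want to deploy.
thin_blocks, thin_head = make_mlp(depth, thin_w)

# Hypothetical per-layer projections so 16-d thin activations can be compared
# with 128-d wide activations; the paper's exact matching scheme may differ.
projections = nn.ModuleList([nn.Linear(thin_w, wide_w) for _ in range(depth)])


def layerwise_imitation_loss(x):
    """Stage 2: make each thin block mimic the corresponding wide block's output."""
    loss = x.new_zeros(())
    h_wide, h_thin = x, x
    for wide_blk, thin_blk, proj in zip(wide_blocks, thin_blocks, projections):
        h_wide = wide_blk(h_wide).detach()  # teacher activations, no gradient
        h_thin = thin_blk(h_thin)
        loss = loss + nn.functional.mse_loss(proj(h_thin), h_wide)
    return loss


# One warm-up step on a dummy batch; in practice this runs over the training set.
opt = torch.optim.Adam(
    list(thin_blocks.parameters()) + list(projections.parameters()), lr=1e-3
)
x = torch.randn(64, 32)
opt.zero_grad()
layerwise_imitation_loss(x).backward()
opt.step()

# Stage 3: fine-tune thin_blocks + thin_head on the task loss (omitted here).
```

The key design choice in this sketch is that the teacher's activations are detached, so gradients flow only into the thin network and the projections; after the warm-up, the projections are discarded and only the thin network is fine-tuned and deployed.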