Concurrent Adversarial Learning for Large-Batch Training
- URL: http://arxiv.org/abs/2106.00221v1
- Date: Tue, 1 Jun 2021 04:26:02 GMT
- Title: Concurrent Adversarial Learning for Large-Batch Training
- Authors: Yong Liu, Xiangning Chen, Minhao Cheng, Cho-Jui Hsieh, Yang You
- Abstract summary: Adversarial learning is a natural choice for smoothing the decision surface and biasing towards a flat region.
We propose a novel Concurrent Adversarial Learning (ConAdv) method that decouples the sequential gradient computations in adversarial learning by utilizing stale parameters.
This is the first work that successfully scales ResNet-50 training to a 96K batch size.
- Score: 83.55868483681748
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large-batch training has become a commonly used technique when training
neural networks with a large number of GPU/TPU processors. As batch size
increases, stochastic optimizers tend to converge to sharp local minima,
leading to degraded test performance. Current methods usually rely on extensive
data augmentation to increase the batch size, but we found that the performance gain
from data augmentation diminishes as the batch size increases, and that data augmentation
becomes insufficient after a certain point. In this paper, we propose to use
adversarial learning to increase the batch size in large-batch training.
Despite being a natural choice for smoothing the decision surface and biasing
towards a flat region, adversarial learning has not been successfully applied
in large-batch training since it requires at least two sequential gradient
computations at each step, which will at least double the running time compared
with vanilla training even with a large number of processors. To overcome this
issue, we propose a novel Concurrent Adversarial Learning (ConAdv) method that
decouples the sequential gradient computations in adversarial learning by
utilizing stale parameters. Experimental results demonstrate that ConAdv can
successfully increase the batch size on both ResNet-50 and EfficientNet
training on ImageNet while maintaining high accuracy. In particular, we show
that ConAdv alone can achieve 75.3% top-1 accuracy on ImageNet ResNet-50 training
with a 96K batch size, and the accuracy can be further improved to 76.2% when
combining ConAdv with data augmentation. This is the first work that successfully
scales ResNet-50 training to a 96K batch size.
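To make the decoupling concrete, below is a minimal single-process sketch of the stale-parameter idea in PyTorch. The toy model, random data, one-step FGSM-style attack, and perturbation budget `epsilon` are illustrative assumptions rather than the paper's setup; the point is only that the perturbation is computed against an earlier snapshot of the weights, so it no longer depends on the current parameters and, in a distributed setting, could overlap with the previous weight update instead of doubling the per-step cost.
```python
# Minimal sketch of the stale-parameter idea (not the authors' code).
# Assumptions: PyTorch API; a toy MLP and random 32x32x3 data stand in for
# ResNet-50/ImageNet; a one-step FGSM-style attack stands in for the paper's
# perturbation; staleness is fixed at one step.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32 * 3, 128), nn.ReLU(), nn.Linear(128, 10))
stale_model = copy.deepcopy(model)           # frozen snapshot used only to craft perturbations
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
epsilon = 8.0 / 255                          # hypothetical perturbation budget

def perturb_with_stale_params(x, y):
    """One-step (FGSM-style) perturbation computed against the *stale* parameters."""
    x_adv = x.clone().requires_grad_(True)
    loss = F.cross_entropy(stale_model(x_adv), y)
    grad, = torch.autograd.grad(loss, x_adv)
    return (x + epsilon * grad.sign()).detach()

for step in range(5):
    x = torch.rand(64, 3, 32, 32)            # stand-in mini-batch
    y = torch.randint(0, 10, (64,))

    # Phase 1: craft adversarial examples from the stale snapshot. Because this
    # reads only the snapshot, it could run concurrently with the previous update.
    x_adv = perturb_with_stale_params(x, y)

    # Phase 2: ordinary training step on the adversarial batch with the live weights.
    optimizer.zero_grad()
    F.cross_entropy(model(x_adv), y).backward()
    optimizer.step()

    # Refresh the snapshot so perturbations never lag by more than one step.
    stale_model.load_state_dict(model.state_dict())
```
In this sketch the two gradient computations still run back-to-back, but since the input-gradient phase reads only the stale snapshot, the two phases have no data dependency within a step, which is what removes the roughly 2x overhead described in the abstract.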
Related papers
- Can we learn better with hard samples? [0.0]
A variant of the traditional algorithm has been proposed, which trains the network focusing on mini-batches with high loss.
We show that the proposed method generalizes in 26.47% fewer epochs than the traditional mini-batch method with EfficientNet-B4 on STL-10.
arXiv Detail & Related papers (2023-04-07T05:45:26Z)
- Q-Ensemble for Offline RL: Don't Scale the Ensemble, Scale the Batch Size [58.762959061522736]
We show that scaling mini-batch sizes with appropriate learning rate adjustments can speed up the training process by orders of magnitude.
We show that scaling the mini-batch size and naively adjusting the learning rate allows for (1) a reduced size of the Q-ensemble, (2) stronger penalization of out-of-distribution actions, and (3) improved convergence time.
arXiv Detail & Related papers (2022-11-20T21:48:25Z)
- Efficient and Effective Augmentation Strategy for Adversarial Training [48.735220353660324]
Adversarial training of Deep Neural Networks is known to be significantly more data-hungry than standard training.
We propose Diverse Augmentation-based Joint Adversarial Training (DAJAT) to use data augmentations effectively in adversarial training.
arXiv Detail & Related papers (2022-10-27T10:59:55Z)
- EfficientNetV2: Smaller Models and Faster Training [91.77432224225221]
This paper introduces EfficientNetV2, a new family of convolutional networks that have faster training speed and better parameter efficiency than previous models.
We use a combination of training-aware neural architecture search and scaling, to jointly optimize training speed and parameter efficiency.
Our experiments show that EfficientNetV2 models train much faster than state-of-the-art models while being up to 6.8x smaller.
arXiv Detail & Related papers (2021-04-01T07:08:36Z)
- Pruning Convolutional Filters using Batch Bridgeout [14.677724755838556]
State-of-the-art computer vision models are rapidly increasing in capacity, with the number of parameters far exceeding what is required to fit the training set.
This overparameterization results in better optimization and generalization performance.
In order to reduce inference costs, convolutional filters in trained neural networks could be pruned to reduce the run-time memory and computational requirements during inference.
We propose the use of Batch Bridgeout, a sparsity inducing regularization scheme, to train neural networks so that they could be pruned efficiently with minimal degradation in performance.
arXiv Detail & Related papers (2020-09-23T01:51:47Z)
- Accelerated Large Batch Optimization of BERT Pretraining in 54 minutes [9.213729275749452]
We propose an accelerated gradient method called LANS to improve the efficiency of using large mini-batches for training.
It takes 54 minutes on 192 AWS EC2 P3dn.24xlarge instances to achieve a target F1 score of 90.5 or higher on SQuAD v1.1, achieving the fastest BERT training time in the cloud.
arXiv Detail & Related papers (2020-06-24T05:00:41Z)
- The Limit of the Batch Size [79.8857712299211]
Large-batch training is an efficient approach for current distributed deep learning systems.
In this paper, we focus on studying the limit of the batch size.
We provide detailed numerical optimization instructions for step-by-step comparison.
arXiv Detail & Related papers (2020-06-15T16:18:05Z)
- Scalable and Practical Natural Gradient for Large-Scale Deep Learning [19.220930193896404]
SP-NGD scales to large mini-batch sizes with a negligible computational overhead as compared to first-order methods.
We demonstrate convergence to a top-1 validation accuracy of 75.4% in 5.5 minutes using a mini-batch size of 32,768 with 1,024 GPUs, as well as an accuracy of 74.9% with an extremely large mini-batch size of 131,072 in 873 steps of SP-NGD.
arXiv Detail & Related papers (2020-02-13T11:55:37Z)
- Large Batch Training Does Not Need Warmup [111.07680619360528]
Training deep neural networks using a large batch size has shown promising results and benefits many real-world applications.
In this paper, we propose a novel Complete Layer-wise Adaptive Rate Scaling (CLARS) algorithm for large-batch training.
Based on our analysis, we bridge the gap and illustrate the theoretical insights for three popular large-batch training techniques.
arXiv Detail & Related papers (2020-02-04T23:03:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.