MaxUp: A Simple Way to Improve Generalization of Neural Network Training
- URL: http://arxiv.org/abs/2002.09024v1
- Date: Thu, 20 Feb 2020 21:20:28 GMT
- Title: MaxUp: A Simple Way to Improve Generalization of Neural Network Training
- Authors: Chengyue Gong, Tongzheng Ren, Mao Ye, Qiang Liu
- Abstract summary: MaxUp is an embarrassingly simple, highly effective technique for improving the generalization performance of machine learning models.
In particular, we improve ImageNet classification from the state-of-the-art top-1 accuracy $85.5\%$ without extra data to $85.8\%$.
- Score: 41.89570630848936
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose \emph{MaxUp}, an embarrassingly simple, highly effective technique
for improving the generalization performance of machine learning models,
especially deep neural networks. The idea is to generate a set of augmented
data with some random perturbations or transforms and minimize the maximum, or
worst case loss over the augmented data. By doing so, we implicitly introduce a
smoothness or robustness regularization against the random perturbations, and
hence improve the generalization performance. For example, in the case of Gaussian
perturbation,
\emph{MaxUp} is asymptotically equivalent to using the gradient norm of the
loss as a penalty to encourage smoothness. We test \emph{MaxUp} on a range of
tasks, including image classification, language modeling, and adversarial
certification, on which \emph{MaxUp} consistently outperforms the existing best
baseline methods, without introducing substantial computational overhead. In
particular, we improve ImageNet classification from the state-of-the-art top-1
accuracy $85.5\%$ without extra data to $85.8\%$. Code will be released soon.
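Written out, the objective described above is a min-max over $m$ augmented copies of each example; the display below is a schematic rendering of the abstract's claims (the notation, and the constant $c$, are ours, not the paper's):

```latex
% MaxUp objective over m random perturbations (schematic):
\min_\theta \; \mathbb{E}_{(x,y)\sim\mathcal{D}}
  \Big[\, \max_{1 \le i \le m} L\big(\theta;\, x + \delta_i,\, y\big) \Big],
\qquad \delta_i \sim \mathcal{N}(0, \sigma^2 I).
% For Gaussian perturbations and small sigma, the inner max behaves like
% a gradient-norm penalty (the constant c depends on m):
\max_{1 \le i \le m} L(\theta;\, x + \delta_i,\, y)
  \;\approx\; L(\theta;\, x,\, y)
  + c\,\sigma\, \big\lVert \nabla_x L(\theta;\, x,\, y) \big\rVert .
```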
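In code, one MaxUp step is simply "augment m times, keep the worst loss." A minimal PyTorch sketch of that recipe (the helper maxup_loss, m=4, and the Gaussian scale 0.1 are illustrative choices, not the authors' settings):

```python
import torch
import torch.nn.functional as F

def maxup_loss(model, x, y, augment, m=4):
    """MaxUp objective: draw m random augmentations of each example,
    score every copy, and keep only the worst-case loss."""
    per_copy = []
    for _ in range(m):
        logits = model(augment(x))
        # per-example cross-entropy, shape (batch,)
        per_copy.append(F.cross_entropy(logits, y, reduction="none"))
    # max over the m augmented copies, then mean over the batch
    return torch.stack(per_copy).max(dim=0).values.mean()

# Example: MaxUp with Gaussian input perturbations.
gaussian = lambda x: x + 0.1 * torch.randn_like(x)
# loss = maxup_loss(model, images, labels, gaussian, m=4)
# loss.backward(); optimizer.step()
```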
Related papers
- Distinction Maximization Loss: Efficiently Improving Classification Accuracy, Uncertainty Estimation, and Out-of-Distribution Detection Simply Replacing the Loss and Calibrating [2.262407399039118]
We propose training deterministic deep neural networks using our DisMax loss.
DisMax usually outperforms all current approaches simultaneously in classification accuracy, uncertainty estimation, inference efficiency, and out-of-distribution detection.
arXiv Detail & Related papers (2022-05-12T04:37:35Z) - Minimax Optimal Quantization of Linear Models: Information-Theoretic
Limits and Efficient Algorithms [59.724977092582535]
We consider the problem of quantizing a linear model learned from measurements.
We derive an information-theoretic lower bound for the minimax risk under this setting.
We show that our method and upper bounds can be extended to two-layer ReLU neural networks.
arXiv Detail & Related papers (2022-02-23T02:39:04Z) - Joint inference and input optimization in equilibrium networks [68.63726855991052]
A deep equilibrium model forgoes traditional network depth and instead computes its output by finding the fixed point of a single nonlinear layer.
We show that there is a natural synergy between these two settings.
We demonstrate this strategy on various tasks such as training generative models while optimizing over latent codes, training models for inverse problems like denoising and inpainting, adversarial training and gradient based meta-learning.
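As a rough illustration of that fixed-point view, a naive forward pass iterates one layer to convergence; real deep equilibrium models use root-finding solvers and implicit differentiation, and this helper is our own sketch:

```python
import torch

def deq_forward(f, x, z0, tol=1e-4, max_iter=50):
    """Naive deep-equilibrium forward pass: apply the single nonlinear
    layer f until its output z stops changing (a fixed point of f)."""
    z = z0
    for _ in range(max_iter):
        z_next = f(z, x)
        if torch.norm(z_next - z) < tol:
            return z_next
        z = z_next
    return z

# e.g. one tanh layer: f = lambda z, x: torch.tanh(z @ W.T + x @ U.T + b)
```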
arXiv Detail & Related papers (2021-11-25T19:59:33Z) - Adapting Stepsizes by Momentumized Gradients Improves Optimization and
Generalization [89.66571637204012]
We demonstrate the effectiveness of AdaMomentum on vision tasks, and it consistently achieves state-of-the-art results on other tasks including language processing.
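Read literally, "adapting stepsizes by momentumized gradients" suggests an Adam-style update whose second moment tracks the squared momentum rather than the squared raw gradient. A speculative sketch of that reading (the name adamom_step, the defaults, and the omitted bias correction are our assumptions, not the paper's algorithm):

```python
import torch

def adamom_step(param, grad, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam-like update, but the second moment v tracks the momentumized
    gradient m**2 instead of grad**2 (bias correction omitted for brevity)."""
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * state["m"] ** 2
    with torch.no_grad():
        param -= lr * state["m"] / (state["v"].sqrt() + eps)

# state = {"m": torch.zeros_like(param), "v": torch.zeros_like(param)}
```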
arXiv Detail & Related papers (2021-06-22T03:13:23Z) - Minimax Optimization with Smooth Algorithmic Adversaries [59.47122537182611]
We propose a new algorithm for the min-player to play against smooth algorithms deployed by an adversary.
Our algorithm is guaranteed to make monotonic progress (having no limit cycles) and to converge after an appropriate number of gradient ascent steps.
arXiv Detail & Related papers (2021-06-02T22:03:36Z) - Reweighting Augmented Samples by Minimizing the Maximal Expected Loss [51.2791895511333]
We construct the maximal expected loss, the supremum over all reweightings of the loss on augmented samples.
Inspired by adversarial training, we minimize this maximal expected loss and obtain a simple and interpretable closed-form solution.
The proposed method can generally be applied on top of any data augmentation methods.
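If the reweighting is regularized entropically, a closed form of the kind the summary mentions is a softmax over per-sample losses; a sketch under that assumption (the softmax form and the temperature parameter are our guess at the shape of the solution, not the paper's exact formula):

```python
import torch

def reweighted_loss(per_copy_losses, temperature=1.0):
    """Softmax-reweight the losses of augmented copies so that harder
    copies get larger weight; temperature -> 0 recovers the plain max."""
    w = torch.softmax(per_copy_losses.detach() / temperature, dim=0)
    return (w * per_copy_losses).sum()
```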
arXiv Detail & Related papers (2021-03-16T09:31:04Z) - Dissecting Supervised Constrastive Learning [24.984074794337157]
Minimizing cross-entropy over the softmax scores of a linear map composed with a high-capacity encoder is arguably the most popular choice for training neural networks on supervised learning tasks.
We show that one can directly optimize the encoder instead, to obtain equally (or even more) discriminative representations via a supervised variant of a contrastive objective.
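For reference, one widely used form of such a supervised contrastive objective looks as follows (a generic SupCon-style loss written as an illustration, not necessarily the exact variant the paper analyzes):

```python
import torch

def supcon_loss(features, labels, temperature=0.1):
    """Supervised contrastive objective: pull same-label embeddings
    together and push different-label ones apart.

    features: (n, d) L2-normalized embeddings; labels: (n,) class ids.
    """
    n = features.size(0)
    sim = features @ features.T / temperature                    # (n, n)
    not_self = ~torch.eye(n, dtype=torch.bool, device=sim.device)
    positives = (labels.unsqueeze(0) == labels.unsqueeze(1)) & not_self
    # log-softmax over each row, excluding self-similarity
    log_prob = sim - torch.logsumexp(
        sim.masked_fill(~not_self, float("-inf")), dim=1, keepdim=True
    )
    # average log-probability over each anchor's positives
    mean_pos = torch.where(positives, log_prob, torch.zeros_like(log_prob)).sum(1)
    mean_pos = mean_pos / positives.sum(1).clamp(min=1)
    return -mean_pos.mean()
```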
arXiv Detail & Related papers (2021-02-17T15:22:38Z) - Glance and Focus: a Dynamic Approach to Reducing Spatial Redundancy in
Image Classification [46.885260723836865]
Deep convolutional neural networks (CNNs) generally perform better when fed high-resolution images.
Inspired by the fact that not all regions in an image are task-relevant, we propose a novel framework that performs image classification efficiently.
Our framework is general and flexible, as it is compatible with most state-of-the-art lightweight CNNs.
arXiv Detail & Related papers (2020-10-11T17:55:06Z)