PatchUp: A Regularization Technique for Convolutional Neural Networks
- URL: http://arxiv.org/abs/2006.07794v1
- Date: Sun, 14 Jun 2020 04:28:11 GMT
- Title: PatchUp: A Regularization Technique for Convolutional Neural Networks
- Authors: Mojtaba Faramarzi, Mohammad Amini, Akilesh Badrinaaraayanan, Vikas
Verma, and Sarath Chandar
- Abstract summary: We propose PatchUp, a hidden state block-level regularization technique for Convolutional Neural Networks (CNNs)
Our approach improves the robustness of CNN models against the manifold intrusion problem that may occur in other state-of-the-art mixing approaches like Mixup and CutMix.
We also show that PatchUp can provide better generalization to affine transformations of samples and is more robust against adversarial attacks.
- Score: 19.59198017238128
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large capacity deep learning models are often prone to a high generalization
gap when trained with a limited amount of labeled training data. A recent class
of methods to address this problem uses various ways to construct a new
training sample by mixing a pair (or more) of training samples. We propose
PatchUp, a hidden state block-level regularization technique for Convolutional
Neural Networks (CNNs), that is applied on selected contiguous blocks of
feature maps from a random pair of samples. Our approach improves the
robustness of CNN models against the manifold intrusion problem that may occur
in other state-of-the-art mixing approaches like Mixup and CutMix. Moreover,
since we mix contiguous blocks of features in the hidden space, which has more
dimensions than the input space, we obtain more diverse samples for training
along many different dimensions. Our experiments on CIFAR-10, CIFAR-100, and
SVHN datasets with PreActResNet18, PreActResNet34, and WideResNet-28-10
models show that PatchUp improves upon, or equals, the performance of current
state-of-the-art regularizers for CNNs. We also show that PatchUp can provide
better generalization to affine transformations of samples and is more robust
against adversarial attacks.
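To make the mixing operation concrete, below is a minimal PyTorch-style sketch of a hard, block-level hidden-state swap in the spirit of PatchUp; the mask generation (block_size, gamma), the layer at which it is applied, and the label bookkeeping are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch of hidden-state block-level mixing in the spirit of
# PatchUp (hard variant). Hyperparameters (block_size, gamma) and the exact
# mask/label bookkeeping are simplified assumptions, not the authors' code.
import torch
import torch.nn.functional as F

def block_mask(feat, block_size=7, gamma=0.25):
    """DropBlock-style binary mask: 1 = keep own features, 0 = take partner's."""
    n, _, h, w = feat.shape
    # Sample seed points, then grow each seed into a contiguous block.
    seeds = (torch.rand(n, 1, h, w, device=feat.device) < gamma).float()
    blocks = F.max_pool2d(seeds, kernel_size=block_size, stride=1,
                          padding=block_size // 2)
    return 1.0 - blocks  # one mask per sample, shared across channels

def hard_patchup(feat, targets, block_size=7, gamma=0.25):
    """Swap contiguous feature-map blocks between random pairs of samples."""
    perm = torch.randperm(feat.size(0), device=feat.device)
    mask = block_mask(feat, block_size, gamma)
    mixed = mask * feat + (1.0 - mask) * feat[perm]
    # Fraction of unchanged features weights the interpolation of the targets.
    lam = mask.mean(dim=(1, 2, 3))
    return mixed, targets, targets[perm], lam
```

A training loop would apply this at a randomly selected residual block and weight the two cross-entropy terms by lam and 1 - lam, analogous to Mixup/CutMix label interpolation; the paper's soft variant mixes rather than swaps the selected blocks.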
Related papers
- MOREL: Enhancing Adversarial Robustness through Multi-Objective Representation Learning [1.534667887016089]
Deep neural networks (DNNs) are vulnerable to slight adversarial perturbations.
We show that strong feature representation learning during training can significantly enhance the original model's robustness.
We propose MOREL, a multi-objective feature representation learning approach that encourages classification models to produce similar features for inputs within the same class, despite perturbations.
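As a rough illustration of encouraging similar features within a class, the following is a generic auxiliary-loss sketch; the loss form and how it would be weighted against the classification objective are assumptions, not the MOREL multi-objective formulation.

```python
# Hedged sketch of an auxiliary intra-class similarity loss; a generic
# stand-in for "similar features within the same class", not MOREL itself.
import torch
import torch.nn.functional as F

def intra_class_similarity_loss(features, labels):
    """Penalize dissimilarity between each feature vector and its class centroid."""
    features = F.normalize(features, dim=1)
    loss = features.new_zeros(())
    for c in labels.unique():
        class_feats = features[labels == c]
        centroid = class_feats.mean(dim=0, keepdim=True)
        # 1 - cosine similarity to the class centroid, averaged over the class.
        loss = loss + (1.0 - F.cosine_similarity(class_feats, centroid)).mean()
    return loss / labels.unique().numel()
```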
arXiv Detail & Related papers (2024-10-02T16:05:03Z)
- Granular-ball Representation Learning for Deep CNN on Learning with Label Noise [14.082510085545582]
We propose a general granular-ball computing (GBC) module that can be embedded into a CNN model.
In this study, we split the input samples into granular-ball ($gb$) samples at the feature level, each of which can correspond to a varying number of samples and shares a single label.
Experiments demonstrate that the proposed method can improve the robustness of CNN models with no additional data or optimization.
arXiv Detail & Related papers (2024-09-05T05:18:31Z)
- Phantom Embeddings: Using Embedding Space for Model Regularization in Deep Neural Networks [12.293294756969477]
The strength of machine learning models stems from their ability to learn complex function approximations from data.
Complex models tend to memorize the training data, which results in poor performance on test data.
We present a novel approach to regularize the models by leveraging the information-rich latent embeddings and their high intra-class correlation.
arXiv Detail & Related papers (2023-04-14T17:15:54Z)
- Decoupled Mixup for Generalized Visual Recognition [71.13734761715472]
We propose a novel "Decoupled-Mixup" method to train CNN models for visual recognition.
Our method decouples each image into discriminative and noise-prone regions, and then heterogeneously combines these regions to train CNN models.
Experimental results show that our method generalizes well to test data composed of unseen contexts.
arXiv Detail & Related papers (2022-10-26T15:21:39Z)
- Learning Robust Kernel Ensembles with Kernel Average Pooling [3.6540368812166872]
We introduce Kernel Average Pooling (KAP), a neural network building block that applies the mean filter along the kernel dimension of the layer activation tensor.
We show that ensembles of kernels with similar functionality naturally emerge in convolutional neural networks equipped with KAP and trained with backpropagation.
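A minimal sketch of a mean filter along the kernel (channel) dimension of an activation tensor, as described above; the window size, padding, and tensor layout are illustrative assumptions rather than the paper's exact settings.

```python
# Sketch of kernel-dimension average pooling over a (N, C, H, W) activation.
# Window size and padding are illustrative choices, not the paper's settings.
import torch
import torch.nn.functional as F

def kernel_average_pooling(x, window=5):
    """Average each channel with its neighbouring channels; output keeps the input shape."""
    n, c, h, w = x.shape
    # Fold spatial positions into the batch so avg_pool1d runs over channels.
    flat = x.permute(0, 2, 3, 1).reshape(n * h * w, 1, c)
    pooled = F.avg_pool1d(flat, kernel_size=window, stride=1,
                          padding=window // 2, count_include_pad=False)
    return pooled.reshape(n, h, w, c).permute(0, 3, 1, 2)
```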
arXiv Detail & Related papers (2022-09-30T19:49:14Z)
- Two Heads are Better than One: Robust Learning Meets Multi-branch Models [14.72099568017039]
We propose Branch Orthogonality adveRsarial Training (BORT) to obtain state-of-the-art performance with solely the original dataset for adversarial training.
We evaluate our approach on CIFAR-10, CIFAR-100, and SVHN against $\ell_\infty$ norm-bounded perturbations of size $\epsilon = 8/255$.
arXiv Detail & Related papers (2022-08-17T05:42:59Z)
- Training Your Sparse Neural Network Better with Any Mask [106.134361318518]
Pruning large neural networks to create high-quality, independently trainable sparse masks is desirable.
In this paper we demonstrate an alternative opportunity: one can customize the sparse training techniques to deviate from the default dense network training protocols.
Our new sparse training recipe is generally applicable to improving training from scratch with various sparse masks.
arXiv Detail & Related papers (2022-06-26T00:37:33Z)
- Distributed Adversarial Training to Robustify Deep Neural Networks at Scale [100.19539096465101]
Current deep neural networks (DNNs) are vulnerable to adversarial attacks, where adversarial perturbations to the inputs can change or manipulate classification.
To defend against such attacks, an effective approach, known as adversarial training (AT), has been shown to improve model robustness.
We propose a large-batch adversarial training framework implemented over multiple machines.
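For context, here is a sketch of the standard PGD-based adversarial training step that such frameworks build on; it does not show the paper's distributed, large-batch machinery, and epsilon, alpha, and the step count are only the usual illustrative values.

```python
# Standard PGD adversarial training step (single machine, single batch);
# not the paper's distributed large-batch framework.
import torch
import torch.nn.functional as F

def pgd_adversarial_step(model, x, y, epsilon=8/255, alpha=2/255, steps=10):
    """Inner maximization: find a worst-case perturbation inside an L_inf ball."""
    delta = torch.empty_like(x).uniform_(-epsilon, epsilon).requires_grad_(True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        delta = (delta + alpha * grad.sign()).clamp(-epsilon, epsilon)
        delta = delta.detach().requires_grad_(True)
    # Outer minimization: return the loss on the adversarial examples.
    return F.cross_entropy(model((x + delta).detach().clamp(0, 1)), y)
```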
arXiv Detail & Related papers (2022-06-13T15:39:43Z)
- Bootstrap Your Object Detector via Mixed Training [82.98619147880397]
MixTraining is a new training paradigm for object detection that can improve the performance of existing detectors for free.
It enhances data augmentation by utilizing augmentations of different strengths while excluding the strong augmentations of certain training samples that may be detrimental to training.
MixTraining is found to bring consistent improvements across various detectors on the COCO dataset.
arXiv Detail & Related papers (2021-11-04T17:58:26Z)
- KNN-BERT: Fine-Tuning Pre-Trained Models with KNN Classifier [61.063988689601416]
Pre-trained models are widely fine-tuned on downstream tasks with linear classifiers optimized by the cross-entropy loss.
Such training can be improved by learning representations that emphasize similarities within the same class and contrasts between classes when making predictions.
In this paper, we introduce a K-Nearest Neighbors classifier into pre-trained model fine-tuning.
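As a generic illustration of using nearest neighbors with pre-trained representations, the sketch below performs majority-vote KNN prediction over cached embeddings; the memory bank, the cosine similarity, and how the paper combines this with the linear classifier during fine-tuning are assumptions.

```python
# Hedged sketch of KNN prediction over pre-trained encoder embeddings;
# the paper's training objective and classifier combination are not shown.
import torch
import torch.nn.functional as F

def knn_predict(query_emb, bank_emb, bank_labels, num_classes, k=16):
    """Classify queries by majority vote among their k nearest cached embeddings."""
    query_emb = F.normalize(query_emb, dim=1)
    bank_emb = F.normalize(bank_emb, dim=1)
    sims = query_emb @ bank_emb.t()      # cosine similarity to the memory bank
    _, idx = sims.topk(k, dim=1)         # indices of the k nearest neighbours
    votes = bank_labels[idx]             # (num_queries, k) neighbour labels
    counts = F.one_hot(votes, num_classes).sum(dim=1)
    return counts.argmax(dim=1)
```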
arXiv Detail & Related papers (2021-10-06T06:17:05Z)
- Embedding Propagation: Smoother Manifold for Few-Shot Classification [131.81692677836202]
We propose to use embedding propagation as an unsupervised non-parametric regularizer for manifold smoothing in few-shot classification.
We empirically show that embedding propagation yields a smoother embedding manifold.
We show that embedding propagation consistently improves the accuracy of the models in multiple semi-supervised learning scenarios by up to 16 percentage points.
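A hedged sketch of graph-based embedding smoothing in this spirit: the RBF similarity, the symmetric normalization, and the propagator (I - alpha*A)^{-1} are illustrative choices and may differ from the paper's exact formulation.

```python
# Generic graph-based embedding smoothing; an illustration of the idea,
# not the paper's exact normalization or hyperparameters.
import torch

def propagate_embeddings(z, alpha=0.5, sigma=1.0):
    """Smooth embeddings (n, d) over a similarity graph built from pairwise distances."""
    dists = torch.cdist(z, z) ** 2
    adj = torch.exp(-dists / (2 * sigma ** 2))
    adj.fill_diagonal_(0)                                      # no self-loops
    d_inv_sqrt = adj.sum(dim=1).clamp_min(1e-12).rsqrt().diag()
    norm_adj = d_inv_sqrt @ adj @ d_inv_sqrt                   # D^-1/2 A D^-1/2
    eye = torch.eye(z.size(0), device=z.device, dtype=z.dtype)
    propagator = torch.linalg.inv(eye - alpha * norm_adj)
    return propagator @ z
```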
arXiv Detail & Related papers (2020-03-09T13:51:09Z)