Stochastic Batch Augmentation with An Effective Distilled Dynamic Soft
Label Regularizer
- URL: http://arxiv.org/abs/2006.15284v1
- Date: Sat, 27 Jun 2020 04:46:39 GMT
- Title: Stochastic Batch Augmentation with An Effective Distilled Dynamic Soft
Label Regularizer
- Authors: Qian Li, Qingyuan Hu, Yong Qi, Saiyu Qi, Jie Ma, and Jian Zhang
- Abstract summary: We propose a framework called Stochastic Batch Augmentation (SBA) to address these problems.
SBA stochastically decides whether to augment at each iteration, controlled by a batch scheduler, and introduces a "distilled" dynamic soft label regularization.
Our experiments on CIFAR-10, CIFAR-100, and ImageNet show that SBA can improve the generalization of the neural networks and speed up the convergence of network training.
- Score: 11.153892464618545
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data augmentation has been intensively used in training deep neural
networks to improve generalization, whether in the original space (e.g., image
space) or in representation space. Although successful, these approaches largely
ignore the connection between the synthesized data and the original data during
training: the synthesized samples are distributed around the original sample,
but this distributional information is not exploited, so the behavior of the
network is not optimized for it. However, that behavior is crucially important
for generalization, even in the adversarial setting, for the safety of the deep
learning system. In this work, we propose a framework called Stochastic Batch
Augmentation (SBA) to address these problems. SBA stochastically decides
whether to augment at each iteration, controlled by a batch scheduler, and
introduces a "distilled" dynamic soft label regularization that incorporates the
similarity of the vicinity distribution with respect to the raw samples. The
proposed regularization provides direct supervision via the KL-divergence
between the output softmax distributions of the original and virtual data. Our
experiments on CIFAR-10, CIFAR-100, and ImageNet show that SBA can improve the
generalization of neural networks and speed up the convergence of network
training.
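For a concrete picture of the training loop described above, the following is a minimal PyTorch-style sketch of the idea, not the authors' implementation: the `augment` function and the hyperparameters `augment_prob` and `kl_weight` are illustrative assumptions standing in for the paper's batch scheduler and vicinity-based similarity weighting.

```python
import torch
import torch.nn.functional as F

def sba_step(model, optimizer, x, y, augment, augment_prob=0.5, kl_weight=1.0):
    """One training step in the spirit of SBA (illustrative sketch only).

    With probability `augment_prob` (standing in for the batch scheduler),
    virtual samples are generated around the raw batch and a KL term between
    the softmax outputs of the raw and virtual samples is added to the loss.
    """
    optimizer.zero_grad()
    logits = model(x)
    loss = F.cross_entropy(logits, y)          # standard supervised loss on raw data

    if torch.rand(1).item() < augment_prob:    # stochastic decision to augment
        x_virtual = augment(x)                 # e.g. perturbation of the raw samples
        logits_virtual = model(x_virtual)
        # "Distilled" dynamic soft labels: the raw-sample softmax supervises the
        # virtual samples via a KL divergence between the two output distributions.
        soft_targets = F.softmax(logits.detach(), dim=1)
        kl = F.kl_div(F.log_softmax(logits_virtual, dim=1),
                      soft_targets, reduction="batchmean")
        loss = loss + kl_weight * kl

    loss.backward()
    optimizer.step()
    return loss.item()
```

Detaching the raw-sample logits makes them act as dynamic soft targets for the virtual samples, which is the distillation flavor of the regularizer; the actual SBA scheduler and the similarity weighting over the vicinity distribution are more elaborate than this sketch.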
Related papers
- Towards Continual Learning Desiderata via HSIC-Bottleneck
Orthogonalization and Equiangular Embedding [55.107555305760954]
We propose a conceptually simple yet effective method that attributes forgetting to layer-wise parameter overwriting and the resulting decision boundary distortion.
Our method achieves competitive accuracy while requiring no exemplar buffer and only 1.02x the base model.
arXiv Detail & Related papers (2024-01-17T09:01:29Z) - PDE+: Enhancing Generalization via PDE with Adaptive Distributional
Diffusion [66.95761172711073]
The generalization of neural networks is a central challenge in machine learning.
We propose to enhance it directly through the underlying function of neural networks, rather than focusing on adjusting input data.
We put this theoretical framework into practice as $\textbf{PDE}+$ ($\textbf{PDE}$ with $\textbf{A}$daptive $\textbf{D}$istributional $\textbf{D}$iffusion).
arXiv Detail & Related papers (2023-05-25T08:23:26Z) - Efficient Augmentation for Imbalanced Deep Learning [8.38844520504124]
We study a convolutional neural network's internal representation of imbalanced image data.
We measure the generalization gap between a model's feature embeddings in the training and test sets, showing that the gap is wider for minority classes.
This insight enables us to design an efficient three-phase CNN training framework for imbalanced data.
arXiv Detail & Related papers (2022-07-13T09:43:17Z) - Distributed Adversarial Training to Robustify Deep Neural Networks at
Scale [100.19539096465101]
Current deep neural networks (DNNs) are vulnerable to adversarial attacks, where adversarial perturbations to the inputs can change or manipulate classification.
To defend against such attacks, an effective approach known as adversarial training (AT) mitigates their impact through min-max robust training.
We propose a large-batch adversarial training framework implemented over multiple machines.
arXiv Detail & Related papers (2022-06-13T15:39:43Z) - Federated Dynamic Sparse Training: Computing Less, Communicating Less,
Yet Learning Better [88.28293442298015]
Federated learning (FL) enables distribution of machine learning workloads from the cloud to resource-limited edge devices.
We develop, implement, and experimentally validate a novel FL framework termed Federated Dynamic Sparse Training (FedDST)
FedDST is a dynamic process that extracts and trains sparse sub-networks from the target full network.
arXiv Detail & Related papers (2021-12-18T02:26:38Z) - Fine-grained Data Distribution Alignment for Post-Training Quantization [100.82928284439271]
We propose a fine-grained data distribution alignment (FDDA) method to boost the performance of post-training quantization.
Our method shows the state-of-the-art performance on ImageNet, especially when the first and last layers are quantized to low-bit.
arXiv Detail & Related papers (2021-09-09T11:45:52Z) - Provable Generalization of SGD-trained Neural Networks of Any Width in
the Presence of Adversarial Label Noise [85.59576523297568]
We consider a one-hidden-layer leaky ReLU network of arbitrary width trained by gradient descent.
We prove that SGD produces neural networks that have classification accuracy competitive with that of the best halfspace over the distribution.
arXiv Detail & Related papers (2021-01-04T18:32:49Z) - Direct Evolutionary Optimization of Variational Autoencoders With Binary
Latents [0.0]
We show that it is possible to train Variational Autoencoders (VAEs) with discrete latents without sampling-based approximation and reparameterization.
In contrast to large supervised networks, the VAEs investigated here can, e.g., denoise a single image without prior training on clean data and/or training on large image datasets.
arXiv Detail & Related papers (2020-11-27T12:42:12Z) - Training Sparse Neural Networks using Compressed Sensing [13.84396596420605]
We develop and test a novel method based on compressed sensing which combines the pruning and training into a single step.
Specifically, we utilize an adaptively weighted $\ell_1$ penalty on the weights during training, which we combine with a generalization of the regularized dual averaging (RDA) algorithm in order to train sparse neural networks (a simplified soft-thresholding sketch appears after this list).
arXiv Detail & Related papers (2020-08-21T19:35:54Z) - Regularizing Deep Networks with Semantic Data Augmentation [44.53483945155832]
We propose a novel semantic data augmentation algorithm to complement traditional approaches.
The proposed method is inspired by the intriguing property that deep networks are effective in learning linearized features.
We show that the proposed implicit semantic data augmentation (ISDA) algorithm amounts to minimizing a novel robust CE loss.
arXiv Detail & Related papers (2020-07-21T00:32:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.