Regularizing Deep Networks with Semantic Data Augmentation
- URL: http://arxiv.org/abs/2007.10538v5
- Date: Fri, 4 Jun 2021 09:52:11 GMT
- Title: Regularizing Deep Networks with Semantic Data Augmentation
- Authors: Yulin Wang, Gao Huang, Shiji Song, Xuran Pan, Yitong Xia, Cheng Wu
- Abstract summary: We propose a novel semantic data augmentation algorithm to complement traditional approaches.
The proposed method is inspired by the intriguing property that deep networks are effective in learning linearized features.
We show that the proposed implicit semantic data augmentation (ISDA) algorithm amounts to minimizing a novel robust CE loss.
- Score: 44.53483945155832
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data augmentation is widely known as a simple yet surprisingly effective
technique for regularizing deep networks. Conventional data augmentation
schemes, e.g., flipping, translation or rotation, are low-level,
data-independent and class-agnostic operations, leading to limited diversity
for augmented samples. To address this limitation, we propose a novel semantic data
augmentation algorithm to complement traditional approaches. The proposed
method is inspired by the intriguing property that deep networks are effective
in learning linearized features, i.e., certain directions in the deep feature
space correspond to meaningful semantic transformations, e.g., changing the
background or view angle of an object. Based on this observation, translating
training samples along many such directions in the feature space can
effectively augment the dataset for more diversity. To implement this idea, we
first introduce a sampling based method to obtain semantically meaningful
directions efficiently. Then, an upper bound of the expected cross-entropy (CE)
loss on the augmented training set is derived by assuming the number of
augmented samples goes to infinity, yielding a highly efficient algorithm. In
fact, we show that the proposed implicit semantic data augmentation (ISDA)
algorithm amounts to minimizing a novel robust CE loss, which adds minimal
extra computational cost to a normal training procedure. In addition to
supervised learning, ISDA can be applied to semi-supervised learning tasks
under the consistency regularization framework, where ISDA amounts to
minimizing the upper bound of the expected KL-divergence between the augmented
features and the original features. Despite its simplicity, ISDA consistently
improves the generalization performance of popular deep models (e.g., ResNets
and DenseNets) on a variety of datasets, i.e., CIFAR-10, CIFAR-100, SVHN,
ImageNet, and Cityscapes.
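
The core idea of translating deep features along class-conditional semantic directions can be illustrated with a short sketch. The snippet below is a conceptual illustration only, not the authors' implementation: the function name, the full per-class covariance tensor `class_cov`, and the Cholesky-based sampling are assumptions made for clarity.

```python
import torch

def explicit_semantic_augment(features, labels, class_cov, lam):
    """Translate each deep feature along a random semantic direction.

    features:  (N, D) deep features produced by the backbone
    labels:    (N,)   integer class labels
    class_cov: (C, D, D) estimated covariance of each class's features
    lam:       augmentation strength (lambda)

    The direction for sample i is drawn from N(0, lam * Sigma_{y_i}), so the
    augmented feature is a sample from N(a_i, lam * Sigma_{y_i}).
    """
    n, d = features.shape
    sigma_y = class_cov[labels]                               # (N, D, D)
    # Small jitter keeps the batched Cholesky factorization well defined.
    eye = torch.eye(d, device=features.device)
    chol = torch.linalg.cholesky(lam * sigma_y + 1e-6 * eye)  # (N, D, D)
    noise = torch.randn(n, d, 1, device=features.device)
    directions = (chol @ noise).squeeze(-1)                   # (N, D)
    return features + directions
```

Running the classifier on many such augmented copies of each feature and averaging the cross-entropy loss would realize the augmentation explicitly, but at a cost that grows with the number of samples drawn.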
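The abstract's key observation is that letting the number of such augmented copies go to infinity yields a closed-form upper bound on the expected CE loss, which behaves like a robust cross-entropy on modified logits. Below is a minimal sketch of that idea, assuming a linear classifier head (`weight`, `bias`) and the same hypothetical per-class covariance estimates as above; it illustrates the upper-bound loss described in the abstract rather than reproducing the official ISDA code.

```python
import torch
import torch.nn.functional as F

def isda_style_loss(features, labels, weight, bias, class_cov, lam):
    """Cross-entropy on logits shifted by a per-class covariance penalty.

    For sample i with label y, the logit of class j becomes
        w_j^T a_i + b_j + (lam / 2) * (w_j - w_y)^T Sigma_y (w_j - w_y),
    which leaves the true-class logit unchanged (the quadratic term vanishes
    for j == y) and penalizes classes lying along likely semantic
    transformation directions of class y.
    """
    logits = features @ weight.t() + bias                # (N, C) standard logits

    w_y = weight[labels]                                 # (N, D)
    delta_w = weight.unsqueeze(0) - w_y.unsqueeze(1)     # (N, C, D)
    sigma_y = class_cov[labels]                          # (N, D, D)

    # Quadratic form (w_j - w_y)^T Sigma_y (w_j - w_y) for every (i, j).
    quad = torch.einsum('ncd,nde,nce->nc', delta_w, sigma_y, delta_w)

    aug_logits = logits + 0.5 * lam * quad
    return F.cross_entropy(aug_logits, labels)
```

In practice one would estimate `class_cov` online from the training features and anneal `lam` from zero toward its final value as training progresses; both choices are assumptions here rather than details stated in the abstract.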
Related papers
- Adaptive Anomaly Detection in Network Flows with Low-Rank Tensor Decompositions and Deep Unrolling [9.20186865054847]
Anomaly detection (AD) is increasingly recognized as a key component for ensuring the resilience of future communication systems.
This work considers AD in network flows using incomplete measurements.
We propose a novel block-successive convex approximation algorithm based on a regularized model-fitting objective.
Inspired by Bayesian approaches, we extend the model architecture to perform online adaptation to per-flow and per-time-step statistics.
arXiv Detail & Related papers (2024-09-17T19:59:57Z) - Optimization Guarantees of Unfolded ISTA and ADMM Networks With Smooth
Soft-Thresholding [57.71603937699949]
We study optimization guarantees, i.e., achieving near-zero training loss as the number of learning epochs increases.
We show that the threshold on the number of training samples increases with the network width.
arXiv Detail & Related papers (2023-09-12T13:03:47Z) - Learning k-Level Structured Sparse Neural Networks Using Group Envelope Regularization [4.0554893636822]
We introduce a novel approach to deploy large-scale Deep Neural Networks on constrained resources.
The method speeds up inference time and aims to reduce memory demand and power consumption.
arXiv Detail & Related papers (2022-12-25T15:40:05Z) - Weakly Supervised Change Detection Using Guided Anisotropic Difusion [97.43170678509478]
We propose original ideas that help us to leverage such datasets in the context of change detection.
First, we propose the guided anisotropic diffusion (GAD) algorithm, which improves semantic segmentation results.
We then show its potential in two weakly-supervised learning strategies tailored for change detection.
arXiv Detail & Related papers (2021-12-31T10:03:47Z) - Exploiting Invariance in Training Deep Neural Networks [4.169130102668252]
Inspired by two basic mechanisms in animal visual systems, we introduce a feature transform technique that imposes invariance properties in the training of deep neural networks.
The resulting algorithm requires less parameter tuning, trains well with an initial learning rate of 1.0, and generalizes easily to different tasks.
Tested on the ImageNet, MS COCO, and Cityscapes datasets, the proposed technique requires fewer training iterations, surpasses all baselines by a large margin, works seamlessly with both small and large batch sizes, and applies to image classification, object detection, and semantic segmentation.
arXiv Detail & Related papers (2021-03-30T19:18:31Z) - Context Decoupling Augmentation for Weakly Supervised Semantic
Segmentation [53.49821324597837]
Weakly supervised semantic segmentation is a challenging problem that has been deeply studied in recent years.
We present a Context Decoupling Augmentation (CDA) method to change the inherent context in which the objects appear.
Extensive experiments on the PASCAL VOC 2012 dataset with several alternative network architectures demonstrate that CDA boosts various popular WSSS methods to a new state of the art by a large margin.
arXiv Detail & Related papers (2021-03-02T15:05:09Z) - Solving Sparse Linear Inverse Problems in Communication Systems: A Deep
Learning Approach With Adaptive Depth [51.40441097625201]
We propose an end-to-end trainable deep learning architecture for sparse signal recovery problems.
The proposed method learns how many layers to execute to emit an output, and the network depth is dynamically adjusted for each task in the inference phase.
arXiv Detail & Related papers (2020-10-29T06:32:53Z) - Large-Scale Gradient-Free Deep Learning with Recursive Local
Representation Alignment [84.57874289554839]
Training deep neural networks on large-scale datasets requires significant hardware resources.
Backpropagation, the workhorse for training these networks, is an inherently sequential process that is difficult to parallelize.
We propose a neuro-biologically-plausible alternative to backprop that can be used to train deep networks.
arXiv Detail & Related papers (2020-02-10T16:20:02Z)