Regularizing activations in neural networks via distribution matching with the Wasserstein metric
- URL: http://arxiv.org/abs/2002.05366v2
- Date: Mon, 27 Apr 2020 02:31:16 GMT
- Title: Regularizing activations in neural networks via distribution matching with the Wasserstein metric
- Authors: Taejong Joo, Donggu Kang, Byunghoon Kim
- Abstract summary: We propose the projected error function regularization loss (PER) that encourages activations to follow the standard normal distribution.
PER randomly projects activations onto a one-dimensional space and computes the regularization loss in the projected space.
We evaluate the proposed method on the image classification task and the word-level language modeling task.
- Score: 9.442063850095808
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Regularization and normalization have become indispensable components in
training deep neural networks, resulting in faster training and improved
generalization performance. We propose the projected error function
regularization loss (PER) that encourages activations to follow the standard
normal distribution. PER randomly projects activations onto one-dimensional
space and computes the regularization loss in the projected space. PER is
similar to the Pseudo-Huber loss in the projected space, thus taking advantage
of both $L^1$ and $L^2$ regularization losses. In addition, PER can capture
interactions between hidden units via projection vectors drawn from the unit sphere.
By doing so, PER minimizes the upper bound of the Wasserstein distance of order
one between an empirical distribution of activations and the standard normal
distribution. To the best of the authors' knowledge, this is the first work to
regularize activations via distribution matching in the probability
distribution space. We evaluate the proposed method on the image classification
task and the word-level language modeling task.
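As a concrete reading of the abstract, the snippet below sketches such a regularizer in PyTorch: activations are projected onto random unit vectors, and each projected value $t$ is penalized with the closed-form expectation $\mathbb{E}_{Z\sim\mathcal{N}(0,1)}|t - Z| = t\,\mathrm{erf}(t/\sqrt{2}) + \sqrt{2/\pi}\,e^{-t^2/2}$, which behaves like a Pseudo-Huber loss and, under the independent coupling, upper-bounds the one-dimensional Wasserstein-1 distance to the standard normal. The function name, the exact penalty, and the weighting are illustrative assumptions based on the abstract, not the authors' released implementation.

```python
import math
import torch

def per_loss(h: torch.Tensor, n_projections: int = 1) -> torch.Tensor:
    """Sketch of a projected, erf-based activation regularizer.

    h: (batch, features) activations of one layer.  Each projected value t is
    penalized with E_{Z~N(0,1)} |t - Z| = t*erf(t/sqrt(2)) + sqrt(2/pi)*exp(-t^2/2),
    an upper bound (via the independent coupling) on the 1-D Wasserstein-1
    distance to the standard normal.
    """
    # Random directions on the unit sphere (normalized Gaussian samples).
    v = torch.randn(h.shape[1], n_projections, device=h.device, dtype=h.dtype)
    v = v / v.norm(dim=0, keepdim=True)
    t = h @ v  # projected activations, shape (batch, n_projections)
    penalty = t * torch.erf(t / math.sqrt(2.0)) \
        + math.sqrt(2.0 / math.pi) * torch.exp(-0.5 * t ** 2)
    return penalty.mean()

# Usage: add e.g. `0.01 * per_loss(layer_activations)` to the training loss.
```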
Related papers
- Generative Conditional Distributions by Neural (Entropic) Optimal Transport [12.152228552335798]
We introduce a novel neural entropic optimal transport method designed to learn generative models of conditional distributions.
Our method relies on the minimax training of two neural networks.
Our experiments on real-world datasets show the effectiveness of our algorithm compared to state-of-the-art conditional distribution learning techniques.
arXiv Detail & Related papers (2024-06-04T13:45:35Z)
- Quantile Activation: Correcting a Failure Mode of ML Models [4.035209200949511]
We propose a simple activation function, quantile activation (QAct), that addresses this failure mode without significantly increasing computational costs.
The proposed quantile activation (QAct) outputs the relative quantile position of neuron activations within their context distribution.
We find that this approach unexpectedly outperforms DINOv2 (small), despite DINOv2 being trained with a much larger network and dataset.
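As a rough illustration of the mechanism described above, the sketch below returns, for every unit, the empirical quantile of its activation within the current mini-batch; treating the mini-batch as the "context distribution" is a simplifying assumption made here and may not match the paper's exact definition.

```python
import torch

def quantile_activation(h: torch.Tensor) -> torch.Tensor:
    """Illustrative quantile-style activation.

    h: (batch, features).  For each unit, returns the fraction of samples in
    the mini-batch whose activation is smaller, i.e. an empirical quantile
    position in [0, 1).
    """
    # less[i, j, f] is True when sample j is below sample i for unit f.
    less = (h.unsqueeze(0) < h.unsqueeze(1)).float()
    return less.mean(dim=1)

x = torch.randn(8, 4)
print(quantile_activation(x))  # one quantile position per activation
```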
arXiv Detail & Related papers (2024-05-19T14:42:19Z)
- Training normalizing flows with computationally intensive target probability distributions [0.018416014644193065]
We propose an estimator for normalizing flows based on the REINFORCE algorithm.
It is up to ten times faster in wall-clock time and requires up to $30\%$ less memory.
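For readers unfamiliar with the REINFORCE algorithm mentioned above, the snippet below shows the generic score-function gradient estimator it refers to, applied to a toy Gaussian sampler; this is the textbook construction, not the paper's specific estimator for flow training.

```python
import torch

# Score-function (REINFORCE) gradient for L(theta) = E_{x ~ q_theta}[f(x)]:
#   grad L = E_{x ~ q_theta}[ f(x) * grad log q_theta(x) ]
mu = torch.zeros(2, requires_grad=True)        # learnable sampler parameter
q = torch.distributions.Normal(mu, torch.ones(2))

x = q.sample((256,))                           # samples carry no gradient
f = (x ** 2).sum(dim=1)                        # stand-in for an expensive target term
surrogate = (f * q.log_prob(x).sum(dim=1)).mean()
surrogate.backward()                           # mu.grad now holds the estimate
print(mu.grad)
```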
arXiv Detail & Related papers (2023-08-25T10:40:46Z)
- Normalizing flow sampling with Langevin dynamics in the latent space [12.91637880428221]
Normalizing flows (NFs) use a continuous generator to map a simple latent (e.g., Gaussian) distribution towards an empirical target distribution associated with a training data set.
Since standard NFs implement differentiable maps, they may suffer from pathological behaviors when targeting complex distributions.
This paper proposes a new Markov chain Monte Carlo algorithm to sample from the target distribution in the latent domain before transporting it back to the target domain.
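A minimal sketch of the general recipe summarized above (running Langevin dynamics on a latent-space density and then pushing the samples through the flow) is given below; the latent log-density is left as an abstract callable, and the interface is a hypothetical stand-in rather than the paper's algorithm.

```python
import torch

def langevin_sample(log_prob_latent, z0: torch.Tensor,
                    step: float = 1e-2, n_steps: int = 200) -> torch.Tensor:
    """Unadjusted Langevin dynamics in the latent space.

    log_prob_latent: callable returning the (unnormalized) log-density of the
        latent-space target for a batch of latent points.  In the setting
        summarized above this would be the data-space target pulled back
        through the flow; here it is just an abstract callable.
    """
    z = z0.clone()
    for _ in range(n_steps):
        z.requires_grad_(True)
        grad = torch.autograd.grad(log_prob_latent(z).sum(), z)[0]
        z = (z + step * grad
             + (2.0 * step) ** 0.5 * torch.randn_like(z)).detach()
    return z

# Toy run with a standard-normal latent target; a (hypothetical) flow would
# then map the samples back to the data domain, e.g. x = flow(z).
z = langevin_sample(lambda z: -0.5 * (z ** 2).sum(dim=1), torch.zeros(64, 2))
```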
arXiv Detail & Related papers (2023-05-20T09:31:35Z)
- Compound Batch Normalization for Long-tailed Image Classification [77.42829178064807]
We propose a compound batch normalization method based on a Gaussian mixture.
It can model the feature space more comprehensively and reduce the dominance of head classes.
The proposed method outperforms existing methods on long-tailed image classification.
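One plausible reading of "normalization based on a Gaussian mixture" is sketched below: each value is standardized against every mixture component and the results are blended by posterior responsibilities, instead of using a single mean and variance as in batch normalization. This is an assumption for illustration; the actual compound batch normalization may be formulated differently.

```python
import torch

def mixture_normalize(x: torch.Tensor, pi: torch.Tensor,
                      mu: torch.Tensor, sigma: torch.Tensor) -> torch.Tensor:
    """Normalize 1-D feature values against a K-component Gaussian mixture.

    x: (batch,) values of one channel; pi, mu, sigma: (K,) mixture parameters,
    e.g. estimated from the feature statistics.  Each value is standardized
    against every component and the results are blended by the posterior
    responsibilities, rather than using a single mean/variance.
    """
    comp = torch.distributions.Normal(mu, sigma)              # K components
    log_r = comp.log_prob(x[:, None]) + pi.log()              # (batch, K)
    r = torch.softmax(log_r, dim=1)                           # responsibilities
    return (r * (x[:, None] - mu) / sigma).sum(dim=1)

x = torch.cat([torch.randn(900), torch.randn(100) + 6.0])     # long-tailed channel
out = mixture_normalize(x, pi=torch.tensor([0.9, 0.1]),
                        mu=torch.tensor([0.0, 6.0]), sigma=torch.ones(2))
```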
arXiv Detail & Related papers (2022-12-02T07:31:39Z)
- Robust Estimation for Nonparametric Families via Generative Adversarial Networks [92.64483100338724]
We provide a framework for designing Generative Adversarial Networks (GANs) to solve high dimensional robust statistics problems.
Our work extends these results to robust mean estimation, second moment estimation, and robust linear regression.
In terms of techniques, our proposed GAN losses can be viewed as a smoothed and generalized Kolmogorov-Smirnov distance.
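To illustrate what a "smoothed Kolmogorov-Smirnov distance" can look like, the snippet below replaces the indicator functions in the empirical CDFs with sigmoids, giving a differentiable surrogate of the KS statistic between two samples; this is the generic construction only, not the paper's GAN loss.

```python
import torch

def smoothed_ks(x: torch.Tensor, y: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """Differentiable surrogate of the Kolmogorov-Smirnov statistic.

    Replaces the indicator 1[sample <= t] in the empirical CDFs with a sigmoid
    of temperature tau, and takes the maximum CDF gap over the pooled sample
    points as evaluation thresholds.
    """
    t = torch.cat([x, y])                                  # evaluation points
    cdf_x = torch.sigmoid((t[:, None] - x[None, :]) / tau).mean(dim=1)
    cdf_y = torch.sigmoid((t[:, None] - y[None, :]) / tau).mean(dim=1)
    return (cdf_x - cdf_y).abs().max()

print(smoothed_ks(torch.randn(500), torch.randn(500) + 1.0))  # clearly separated
print(smoothed_ks(torch.randn(500), torch.randn(500)))        # much smaller
```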
arXiv Detail & Related papers (2022-02-02T20:11:33Z)
- Distribution Mismatch Correction for Improved Robustness in Deep Neural Networks [86.42889611784855]
Normalization methods can increase vulnerability to noise and input corruptions.
We propose an unsupervised non-parametric distribution correction method that adapts the activation distribution of each layer.
In our experiments, we empirically show that the proposed method effectively reduces the impact of intense image corruptions.
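A generic non-parametric way to adapt an activation distribution, in the spirit of the summary above, is per-unit quantile matching against a reference sample; the sketch below shows that idea only and is not the paper's specific correction procedure.

```python
import torch

def match_distribution(a: torch.Tensor, reference: torch.Tensor) -> torch.Tensor:
    """Generic non-parametric correction of a 1-D activation distribution.

    Maps each value in `a` through its empirical quantile and then through the
    inverse empirical CDF of `reference`, so the corrected values follow the
    reference distribution (per-unit quantile matching).
    """
    ranks = a.argsort().argsort().float() / (a.numel() - 1)   # quantiles in [0, 1]
    ref_sorted = reference.sort().values
    idx = (ranks * (reference.numel() - 1)).round().long()
    return ref_sorted[idx]

corrupted = torch.randn(1000) * 3.0 + 2.0      # shifted/scaled activations
clean_ref = torch.randn(1000)                  # reference activation sample
corrected = match_distribution(corrupted, clean_ref)
print(corrected.mean(), corrected.std())       # close to the reference statistics
```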
arXiv Detail & Related papers (2021-10-05T11:36:25Z)
- KL Guided Domain Adaptation [88.19298405363452]
Domain adaptation is an important problem and often needed for real-world applications.
A common approach in the domain adaptation literature is to learn a representation of the input that has the same distributions over the source and the target domain.
We show that with a probabilistic representation network, the KL term can be estimated efficiently via minibatch samples.
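A hypothetical sketch of such a minibatch KL estimate is shown below: each domain's representation distribution is approximated by the mixture of per-example Gaussians produced by a probabilistic encoder for the current minibatch, and the KL term is estimated by Monte Carlo. The interface and the mixture approximation are assumptions made for illustration, not necessarily the paper's estimator.

```python
import math
import torch
from torch.distributions import Normal

def minibatch_kl(mu_s, std_s, mu_t, std_t, n_samples: int = 8) -> torch.Tensor:
    """Monte Carlo minibatch estimate of KL(p_source(z) || p_target(z)).

    Each domain's representation distribution is approximated by the mixture of
    per-example Gaussians q(z|x) produced by a probabilistic encoder for the
    current minibatch; mu_*, std_* have shape (batch, dim).
    """
    def mixture_log_prob(z, mu, std):
        # z: (n, dim) -> log of the average component density over the batch.
        logp = Normal(mu[None], std[None]).log_prob(z[:, None]).sum(dim=-1)  # (n, batch)
        return torch.logsumexp(logp, dim=1) - math.log(mu.shape[0])

    # Samples from the source mixture (n_samples draws per mixture component).
    z = Normal(mu_s, std_s).rsample((n_samples,)).reshape(-1, mu_s.shape[1])
    return (mixture_log_prob(z, mu_s, std_s) - mixture_log_prob(z, mu_t, std_t)).mean()

kl = minibatch_kl(torch.zeros(32, 8), torch.ones(32, 8),
                  torch.ones(32, 8), torch.ones(32, 8))
```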
arXiv Detail & Related papers (2021-06-14T22:24:23Z)
- DAAIN: Detection of Anomalous and Adversarial Input using Normalizing Flows [52.31831255787147]
We introduce a novel technique, DAAIN, to detect out-of-distribution (OOD) inputs and adversarial attacks (AA).
Our approach monitors the inner workings of a neural network and learns a density estimator of the activation distribution.
Our model can be trained on a single GPU, making it compute-efficient and deployable without requiring specialized accelerators.
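The monitoring idea can be illustrated with a toy stand-in: fit a density model to in-distribution activations and flag inputs whose activation log-density falls below a threshold. DAAIN learns the density with normalizing flows; the sketch below substitutes a diagonal Gaussian so the idea fits in a few lines, and all names are hypothetical.

```python
import torch

class ActivationDensityMonitor:
    """Toy activation-density OOD score (diagonal Gaussian instead of a flow)."""

    def fit(self, acts: torch.Tensor, quantile: float = 0.01):
        # acts: (n, features) activations collected on in-distribution data.
        self.mu = acts.mean(dim=0)
        self.std = acts.std(dim=0) + 1e-6
        self.threshold = torch.quantile(self.log_prob(acts), quantile)
        return self

    def log_prob(self, acts: torch.Tensor) -> torch.Tensor:
        d = torch.distributions.Normal(self.mu, self.std)
        return d.log_prob(acts).sum(dim=1)

    def is_ood(self, acts: torch.Tensor) -> torch.Tensor:
        return self.log_prob(acts) < self.threshold

monitor = ActivationDensityMonitor().fit(torch.randn(2000, 16))
print(monitor.is_ood(torch.randn(4, 16) + 5.0))   # far from the fitted density
```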
arXiv Detail & Related papers (2021-05-30T22:07:13Z)
- Unifying supervised learning and VAEs -- coverage, systematics and goodness-of-fit in normalizing-flow based neural network models for astro-particle reconstructions [0.0]
Statistical uncertainties, coverage, systematic uncertainties, or a goodness-of-fit measure are often not calculated.
We show that a KL-divergence objective on the joint distribution of data and labels allows supervised learning and variational autoencoders to be unified.
We discuss how to calculate coverage probabilities without numerical integration for specific "base-ordered" contours.
arXiv Detail & Related papers (2020-08-13T11:28:57Z)
- Log-Likelihood Ratio Minimizing Flows: Towards Robust and Quantifiable Neural Distribution Alignment [52.02794488304448]
We propose a new distribution alignment method based on a log-likelihood ratio statistic and normalizing flows.
We experimentally verify that minimizing this objective yields domain alignment that preserves the local structure of the input domains.
arXiv Detail & Related papers (2020-03-26T22:10:04Z)