On the Importance of Gaussianizing Representations
- URL: http://arxiv.org/abs/2505.00685v1
- Date: Thu, 01 May 2025 17:47:44 GMT
- Title: On the Importance of Gaussianizing Representations
- Authors: Daniel Eftekhari, Vardan Papyan
- Abstract summary: We present a novel normalization layer which encourages normality in the feature representations of neural networks using the power transform and employs additive Gaussian noise during training. Our experiments demonstrate the effectiveness of normality normalization with regard to its generalization performance on an array of widely used model and dataset combinations.
- Score: 3.6919724596215615
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The normal distribution plays a central role in information theory - it is at the same time the best-case signal and worst-case noise distribution, has the greatest representational capacity of any distribution, and offers an equivalence between uncorrelatedness and independence for joint distributions. Accounting for the mean and variance of activations throughout the layers of deep neural networks has had a significant effect on facilitating their effective training, but seldom has a prescription been offered for precisely what distribution these activations should take, and how this might be achieved. Motivated by the information-theoretic properties of the normal distribution, we address this question and concurrently present normality normalization: a novel normalization layer which encourages normality in the feature representations of neural networks using the power transform and employs additive Gaussian noise during training. Our experiments comprehensively demonstrate the effectiveness of normality normalization with regard to its generalization performance on an array of widely used model and dataset combinations, its strong performance across common factors of variation such as model width, depth, and training minibatch size, its suitability wherever existing normalization layers are conventionally used, and its role as a means of improving model robustness to random perturbations.
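The abstract describes the layer only at a high level, so the following is a minimal, hypothetical PyTorch sketch of a normality-normalization-style layer, not the paper's implementation. Assumptions: per-feature standardization over the batch as in batch normalization, a Yeo-Johnson power transform with a fixed exponent `lmbda` (the paper presumably estimates the transform parameter rather than fixing it), additive Gaussian noise of scale `noise_std` applied only during training, and a learnable gain and bias. The names `NormalityNorm1d`, `lmbda`, and `noise_std` are illustrative.

```python
import torch
import torch.nn as nn


def yeo_johnson(x: torch.Tensor, lmbda: float) -> torch.Tensor:
    """Element-wise Yeo-Johnson power transform with a fixed exponent."""
    pos = x >= 0
    # Non-negative branch: ((x + 1)^lmbda - 1) / lmbda, or log1p(x) when lmbda ~ 0.
    if abs(lmbda) < 1e-6:
        y_pos = torch.log1p(x.clamp(min=0))
    else:
        y_pos = ((x.clamp(min=0) + 1.0) ** lmbda - 1.0) / lmbda
    # Negative branch: -((1 - x)^(2 - lmbda) - 1) / (2 - lmbda), or -log1p(-x) when lmbda ~ 2.
    if abs(lmbda - 2.0) < 1e-6:
        y_neg = -torch.log1p(-x.clamp(max=0))
    else:
        y_neg = -((1.0 - x.clamp(max=0)) ** (2.0 - lmbda) - 1.0) / (2.0 - lmbda)
    return torch.where(pos, y_pos, y_neg)


class NormalityNorm1d(nn.Module):
    """Hypothetical layer: standardize, Gaussianize via a power transform,
    then add Gaussian noise during training (names and defaults are illustrative)."""

    def __init__(self, num_features: int, lmbda: float = 1.0,
                 noise_std: float = 0.1, eps: float = 1e-5):
        super().__init__()
        self.lmbda = lmbda
        self.noise_std = noise_std
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(num_features))  # learnable gain
        self.bias = nn.Parameter(torch.zeros(num_features))   # learnable shift

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Per-feature standardization over the batch dimension. For brevity this
        # sketch always uses batch statistics; a real layer would also track
        # running statistics for evaluation, as batch normalization does.
        mean = x.mean(dim=0, keepdim=True)
        var = x.var(dim=0, unbiased=False, keepdim=True)
        x_hat = (x - mean) / torch.sqrt(var + self.eps)
        # Power transform nudges the standardized activations toward normality.
        x_hat = yeo_johnson(x_hat, self.lmbda)
        # Additive Gaussian noise is applied only in training mode.
        if self.training:
            x_hat = x_hat + self.noise_std * torch.randn_like(x_hat)
        return self.weight * x_hat + self.bias


if __name__ == "__main__":
    layer = NormalityNorm1d(num_features=8)
    out = layer(torch.randn(32, 8))
    print(out.shape)  # torch.Size([32, 8])
```

The sketch drops in wherever a conventional normalization layer would be used, which matches the abstract's claim about suitability, but the transform-parameter estimation and noise schedule should be taken from the paper itself.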
Related papers
- On the Sample Complexity of One Hidden Layer Networks with Equivariance, Locality and Weight Sharing [12.845681770287005]
Weight sharing, equivariance, and local filters are believed to contribute to the sample efficiency of neural networks.
We show that locality has generalization benefits; however, the uncertainty principle implies a trade-off between locality and expressivity.
arXiv Detail & Related papers (2024-11-21T16:36:01Z) - Theory on Score-Mismatched Diffusion Models and Zero-Shot Conditional Samplers [49.97755400231656]
We present the first performance guarantee with explicit dimensional dependencies for general score-mismatched diffusion samplers.
We show that score mismatches result in a distributional bias between the target and sampling distributions, proportional to the accumulated mismatch between the target and training distributions.
This result can be directly applied to zero-shot conditional samplers for any conditional model, irrespective of measurement noise.
arXiv Detail & Related papers (2024-10-17T16:42:12Z) - Unsupervised Adaptive Normalization [0.07499722271664146]
Unsupervised Adaptive Normalization (UAN) is an innovative algorithm that seamlessly integrates clustering for normalization with deep neural network learning.
UAN outperforms classical methods by adapting to the target task and is effective in classification and domain adaptation.
arXiv Detail & Related papers (2024-09-07T08:14:11Z) - GLAD: Towards Better Reconstruction with Global and Local Adaptive Diffusion Models for Unsupervised Anomaly Detection [60.78684630040313]
Diffusion models tend to reconstruct normal counterparts of test images once certain noise has been added.
From the global perspective, the difficulty of reconstructing images with different anomalies is uneven.
We propose a global and local adaptive diffusion model (abbreviated to GLAD) for unsupervised anomaly detection.
arXiv Detail & Related papers (2024-06-11T17:27:23Z) - Data Attribution for Diffusion Models: Timestep-induced Bias in Influence Estimation [53.27596811146316]
Diffusion models operate over a sequence of timesteps, rather than the instantaneous input-output relationships of previous settings.
We present Diffusion-TracIn, which incorporates these temporal dynamics, and observe that samples' loss gradient norms are highly dependent on the timestep.
We introduce Diffusion-ReTrac as a re-normalized adaptation that enables the retrieval of training samples more targeted to the test sample of interest.
arXiv Detail & Related papers (2024-01-17T07:58:18Z) - Bayesian Renormalization [68.8204255655161]
We present a fully information-theoretic approach to renormalization inspired by Bayesian statistical inference.
The main insight of Bayesian Renormalization is that the Fisher metric defines a correlation length that plays the role of an emergent RG scale.
We provide insight into how the Bayesian Renormalization scheme relates to existing methods for data compression and data generation.
arXiv Detail & Related papers (2023-05-17T18:00:28Z) - Normalizing Flow with Variational Latent Representation [20.038183566389794]
We propose a new framework based on variational latent representation to improve the practical performance of Normalizing Flow (NF).
The idea is to replace the standard normal latent variable with a more general latent representation, jointly learned via Variational Bayes.
The resulting method is significantly more powerful than the standard normalizing flow approach for generating data distributions with multiple modes.
arXiv Detail & Related papers (2022-11-21T16:51:49Z) - Distribution Mismatch Correction for Improved Robustness in Deep Neural Networks [86.42889611784855]
Normalization methods increase the vulnerability of deep neural networks with respect to noise and input corruptions.
We propose an unsupervised non-parametric distribution correction method that adapts the activation distribution of each layer.
In our experiments, we empirically show that the proposed method effectively reduces the impact of intense image corruptions.
arXiv Detail & Related papers (2021-10-05T11:36:25Z) - Eccentric Regularization: Minimizing Hyperspherical Energy without explicit projection [0.913755431537592]
We introduce a novel regularizing loss function which simulates a pairwise repulsive force between items.
We show that minimizing this loss function in isolation achieves a hyperspherical distribution.
We apply this method of Eccentric Regularization to an autoencoder and demonstrate its effectiveness in image generation, representation learning, and downstream classification tasks; a minimal sketch of such a pairwise repulsive loss appears after this list.
arXiv Detail & Related papers (2021-04-23T13:55:17Z) - Optimization Theory for ReLU Neural Networks Trained with Normalization Layers [82.61117235807606]
The success of deep neural networks is in part due to the use of normalization layers.
Our analysis shows how the introduction of normalization changes the optimization landscape and can enable faster convergence.
arXiv Detail & Related papers (2020-06-11T23:55:54Z)
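For the Eccentric Regularization entry above, the blurb names a pairwise repulsive force between items but not its exact form. The following is a minimal PyTorch sketch of one plausible repulsive regularizer over a batch of latent codes; the function name `pairwise_repulsion_loss`, the Gaussian-kernel force law, and the `temperature` parameter are assumptions for illustration, not the paper's loss.

```python
import torch


def pairwise_repulsion_loss(z: torch.Tensor, temperature: float = 1.0) -> torch.Tensor:
    """z: (batch, dim) latent codes. Returns a scalar repulsion penalty that
    is largest when codes are close together, so minimizing it pushes them apart."""
    # Squared Euclidean distances between all pairs of codes.
    sq_dists = torch.cdist(z, z, p=2.0) ** 2
    # Exclude self-pairs on the diagonal before averaging the repulsion kernel.
    batch = z.shape[0]
    off_diag = ~torch.eye(batch, dtype=torch.bool, device=z.device)
    return torch.exp(-sq_dists[off_diag] / temperature).mean()


if __name__ == "__main__":
    codes = torch.randn(16, 32, requires_grad=True)
    loss = pairwise_repulsion_loss(codes)
    loss.backward()
    print(float(loss))
```

In an autoencoder setting, such a term would be added to the reconstruction loss with a weighting coefficient, spreading the latent codes apart rather than projecting them explicitly onto a hypersphere.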