NoMorelization: Building Normalizer-Free Models from a Sample's Perspective
- URL: http://arxiv.org/abs/2210.06932v1
- Date: Thu, 13 Oct 2022 12:04:24 GMT
- Title: NoMorelization: Building Normalizer-Free Models from a Sample's Perspective
- Authors: Chang Liu, Yuwen Yang, Yue Ding, Hongtao Lu
- Abstract summary: We propose a simple and effective alternative to normalization called "NoMorelization".
NoMorelization is composed of two trainable scalars and a zero-centered noise injector.
Compared with existing mainstream normalizers, NoMorelization shows the best speed-accuracy trade-off.
- Score: 17.027460848621434
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The normalizing layer has become one of the basic configurations of deep
learning models, but it still suffers from computational inefficiency,
interpretability difficulties, and low generality. After gaining a deeper
understanding of recent normalization and normalizer-free research from a
sample's perspective, we reveal that the problem lies in the sampling noise
and an inappropriate prior assumption. In this paper, we
propose a simple and effective alternative to normalization, which is called
"NoMorelization". NoMorelization is composed of two trainable scalars and a
zero-centered noise injector. Experimental results demonstrate that
NoMorelization is a general component for deep learning and is suitable for
different model paradigms (e.g., convolution-based and attention-based models)
to tackle different tasks (e.g., discriminative and generative tasks). Compared
with existing mainstream normalizers (e.g., BN, LN, and IN) and
state-of-the-art normalizer-free methods, NoMorelization shows the best
speed-accuracy trade-off.
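Reading only from the abstract, a NoMorelization-style layer might look like the following PyTorch sketch. The roles of the two scalars (one scale, one shift), the Gaussian form of the noise, and the hyperparameter `sigma` are assumptions for illustration, not details confirmed by the paper.

```python
import torch
import torch.nn as nn

class NoMorelization(nn.Module):
    """Sketch of a normalizer-free layer per the abstract: two trainable
    scalars plus a zero-centered noise injector that is active only
    during training. `sigma` (noise scale) is an assumed hyperparameter."""

    def __init__(self, sigma: float = 0.1):
        super().__init__()
        self.alpha = nn.Parameter(torch.ones(1))   # trainable scale
        self.beta = nn.Parameter(torch.zeros(1))   # trainable shift
        self.sigma = sigma

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.alpha * x + self.beta
        if self.training:
            # zero-centered noise, mimicking the regularizing sampling
            # noise that batch statistics would otherwise introduce
            y = y + self.sigma * torch.randn_like(y)
        return y
```

At inference the layer reduces to a deterministic affine map, so it adds essentially no runtime cost, which is consistent with the claimed speed-accuracy trade-off.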
Related papers
- GLAD: Towards Better Reconstruction with Global and Local Adaptive Diffusion Models for Unsupervised Anomaly Detection [60.78684630040313]
Diffusion models tend to reconstruct normal counterparts of test images with certain noise added.
From the global perspective, the difficulty of reconstructing images with different anomalies is uneven.
We propose a global and local adaptive diffusion model (abbreviated to GLAD) for unsupervised anomaly detection.
arXiv Detail & Related papers (2024-06-11T17:27:23Z)
- Normality Learning-based Graph Anomaly Detection via Multi-Scale Contrastive Learning [61.57383634677747]
Graph anomaly detection (GAD) has attracted increasing attention in machine learning and data mining.
Here, we propose a normality learning-based GAD framework via multi-scale contrastive learning networks (NLGAD for short).
Notably, the proposed algorithm improves the detection performance (up to 5.89% AUC gain) compared with the state-of-the-art methods.
arXiv Detail & Related papers (2023-09-12T08:06:04Z)
- HyperInvariances: Amortizing Invariance Learning [10.189246340672245]
Invariance learning is expensive and data intensive for popular neural architectures.
We introduce the notion of amortizing invariance learning.
This framework can identify appropriate invariances in different downstream tasks and lead to comparable or better test performance.
arXiv Detail & Related papers (2022-07-17T21:40:37Z)
- Explicit Regularization in Overparametrized Models via Noise Injection [14.492434617004932]
We show that small perturbations induce explicit regularization for simple finite-dimensional models.
We empirically show that the small perturbations lead to better generalization performance than vanilla (stochastic) gradient descent training.
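One common form of noise injection consistent with this summary perturbs the parameters before each gradient evaluation; a minimal sketch follows (the helper name and `sigma` are hypothetical, not the paper's exact procedure):

```python
import torch

def perturbed_grad_step(params, loss_fn, lr=1e-2, sigma=1e-2):
    """Evaluate the gradient at noise-perturbed parameters, then take
    an SGD step from the original point (an illustrative sketch)."""
    noise = [sigma * torch.randn_like(p) for p in params]
    for p, n in zip(params, noise):        # move to the perturbed point
        p.data.add_(n)
    loss = loss_fn()                       # forward pass at perturbed params
    grads = torch.autograd.grad(loss, params)
    for p, n, g in zip(params, noise, grads):
        p.data.sub_(n)                     # move back to the original point
        p.data.sub_(lr * g)                # descend along the perturbed gradient
    return loss.item()
```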
arXiv Detail & Related papers (2022-06-09T17:00:23Z)
- Information-Theoretic Generalization Bounds for Iterative Semi-Supervised Learning [81.1071978288003]
In particular, we seek to understand the behaviour of the generalization error of iterative SSL algorithms using information-theoretic principles.
Our theoretical results suggest that when the class conditional variances are not too large, the upper bound on the generalization error decreases monotonically with the number of iterations, but quickly saturates.
arXiv Detail & Related papers (2021-10-03T05:38:49Z)
- Understanding the Generalization of Adam in Learning Neural Networks with Proper Regularization [118.50301177912381]
We show that Adam can converge to different solutions of the objective with provably different errors, even with weight decay regularization.
We also show that if the objective is convex and weight decay regularization is employed, any optimization algorithm, including Adam, will converge to the same solution.
arXiv Detail & Related papers (2021-08-25T17:58:21Z)
- Explainable Deep Few-shot Anomaly Detection with Deviation Networks [123.46611927225963]
We introduce a novel weakly-supervised anomaly detection framework to train detection models.
The proposed approach learns discriminative normality by leveraging the labeled anomalies and a prior probability.
Our model is substantially more sample-efficient and robust, and performs significantly better than state-of-the-art competing methods in both closed-set and open-set settings.
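The deviation-network idea can be sketched as a loss that pulls the anomaly scores of normal samples toward a Gaussian reference prior and pushes labeled anomalies at least a margin above it. The N(0, 1) prior and margin follow the original deviation-loss formulation; the exact settings below are illustrative:

```python
import torch

def deviation_loss(scores, labels, margin=5.0, n_ref=5000):
    """Deviation-style loss: `labels` is 1 for labeled anomalies and
    0 for (assumed) normal samples. Normal scores are pulled toward a
    N(0, 1) prior; anomalies are pushed `margin` deviations above it."""
    ref = torch.randn(n_ref, device=scores.device)       # prior reference scores
    dev = (scores - ref.mean()) / (ref.std() + 1e-8)     # z-score deviation
    normal_term = (1 - labels) * dev.abs()
    anomaly_term = labels * torch.clamp(margin - dev, min=0.0)
    return (normal_term + anomaly_term).mean()
```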
arXiv Detail & Related papers (2021-08-01T14:33:17Z)
- Squared $\ell_2$ Norm as Consistency Loss for Leveraging Augmented Data to Learn Robust and Invariant Representations [76.85274970052762]
Regularizing distance between embeddings/representations of original samples and augmented counterparts is a popular technique for improving robustness of neural networks.
In this paper, we explore these various regularization choices, seeking to provide a general understanding of how we should regularize the embeddings.
We show that the generic approach we identified (squared $\ell_2$-norm regularized augmentation) outperforms several recent methods, each specially designed for one task.
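A minimal sketch of such a squared-$\ell_2$ consistency regularizer, assuming a model that returns both logits and an embedding (the interface and the weight `lam` are hypothetical):

```python
import torch

def l2_consistency_objective(model, x, x_aug, y, task_loss, lam=1.0):
    """Task loss plus the squared l2 distance between embeddings of
    original and augmented samples."""
    logits, z = model(x)          # assumed (logits, embedding) interface
    _, z_aug = model(x_aug)
    consistency = (z - z_aug).pow(2).sum(dim=1).mean()   # squared l2 norm
    return task_loss(logits, y) + lam * consistency
```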
arXiv Detail & Related papers (2020-11-25T22:40:09Z)
- Understanding Double Descent Requires a Fine-Grained Bias-Variance Decomposition [34.235007566913396]
We describe an interpretable, symmetric decomposition of the variance into terms associated with the labels.
We find that the bias decreases monotonically with the network width, but the variance terms exhibit non-monotonic behavior.
We also analyze the strikingly rich phenomenology that arises.
arXiv Detail & Related papers (2020-11-04T21:04:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.