Breaking Time Invariance: Assorted-Time Normalization for RNNs
- URL: http://arxiv.org/abs/2209.14439v1
- Date: Wed, 28 Sep 2022 21:51:13 GMT
- Title: Breaking Time Invariance: Assorted-Time Normalization for RNNs
- Authors: Cole Pospisil, Vasily Zadorozhnyy, Qiang Ye
- Abstract summary: We propose a normalization method called Assorted-Time Normalization (ATN).
ATN preserves information from multiple consecutive time steps and normalizes using them.
Our experiments applying ATN to LN demonstrate consistent improvement on various tasks.
- Score: 5.229616140749998
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Methods such as Layer Normalization (LN) and Batch Normalization (BN) have
proven to be effective in improving the training of Recurrent Neural Networks
(RNNs). However, existing methods normalize using only the instantaneous
information at one particular time step, and the result of the normalization is
a preactivation state with a time-independent distribution. This implementation
fails to account for certain temporal differences inherent in the inputs and
the architecture of RNNs. Since these networks share weights across time steps,
it may also be desirable to account for the connections between time steps in
the normalization scheme. In this paper, we propose a normalization method
called Assorted-Time Normalization (ATN), which preserves information from
multiple consecutive time steps and normalizes using them. This setup allows us
to introduce longer time dependencies into the traditional normalization
methods without introducing any new trainable parameters. We present
theoretical derivations for the gradient propagation and prove the weight
scaling invariance property. Our experiments applying ATN to LN demonstrate
consistent improvement on various tasks, such as Adding, Copying, and Denoise
Problems and Language Modeling Problems.
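As a hedged illustration of the idea in the abstract (the window size, the vanilla RNN cell, and all names below are assumptions for the example, not the authors' reference code), a sliding-window variant of layer normalization inside an RNN step might look like this:

```python
import numpy as np

def atn_rnn_step(x_t, h_prev, W_x, W_h, gain, bias, window, k=3, eps=1e-5):
    """One vanilla-RNN step with assorted-time-style normalization: the current
    preactivation is normalized with mean/variance pooled over itself and up to
    k-1 previous preactivations (a sketch of ATN applied to LN)."""
    a_t = x_t @ W_x + h_prev @ W_h            # preactivation at time step t
    window.append(a_t)
    if len(window) > k:
        window.pop(0)                          # keep only the last k time steps
    pooled = np.concatenate(window, axis=-1)   # per-sample, across the kept steps
    mu = pooled.mean(axis=-1, keepdims=True)
    var = pooled.var(axis=-1, keepdims=True)
    a_norm = (a_t - mu) / np.sqrt(var + eps)   # normalize the current step only
    return np.tanh(gain * a_norm + bias)       # reuse LN's gain/bias: no new parameters

# toy usage
rng = np.random.default_rng(0)
d_in, d_h, batch = 4, 8, 2
W_x, W_h = rng.normal(size=(d_in, d_h)), rng.normal(size=(d_h, d_h))
gain, bias = np.ones(d_h), np.zeros(d_h)
h, window = np.zeros((batch, d_h)), []
for t in range(5):
    h = atn_rnn_step(rng.normal(size=(batch, d_in)), h, W_x, W_h, gain, bias, window)
```

The point mirrored from the abstract is that statistics are pooled across consecutive time steps while the existing LN gain and bias are reused, so no new trainable parameters are introduced.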
Related papers
- Enhancing Neural Network Representations with Prior Knowledge-Based Normalization [0.07499722271664146]
We introduce a new approach to multi-mode normalization that leverages prior knowledge to improve neural network representations.
Our methods demonstrate superior convergence and performance across tasks in image classification, domain adaptation, and image generation.
arXiv Detail & Related papers (2024-03-25T14:17:38Z)
- State Derivative Normalization for Continuous-Time Deep Neural Networks [3.19246787432138]
In this contribution, we show that these three normalization tasks are inherently coupled.
We propose a solution to all three normalization challenges by introducing a normalization constant at the state derivative level.
arXiv Detail & Related papers (2024-01-05T17:04:33Z)
- Time-Parameterized Convolutional Neural Networks for Irregularly Sampled Time Series [26.77596449192451]
Irregularly sampled time series are ubiquitous in several application domains, leading to sparse, partially observed, and non-aligned observations.
Standard recurrent neural networks (RNNs) and convolutional neural networks (CNNs) assume regular spacing between observation times, posing significant challenges to irregular time series modeling.
We parameterize convolutional layers with kernels that are explicit functions of the irregular observation times.
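Only a flavor can be inferred from the snippet above; the sketch below (the RBF kernel form and all names are assumptions, not the paper's exact parameterization) evaluates convolution weights as a continuous function of irregular time offsets rather than of integer positions:

```python
import numpy as np

def time_param_conv(values, times, centers, widths, amplitudes):
    """Convolve an irregularly sampled series by evaluating kernel weights as a
    continuous (here Gaussian/RBF) function of time offsets, so the weight given
    to a neighbor depends on its time distance, not its index."""
    out = np.zeros_like(values, dtype=float)
    for i, t_i in enumerate(times):
        dt = times - t_i                                    # irregular time offsets
        w = amplitudes * np.exp(-((dt[:, None] - centers) ** 2) / (2 * widths ** 2))
        out[i] = np.sum(w.sum(axis=1) * values)             # weighted sum over neighbors
    return out

# toy usage: six irregularly spaced observations
times = np.array([0.0, 0.4, 1.1, 1.3, 2.7, 3.0])
values = np.sin(times)
centers = np.array([-0.5, 0.0, 0.5])
widths = np.array([0.3, 0.3, 0.3])
amplitudes = np.array([0.2, 0.6, 0.2])
print(time_param_conv(values, times, centers, widths, amplitudes))
```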
arXiv Detail & Related papers (2023-08-06T21:10:30Z)
- Time Regularization in Optimal Time Variable Learning [0.4490343701046724]
Recently, optimal time variable learning in deep neural networks (DNNs) was introduced in arXiv:2204.08528.
We extend the concept by introducing a regularization term that directly relates to the time horizon in discrete dynamical systems.
We also propose an adaptive pruning approach for Residual Neural Networks (ResNets).
Results are illustrated by applying the proposed concepts to classification tasks on the well-known MNIST and Fashion-MNIST data sets.
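Reading only the snippet above, one plausible (assumed, not verbatim) form of such a term is a penalty on learned per-block time steps of a discrete dynamical system, which also yields a pruning signal:

```python
import numpy as np

def time_horizon_penalty(taus, lam=1e-2):
    """Hypothetical regularizer on learned per-layer time steps tau_k of a
    discrete dynamical system x_{k+1} = x_k + tau_k * f(x_k): penalizing the
    total time horizon pushes some tau_k toward zero, suggesting which
    residual blocks could be pruned."""
    return lam * np.sum(np.abs(taus))

taus = np.array([0.50, 0.02, 0.31, 0.001])   # learned time variables (toy values)
loss = 1.234 + time_horizon_penalty(taus)     # task loss + time regularization
prune_mask = np.abs(taus) < 0.01              # blocks with negligible time step
```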
arXiv Detail & Related papers (2023-06-28T11:27:48Z)
- Context Normalization Layer with Applications [0.1499944454332829]
This study proposes a new normalization technique, called context normalization, for image data.
It adjusts the scaling of features based on the characteristics of each sample, which improves the model's convergence speed and performance.
The effectiveness of context normalization is demonstrated on various datasets, and its performance is compared to other standard normalization techniques.
arXiv Detail & Related papers (2023-03-14T06:38:17Z)
- Distribution Mismatch Correction for Improved Robustness in Deep Neural Networks [86.42889611784855]
Normalization methods can increase a network's vulnerability to noise and input corruptions.
We propose an unsupervised non-parametric distribution correction method that adapts the activation distribution of each layer.
In our experiments, we empirically show that the proposed method effectively reduces the impact of intense image corruptions.
arXiv Detail & Related papers (2021-10-05T11:36:25Z)
- Double Forward Propagation for Memorized Batch Normalization [68.34268180871416]
Batch Normalization (BN) has been a standard component in designing deep neural networks (DNNs).
We propose a memorized batch normalization (MBN) which considers multiple recent batches to obtain more accurate and robust statistics.
Compared to related methods, the proposed MBN exhibits consistent behaviors in both training and inference.
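A minimal sketch of the stated idea, with the memory size, the pooling rule, and the class name all assumed for illustration:

```python
import numpy as np
from collections import deque

class MemorizedBatchNorm:
    """Sketch of a BN variant that normalizes with statistics pooled over the
    last `memory` batches instead of only the current one."""
    def __init__(self, num_features, memory=4, eps=1e-5):
        self.means = deque(maxlen=memory)
        self.vars = deque(maxlen=memory)
        self.gamma = np.ones(num_features)
        self.beta = np.zeros(num_features)
        self.eps = eps

    def __call__(self, x):                          # x: (batch, num_features)
        self.means.append(x.mean(axis=0))
        self.vars.append(x.var(axis=0))
        mu = np.mean(list(self.means), axis=0)      # pool over remembered batches
        var = np.mean(list(self.vars), axis=0)
        return self.gamma * (x - mu) / np.sqrt(var + self.eps) + self.beta

rng = np.random.default_rng(0)
bn = MemorizedBatchNorm(num_features=8)
for _ in range(10):
    y = bn(rng.normal(size=(16, 8)))
```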
arXiv Detail & Related papers (2020-10-10T08:48:41Z)
- STEER: Simple Temporal Regularization For Neural ODEs [80.80350769936383]
We propose a new regularization technique: randomly sampling the end time of the ODE during training.
The proposed regularization is simple to implement, has negligible overhead and is effective across a wide variety of tasks.
We show through experiments on normalizing flows, time series models and image recognition that the proposed regularization can significantly decrease training time and even improve performance over baseline models.
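A toy sketch of the described trick, with the fixed-step integrator, the nominal horizon T, and the sampling interval all assumed for illustration:

```python
import numpy as np

def euler_odeint(f, y0, t_end, steps=20):
    """Fixed-step Euler integration of dy/dt = f(t, y) from t = 0 to t_end."""
    y, t = np.array(y0, dtype=float), 0.0
    dt = t_end / steps
    for _ in range(steps):
        y = y + dt * f(t, y)
        t += dt
    return y

def steer_end_time(T=1.0, b=0.5, rng=np.random.default_rng(0)):
    """STEER-style trick as described in the abstract: during training, sample
    the ODE end time randomly around the nominal horizon T (the exact sampling
    distribution here is an assumption)."""
    return rng.uniform(T - b, T + b)

# toy training-time usage: each step integrates to a different random end time
f = lambda t, y: -y
for step in range(3):
    yT = euler_odeint(f, y0=[1.0], t_end=steer_end_time())
```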
arXiv Detail & Related papers (2020-06-18T17:44:50Z)
- AdaS: Adaptive Scheduling of Stochastic Gradients [50.80697760166045]
We introduce the notions of "knowledge gain" and "mapping condition" and propose a new algorithm called Adaptive Scheduling (AdaS).
Experimentation reveals that, using the derived metrics, AdaS exhibits: (a) faster convergence and superior generalization over existing adaptive learning methods; and (b) lack of dependence on a validation set to determine when to stop training.
arXiv Detail & Related papers (2020-06-11T16:36:31Z)
- PowerNorm: Rethinking Batch Normalization in Transformers [96.14956636022957]
The standard normalization method for neural network (NN) models used in Natural Language Processing (NLP) is layer normalization (LN).
LN is preferred due to the empirical observation that a (naive/vanilla) use of BN leads to significant performance degradation for NLP tasks.
We propose Power Normalization (PN), a novel normalization scheme that resolves this issue.
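The abstract gives only the motivation; as one hedged reading, the sketch below scales activations by a running estimate of their quadratic mean (power) rather than per-batch mean/variance, omitting the paper's backward-pass corrections (names and momentum are assumptions):

```python
import numpy as np

class PowerNormSketch:
    """Illustrative only: normalize activations by a running estimate of their
    per-feature quadratic mean instead of per-batch mean/variance statistics."""
    def __init__(self, num_features, momentum=0.9, eps=1e-5):
        self.running_power = np.ones(num_features)
        self.gamma = np.ones(num_features)
        self.beta = np.zeros(num_features)
        self.momentum, self.eps = momentum, eps

    def __call__(self, x):                          # x: (batch, num_features)
        power = np.mean(x ** 2, axis=0)             # current batch signal power
        self.running_power = (self.momentum * self.running_power
                              + (1 - self.momentum) * power)
        return self.gamma * x / np.sqrt(self.running_power + self.eps) + self.beta
```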
arXiv Detail & Related papers (2020-03-17T17:50:26Z)
- Cross-Iteration Batch Normalization [67.83430009388678]
We present Cross-Iteration Batch Normalization (CBN), in which examples from multiple recent iterations are jointly utilized to enhance estimation quality.
CBN is found to outperform the original batch normalization and a direct calculation of statistics over previous iterations without the proposed compensation technique.
arXiv Detail & Related papers (2020-02-13T18:52:57Z)
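A hedged sketch of the pooling part of this idea (the paper's compensation for weights changing between iterations is deliberately omitted; the window size and names are assumptions):

```python
import numpy as np
from collections import deque

class CrossIterationBNSketch:
    """Sketch: pool BN statistics over the last few training iterations. The
    compensation for weight updates between those iterations, central to CBN,
    is omitted here for brevity."""
    def __init__(self, num_features, window=3, eps=1e-5):
        self.mu_hist = deque(maxlen=window)
        self.sq_hist = deque(maxlen=window)
        self.gamma, self.beta = np.ones(num_features), np.zeros(num_features)
        self.eps = eps

    def __call__(self, x):                          # x: (batch, num_features)
        self.mu_hist.append(x.mean(axis=0))
        self.sq_hist.append((x ** 2).mean(axis=0))
        mu = np.mean(list(self.mu_hist), axis=0)    # pooled first moment
        var = np.mean(list(self.sq_hist), axis=0) - mu ** 2
        return self.gamma * (x - mu) / np.sqrt(var + self.eps) + self.beta
```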
This list is automatically generated from the titles and abstracts of the papers on this site.