UnitNorm: Rethinking Normalization for Transformers in Time Series
- URL: http://arxiv.org/abs/2405.15903v1
- Date: Fri, 24 May 2024 19:58:25 GMT
- Title: UnitNorm: Rethinking Normalization for Transformers in Time Series
- Authors: Nan Huang, Christian Kümmerle, Xiang Zhang
- Abstract summary: Normalization techniques are crucial for enhancing Transformer models' performance and stability in time series analysis tasks.
We propose UnitNorm, a novel approach that scales input vectors by their norms and modulates attention patterns.
UnitNorm's effectiveness is demonstrated across diverse time series analysis tasks, including forecasting, classification, and anomaly detection.
- Score: 9.178527914585446
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Normalization techniques are crucial for enhancing Transformer models' performance and stability in time series analysis tasks, yet traditional methods like batch and layer normalization often lead to issues such as token shift, attention shift, and sparse attention. We propose UnitNorm, a novel approach that scales input vectors by their norms and modulates attention patterns, effectively circumventing these challenges. UnitNorm is grounded in existing normalization frameworks, and its effectiveness is demonstrated across diverse time series analysis tasks, including forecasting, classification, and anomaly detection, through a rigorous evaluation on 6 state-of-the-art models and 10 datasets. Notably, UnitNorm shows superior performance, especially in scenarios requiring robust attention mechanisms and contextual comprehension, with improvements of up to a 1.46 decrease in MSE for forecasting and a 4.89% increase in accuracy for classification. This work not only calls for a reevaluation of normalization strategies in time series Transformers but also sets a new direction for enhancing model performance and stability. The source code is available at https://anonymous.4open.science/r/UnitNorm-5B84.
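A minimal sketch of the core normalization step, assuming UnitNorm rescales each token embedding to unit L2 norm along the feature dimension; the paper's exact scaling and its attention-pattern modulation are not reproduced here (see the linked source code for the authors' implementation).
```python
import torch
import torch.nn as nn

class UnitNormSketch(nn.Module):
    """Illustrative only: scale each token vector by its own L2 norm."""

    def __init__(self, eps: float = 1e-8):
        super().__init__()
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model); each token is divided by its norm
        norm = x.norm(dim=-1, keepdim=True).clamp_min(self.eps)
        return x / norm
```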
Related papers
- Autoregressive Moving-average Attention Mechanism for Time Series Forecasting [9.114664059026767]
We propose an Autoregressive (AR) Moving-average (MA) attention structure that can adapt to various linear attention mechanisms.
In this paper, we first demonstrate that, for the time series forecasting (TSF) task, the previously overlooked decoder-only autoregressive Transformer model can achieve results comparable to the best baselines.
arXiv Detail & Related papers (2024-10-04T05:45:50Z)
- Frequency Adaptive Normalization For Non-stationary Time Series Forecasting [7.881136718623066]
Time series forecasting needs to address non-stationary data with evolving trend and seasonal patterns.
To address this non-stationarity, instance normalization has recently been proposed to alleviate the impact of the trend using certain statistical measures.
This paper proposes a new instance normalization solution, called frequency adaptive normalization (FAN), which extends instance normalization to handle both dynamic trend and seasonal patterns.
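As a rough sketch of the frequency-adaptive idea (not the paper's method), one can remove the k dominant Fourier components of each input window and treat them as the non-stationary part; the function below assumes a simple top-k magnitude selection.
```python
import torch

def frequency_normalize(x: torch.Tensor, k: int = 3):
    """Split each series into its k dominant frequency components and a residual.

    x: (batch, seq_len) real-valued series; returns (residual, dominant).
    """
    spec = torch.fft.rfft(x, dim=-1)                    # complex spectrum
    topk = spec.abs().topk(k, dim=-1).indices           # dominant frequency bins
    filtered = torch.zeros_like(spec)
    filtered.scatter_(-1, topk, spec.gather(-1, topk))  # keep only dominant bins
    dominant = torch.fft.irfft(filtered, n=x.size(-1), dim=-1)
    return x - dominant, dominant
```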
arXiv Detail & Related papers (2024-09-30T15:07:16Z)
- Localized Gaussians as Self-Attention Weights for Point Clouds Correspondence [92.07601770031236]
We investigate semantically meaningful patterns in the attention heads of an encoder-only Transformer architecture.
We find that fixing the attention weights not only accelerates the training process but also enhances the stability of the optimization.
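A minimal sketch of fixed, localized-Gaussian attention weights over a point cloud, in the spirit of the idea above; the bandwidth sigma and the row-softmax normalization are assumptions, not the paper's exact parameterization.
```python
import torch

def gaussian_attention_weights(points: torch.Tensor, sigma: float = 0.1) -> torch.Tensor:
    """points: (n, 3). Returns an (n, n) row-stochastic attention matrix
    whose weights decay with squared Euclidean distance between points."""
    d2 = torch.cdist(points, points).pow(2)
    logits = -d2 / (2 * sigma ** 2)
    return torch.softmax(logits, dim=-1)
```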
arXiv Detail & Related papers (2024-09-20T07:41:47Z)
- Attention as an RNN [66.5420926480473]
We show that attention can be viewed as a special Recurrent Neural Network (RNN) with the ability to compute its many-to-one RNN output efficiently.
We introduce a new efficient method of computing attention's many-to-many RNN output based on the parallel prefix scan algorithm.
We show that Aarens achieve comparable performance to Transformers on 38 datasets spread across four popular sequential problem settings.
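The many-to-one view can be illustrated by computing softmax attention for a single query as a recurrence over key/value pairs with a numerically stabilized running numerator and denominator; this is an illustrative sketch, not the paper's Aaren module.
```python
import torch

def attention_as_recurrence(q: torch.Tensor, K: torch.Tensor, V: torch.Tensor) -> torch.Tensor:
    """q: (d,), K: (T, d), V: (T, d_v); returns softmax(qK^T / sqrt(d)) V computed sequentially."""
    scale = q.numel() ** 0.5
    m = torch.tensor(float("-inf"))      # running maximum of the scores
    num = torch.zeros(V.size(-1))        # running (rescaled) numerator
    den = torch.zeros(())                # running (rescaled) denominator
    for k, v in zip(K, V):
        s = torch.dot(q, k) / scale
        m_new = torch.maximum(m, s)
        r = torch.exp(m - m_new)         # rescale the previous state
        num = num * r + torch.exp(s - m_new) * v
        den = den * r + torch.exp(s - m_new)
        m = m_new
    return num / den
```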
arXiv Detail & Related papers (2024-05-22T19:45:01Z)
- Boosting X-formers with Structured Matrix for Long Sequence Time Series Forecasting [7.3758245014991255]
We propose a novel architectural framework that enhances Transformer models through the integration of Surrogate Attention Blocks (SAB) and Surrogate Feed-Forward Neural Network Blocks (SFB).
Experiments on nine Transformer variants across five distinct time series tasks demonstrate an average performance improvement of 9.45%, alongside a 46% reduction in model size.
arXiv Detail & Related papers (2024-05-21T02:37:47Z)
- Attention as Robust Representation for Time Series Forecasting [23.292260325891032]
Time series forecasting is essential for many practical applications.
Transformers' key feature, the attention mechanism, dynamically fuses embeddings to enhance data representation, but the attention weights themselves are often relegated to a byproduct role.
Our approach elevates attention weights as the primary representation for time series, capitalizing on the temporal relationships among data points to improve forecasting accuracy.
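A minimal sketch of treating the attention map itself as the series representation: pairwise similarity weights among time points are computed and flattened into a feature vector. The module name and shapes here are illustrative assumptions, not the paper's architecture.
```python
import torch
import torch.nn as nn

class AttentionWeightRepresentation(nn.Module):
    def __init__(self, d_model: int):
        super().__init__()
        self.query = nn.Linear(d_model, d_model)
        self.key = nn.Linear(d_model, d_model)

    def forward(self, emb: torch.Tensor) -> torch.Tensor:
        # emb: (batch, seq_len, d_model) embeddings of the input window
        scores = self.query(emb) @ self.key(emb).transpose(1, 2)
        weights = torch.softmax(scores / emb.size(-1) ** 0.5, dim=-1)
        # The attention map itself (flattened) serves as the representation.
        return weights.flatten(start_dim=1)   # (batch, seq_len * seq_len)
```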
arXiv Detail & Related papers (2024-02-08T03:00:50Z)
- Consensus-Adaptive RANSAC [104.87576373187426]
We propose a new RANSAC framework that learns to explore the parameter space by considering the residuals seen so far via a novel attention layer.
The attention mechanism operates on a batch of point-to-model residuals, and updates a per-point estimation state to take into account the consensus found through a lightweight one-step transformer.
arXiv Detail & Related papers (2023-07-26T08:25:46Z)
- Persistence Initialization: A novel adaptation of the Transformer architecture for Time Series Forecasting [0.7734726150561088]
Time series forecasting is an important problem with many real-world applications.
We propose a novel adaptation of the original Transformer architecture focusing on the task of time series forecasting.
We use a decoder Transformer with ReZero normalization and Rotary positional encodings, but the adaptation is applicable to any auto-regressive neural network model.
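A minimal sketch of a ReZero-style residual connection, as referenced above: the residual branch is gated by a scalar initialized to zero so that each layer starts as the identity. The surrounding decoder and rotary positional encodings are omitted.
```python
import torch
import torch.nn as nn

class ReZeroResidual(nn.Module):
    def __init__(self, sublayer: nn.Module):
        super().__init__()
        self.sublayer = sublayer
        self.alpha = nn.Parameter(torch.zeros(1))  # starts at 0 -> identity mapping

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.alpha * self.sublayer(x)
```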
arXiv Detail & Related papers (2022-08-30T13:04:48Z)
- Non-stationary Transformers: Exploring the Stationarity in Time Series Forecasting [86.33543833145457]
We propose Non-stationary Transformers as a generic framework with two interdependent modules: Series Stationarization and De-stationary Attention.
Our framework consistently boosts mainstream Transformers by a large margin, which reduces MSE by 49.43% on Transformer, 47.34% on Informer, and 46.89% on Reformer.
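A minimal sketch of per-instance series stationarization, assuming each input window is normalized by its own mean and standard deviation and the statistics are restored on the model output; the paper's De-stationary Attention module is not shown.
```python
import torch

def stationarize(x: torch.Tensor, eps: float = 1e-5):
    """x: (batch, seq_len, channels). Returns the normalized series and its statistics."""
    mu = x.mean(dim=1, keepdim=True)
    sigma = x.std(dim=1, keepdim=True) + eps
    return (x - mu) / sigma, (mu, sigma)

def destationarize(y: torch.Tensor, stats):
    """Restore the per-instance statistics on the model output."""
    mu, sigma = stats
    return y * sigma + mu
```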
arXiv Detail & Related papers (2022-05-28T12:27:27Z)
- Pruning Redundant Mappings in Transformer Models via Spectral-Normalized Identity Prior [54.629850694790036]
Spectral-normalized identity priors (SNIP) is a structured pruning approach that penalizes an entire residual module in a Transformer model toward an identity mapping.
We conduct experiments with BERT on 5 GLUE benchmark tasks to demonstrate that SNIP achieves effective pruning results while maintaining comparable performance.
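As a rough, illustrative sketch of an identity-prior style penalty (not the authors' exact formulation), one can regularize the weight matrices of each residual sub-module by their spectral norms so that whole blocks can be driven toward identity mappings and pruned.
```python
import torch
import torch.nn as nn

def spectral_identity_penalty(residual_modules, lam: float = 1e-4) -> torch.Tensor:
    """Sum of spectral norms of the 2-D weights inside each residual sub-module."""
    penalty = torch.zeros(())
    for module in residual_modules:
        for p in module.parameters():
            if p.dim() == 2:  # weight matrices only
                penalty = penalty + torch.linalg.matrix_norm(p, ord=2)
    return lam * penalty
```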
arXiv Detail & Related papers (2020-10-05T05:40:56Z)
- Evaluating Prediction-Time Batch Normalization for Robustness under Covariate Shift [81.74795324629712]
We study a method we call prediction-time batch normalization, which significantly improves model accuracy and calibration under covariate shift.
We show that prediction-time batch normalization provides complementary benefits to existing state-of-the-art approaches for improving robustness.
The method has mixed results when used alongside pre-training, and does not seem to perform as well under more natural types of dataset shift.
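A minimal sketch of prediction-time batch normalization in PyTorch, assuming it amounts to letting BatchNorm layers use the statistics of the current test batch rather than their running averages; the paper's exact evaluation protocol may differ.
```python
import torch.nn as nn

def enable_prediction_time_bn(model: nn.Module) -> nn.Module:
    """Keep the model in eval mode overall, but switch BatchNorm layers to
    training mode so they normalize with current-batch statistics at inference."""
    model.eval()
    for m in model.modules():
        if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)):
            m.train()  # training mode -> batch statistics are used
    return model
```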
arXiv Detail & Related papers (2020-06-19T05:08:43Z)