Bridging Distribution Gaps in Time Series Foundation Model Pretraining with Prototype-Guided Normalization
- URL: http://arxiv.org/abs/2504.10900v1
- Date: Tue, 15 Apr 2025 06:23:00 GMT
- Title: Bridging Distribution Gaps in Time Series Foundation Model Pretraining with Prototype-Guided Normalization
- Authors: Peiliang Gong, Emadeldeen Eldele, Min Wu, Zhenghua Chen, Xiaoli Li, Daoqiang Zhang
- Abstract summary: We propose a domain-aware adaptive normalization strategy within the Transformer architecture. We replace the traditional LayerNorm with a prototype-guided dynamic normalization mechanism (ProtoNorm). Our method significantly outperforms conventional pretraining techniques across both classification and forecasting tasks.
- Score: 29.082583523943157
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Foundation models have achieved remarkable success across diverse machine-learning domains through large-scale pretraining on large, diverse datasets. However, pretraining on such datasets introduces significant challenges due to substantial mismatches in data distributions, a problem particularly pronounced with time series data. In this paper, we tackle this issue by proposing a domain-aware adaptive normalization strategy within the Transformer architecture. Specifically, we replace the traditional LayerNorm with a prototype-guided dynamic normalization mechanism (ProtoNorm), where learned prototypes encapsulate distinct data distributions, and sample-to-prototype affinity determines the appropriate normalization layer. This mechanism effectively captures the heterogeneity of time series characteristics, aligning pretrained representations with downstream tasks. Through comprehensive empirical evaluation, we demonstrate that our method significantly outperforms conventional pretraining techniques across both classification and forecasting tasks, while effectively mitigating the adverse effects of distribution shifts during pretraining. Incorporating ProtoNorm is as simple as replacing a single line of code. Extensive experiments on diverse real-world time series benchmarks validate the robustness and generalizability of our approach, advancing the development of more versatile time series foundation models.
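The abstract describes the mechanism (learned prototypes capture distinct distributions, and sample-to-prototype affinity decides how a sample is normalized) but gives no code. Below is a minimal, hypothetical PyTorch sketch of how such a prototype-guided replacement for LayerNorm could look. The class name ProtoNormSketch, the number of prototypes, the cosine-similarity affinity, and the soft mixture over per-prototype LayerNorms are all assumptions for illustration; the paper's actual formulation may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ProtoNormSketch(nn.Module):
    """Hypothetical prototype-guided normalization layer (a sketch, not the paper's code).

    Assumptions made here for illustration:
      * K learnable prototype vectors summarize distinct pretraining distributions,
      * a sample's affinity to each prototype is a temperature-scaled softmax over
        cosine similarity between the prototype and the sample's mean token embedding,
      * the output is an affinity-weighted mixture of K per-prototype LayerNorms.
    """

    def __init__(self, d_model: int, num_prototypes: int = 4, tau: float = 0.1):
        super().__init__()
        self.prototypes = nn.Parameter(torch.randn(num_prototypes, d_model))
        # One independently parameterized LayerNorm per prototype.
        self.norms = nn.ModuleList([nn.LayerNorm(d_model) for _ in range(num_prototypes)])
        self.tau = tau  # softmax temperature for the affinity

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        summary = x.mean(dim=1)  # per-sample summary, (batch, d_model)
        sim = F.cosine_similarity(
            summary.unsqueeze(1), self.prototypes.unsqueeze(0), dim=-1
        )  # (batch, K)
        affinity = F.softmax(sim / self.tau, dim=-1)  # soft assignment to prototypes
        # Apply every prototype's LayerNorm, then mix by affinity.
        stacked = torch.stack([norm(x) for norm in self.norms], dim=1)  # (batch, K, seq, d)
        return (affinity[:, :, None, None] * stacked).sum(dim=1)  # (batch, seq, d)


# Usage idea: swap the LayerNorm inside a Transformer block, e.g.
#   self.norm1 = ProtoNormSketch(d_model=512, num_prototypes=8)
# which mirrors the "single line of code" replacement described in the abstract.
```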
Related papers
- GeneralizeFormer: Layer-Adaptive Model Generation across Test-Time Distribution Shifts [58.95913531746308]
We consider the problem of test-time domain generalization, where a model is trained on several source domains and adjusted on target domains never seen during training. We propose to generate multiple layer parameters on the fly during inference by a lightweight meta-learned transformer, which we call GeneralizeFormer.
arXiv Detail & Related papers (2025-02-15T10:10:49Z) - Federated Foundation Models on Heterogeneous Time Series [36.229082478423585]
Efforts are primarily focused on fusing cross-domain time series datasets to extract shared subsequences as tokens for training models on the Transformer architecture. This paper proposes a novel federated learning approach, namely FFTS, to address the heterogeneity in time series foundation model training. The newly learned time series foundation models achieve superior generalization capabilities on cross-domain time series analysis tasks, including forecasting, imputation, and anomaly detection.
arXiv Detail & Related papers (2024-12-12T03:38:01Z) - UTSD: Unified Time Series Diffusion Model [13.555837288440946]
A Unified Time Series Diffusion model is established for the first time to model the multi-domain probability distribution. We conduct extensive experiments on mainstream benchmarks, and the pre-trained UTSD outperforms existing foundation models on all data domains.
arXiv Detail & Related papers (2024-12-04T06:42:55Z) - MITA: Bridging the Gap between Model and Data for Test-time Adaptation [68.62509948690698]
Test-Time Adaptation (TTA) has emerged as a promising paradigm for enhancing the generalizability of models.
We propose Meet-In-The-Middle based MITA, which introduces energy-based optimization to encourage mutual adaptation of the model and data from opposing directions.
arXiv Detail & Related papers (2024-10-12T07:02:33Z) - Towards Generalisable Time Series Understanding Across Domains [10.350643783811174]
We introduce a novel pre-training paradigm specifically designed to handle time series heterogeneity. We propose a tokeniser with learnable domain signatures, a dual masking strategy, and a normalised cross-correlation loss. Our code and pre-trained weights are available at https://www.oetu.com/oetu/otis.
arXiv Detail & Related papers (2024-10-09T17:09:30Z) - PeFAD: A Parameter-Efficient Federated Framework for Time Series Anomaly Detection [51.20479454379662]
We propose a Parameter-Efficient Federated Anomaly Detection framework named PeFAD, motivated by increasing privacy concerns.
We conduct extensive evaluations on four real datasets, where PeFAD outperforms existing state-of-the-art baselines by up to 28.74%.
arXiv Detail & Related papers (2024-06-04T13:51:08Z) - UniCL: A Universal Contrastive Learning Framework for Large Time Series Models [18.005358506435847]
Time-series analysis plays a pivotal role across a range of critical applications, from finance to healthcare.
Traditional supervised learning methods first require annotating extensive labels for time-series data in each task.
This paper introduces UniCL, a universal and scalable contrastive learning framework designed for pretraining time-series foundation models.
arXiv Detail & Related papers (2024-05-17T07:47:11Z) - Unified Training of Universal Time Series Forecasting Transformers [104.56318980466742]
We present a Masked Encoder-based Universal Time Series Forecasting Transformer (Moirai).
Moirai is trained on our newly introduced Large-scale Open Time Series Archive (LOTSA) featuring over 27B observations across nine domains.
Moirai achieves competitive or superior performance as a zero-shot forecaster when compared to full-shot models.
arXiv Detail & Related papers (2024-02-04T20:00:45Z) - Toward a Foundation Model for Time Series Data [34.1973242428317]
A foundation model is a machine learning model trained on a large and diverse set of data.
We develop an effective time series foundation model by leveraging unlabeled samples from multiple domains.
arXiv Detail & Related papers (2023-10-05T21:44:50Z) - NormAUG: Normalization-guided Augmentation for Domain Generalization [60.159546669021346]
We propose a simple yet effective method called NormAUG (Normalization-guided Augmentation) for deep learning.
Our method introduces diverse information at the feature level and improves the generalization of the main path.
In the test stage, we leverage an ensemble strategy to combine the predictions from the auxiliary path of our model, further boosting performance.
arXiv Detail & Related papers (2023-07-25T13:35:45Z) - Semantic Self-adaptation: Enhancing Generalization with a Single Sample [45.111358665370524]
We propose a self-adaptive approach for semantic segmentation.
It fine-tunes the parameters of convolutional layers to the input image using consistency regularization.
Our empirical study suggests that self-adaptation may complement the established practice of model regularization at training time.
arXiv Detail & Related papers (2022-08-10T12:29:01Z) - TACTiS: Transformer-Attentional Copulas for Time Series [76.71406465526454]
The estimation of time-varying quantities is a fundamental component of decision making in fields such as healthcare and finance.
We propose a versatile method that estimates joint distributions using an attention-based decoder.
We show that our model produces state-of-the-art predictions on several real-world datasets.
arXiv Detail & Related papers (2022-02-07T21:37:29Z)
This list is automatically generated from the titles and abstracts of the papers on this site.