Related papers: DeCoP: Enhancing Self-Supervised Time Series Representation with Dependency Controlled Pre-training

URL: http://arxiv.org/abs/2509.14642v1
Date: Thu, 18 Sep 2025 05:44:06 GMT
Title: DeCoP: Enhancing Self-Supervised Time Series Representation with Dependency Controlled Pre-training
Authors: Yuemin Wu, Zhongze Wu, Xiu Su, Feng Yang, Hongyan Xu, Xi Lin, Wenti Huang, Shan You, Chang Xu
Abstract summary: We propose a Dependency Controlled Pre-training framework that explicitly models dynamic, multi-scale dependencies by simulating evolving inter-patch dependencies. DeCoP achieves state-of-the-art results on ten datasets with lower computing resources, improving MSE by 3% on ETTh1 over PatchTST using only 37% of the FLOPs.
Score: 39.30046923897652
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Modeling dynamic temporal dependencies, which evolve due to distribution shifts and multi-scale patterns, is a critical challenge in time series pre-training. This temporal variability severely impairs the generalization of pre-trained models to downstream tasks. Existing frameworks fail to capture the complex interactions of short- and long-term dependencies, making them susceptible to spurious correlations that degrade generalization. To address these limitations, we propose DeCoP, a Dependency Controlled Pre-training framework that explicitly models dynamic, multi-scale dependencies by simulating evolving inter-patch dependencies. At the input level, DeCoP introduces Instance-wise Patch Normalization (IPN) to mitigate distributional shifts while preserving the unique characteristics of each patch, creating a robust foundation for representation learning. At the latent level, a hierarchical Dependency Controlled Learning (DCL) strategy explicitly models inter-patch dependencies across multiple temporal scales, while an Instance-level Contrastive Module (ICM) enhances global generalization by learning instance-discriminative representations from time-invariant positive pairs. DeCoP achieves state-of-the-art results on ten datasets with lower computing resources, improving MSE by 3% on ETTh1 over PatchTST using only 37% of the FLOPs.

Related papers

A Decomposition-based State Space Model for Multivariate Time-Series Forecasting [0.0]
We propose an end-to-end decomposition framework using three parallel deep state space model branches to capture trend, seasonal, and residual components. Across standard benchmarks, DecompSSM outperformed strong baselines, indicating the effectiveness of combining component-wise deep state space models and global context refinement.
arXiv Detail & Related papers (2026-02-05T07:17:08Z)
SEED: Spectral Entropy-Guided Evaluation of SpatialTemporal Dependencies for Multivariate Time Series Forecasting [8.507253633170947]
We develop a Spectral Entropy-guided Evaluation framework for spatial-temporal Dependency modeling. SEED provides a preliminary evaluation of the spatial and temporal dependencies of each variable, enabling the model to adaptively balance Channel Independence (CI) and Channel Dependence (CD) strategies. SEED achieves state-of-the-art performance, validating its effectiveness and generality.
arXiv Detail & Related papers (2025-12-09T06:18:05Z)
DMSC: Dynamic Multi-Scale Coordination Framework for Time Series Forecasting [14.176801586961286]
Time Series Forecasting (TSF) faces persistent challenges in modeling intricate temporal dependencies across different scales. We propose a novel Dynamic Multi-Scale Coordination Framework (DMSC) with Multi-Scale Patch Decomposition block (EMPD), Triad Interaction Block (TIB) and Adaptive Scale Routing MoE block (ASR-MoE). EMPD is designed as a built-in component to dynamically segment sequences into hierarchical patches with exponentially scaled granularities. TIB then jointly models intra-patch, inter-patch, and cross-variable dependencies within each layer's decomposed representations.
arXiv Detail & Related papers (2025-08-03T13:11:52Z)
WSM: Decay-Free Learning Rate Schedule via Checkpoint Merging for LLM Pre-training [64.0932926819307]
We present Warmup-Stable and Merge (WSM), a framework that establishes a formal connection between learning rate decay and model merging. WSM provides a unified theoretical foundation for emulating various decay strategies. Our framework consistently outperforms the widely-adopted Warmup-Stable-Decay (WSD) approach across multiple benchmarks.
arXiv Detail & Related papers (2025-07-23T16:02:06Z)
Learning Time-Aware Causal Representation for Model Generalization in Evolving Domains [50.66049136093248]
We develop a time-aware structural causal model (SCM) that incorporates dynamic causal factors and the causal mechanism drifts. We show that our method can yield the optimal causal predictor for each time domain. Results on both synthetic and real-world datasets exhibit that SYNC can achieve superior temporal generalization performance.
arXiv Detail & Related papers (2025-06-21T14:05:37Z)
Enhancing Channel-Independent Time Series Forecasting via Cross-Variate Patch Embedding [1.1607669836339873]
We propose Cross-Variate Patch Embeddings (CVPE), a lightweight CD module that injects cross-variate context into channel-independent (CI) models. We then integrate CVPE into Time-LLM, a multimodal CI forecasting model, to demonstrate its effectiveness.
arXiv Detail & Related papers (2025-05-19T06:41:14Z)
TiVaT: A Transformer with a Single Unified Mechanism for Capturing Asynchronous Dependencies in Multivariate Time Series Forecasting [4.733959271565453]
TiVaT is a novel architecture incorporating a single unified module, a Joint-Axis (JA) attention module. The JA attention module dynamically selects relevant features to particularly capture asynchronous interactions. Extensive experiments demonstrate TiVaT's overall performance across diverse datasets.
arXiv Detail & Related papers (2024-10-02T13:24:24Z)
Adaptive Multi-Scale Decomposition Framework for Time Series Forecasting [26.141054975797868]
We propose a novel Adaptive Multi-Scale Decomposition (AMD) framework for time series forecasting. Our framework decomposes time series into distinct temporal patterns at multiple scales, leveraging the Multi-Scale Decomposable Mixing (MDM) block. Our approach effectively models both temporal and channel dependencies and utilizes autocorrelation to refine multi-scale data integration.
arXiv Detail & Related papers (2024-06-06T05:27:33Z)
Learning Sequence Representations by Non-local Recurrent Neural Memory [61.65105481899744]
We propose a Non-local Recurrent Neural Memory (NRNM) for supervised sequence representation learning. Our model is able to capture long-range dependencies and latent high-level features can be distilled by our model. Our model compares favorably against other state-of-the-art methods specifically designed for each of these sequence applications.
arXiv Detail & Related papers (2022-07-20T07:26:15Z)
Towards Robust and Adaptive Motion Forecasting: A Causal Representation Perspective [72.55093886515824]
We introduce a causal formalism of motion forecasting, which casts the problem as a dynamic process with three groups of latent variables. We devise a modular architecture that factorizes the representations of invariant mechanisms and style confounders to approximate a causal graph. Experiment results on synthetic and real datasets show that our three proposed components significantly improve the robustness and reusability of the learned motion representations.
arXiv Detail & Related papers (2021-11-29T18:59:09Z)
Dynamic Federated Learning [57.14673504239551]
Federated learning has emerged as an umbrella term for centralized coordination strategies in multi-agent environments. We consider a federated learning model where at every iteration, a random subset of available agents perform local updates based on their data. Under a non-stationary random walk model on the true minimizer for the aggregate optimization problem, we establish that the performance of the architecture is determined by three factors, namely, the data variability at each agent, the model variability across all agents, and a tracking term that is inversely proportional to the learning rate of the algorithm.
arXiv Detail & Related papers (2020-02-20T15:00:54Z)
