Mlinear: Rethink the Linear Model for Time-series Forecasting
- URL: http://arxiv.org/abs/2305.04800v2
- Date: Thu, 3 Aug 2023 16:11:29 GMT
- Title: Mlinear: Rethink the Linear Model for Time-series Forecasting
- Authors: Wei Li, Xiangxu Meng, Chuhao Chen and Jianing Chen
- Abstract summary: Mlinear is a simple yet effective method based mainly on linear layers.
We introduce a new loss function that significantly outperforms the widely used mean squared error (MSE) on multiple datasets.
Our method significantly outperforms PatchTST with a ratio of 21:3 at 336 sequence length input and 29:10 at 512 sequence length input.
- Score: 9.841293660201261
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, significant advancements have been made in time-series forecasting
research, with an increasing focus on analyzing the nature of time-series data,
e.g., channel-independence (CI) and channel-dependence (CD), rather than solely
focusing on designing sophisticated forecasting models. However, current
research has primarily focused on either CI or CD in isolation, and the
challenge of effectively combining these two opposing properties to achieve a
synergistic effect remains an unresolved issue. In this paper, we carefully
examine the opposing properties of CI and CD, and raise a practical question
that has not been effectively answered: "How to effectively mix the CI and
CD properties of time series to achieve better predictive performance?" To
answer this question, we propose Mlinear (MIX-Linear), a simple yet effective
method based mainly on linear layers. The design philosophy of Mlinear mainly
includes two aspects: (1) dynamically tuning the CI and CD properties based on
the time semantics of different input time series, and (2) providing deep
supervision to adjust the individual performance of the "CI predictor" and "CD
predictor". In addition, empirically, we introduce a new loss function that
significantly outperforms the widely used mean squared error (MSE) on multiple
datasets. Experiments on widely used time-series datasets covering multiple
fields demonstrate the superiority of our method over PatchTST, the latest
Transformer-based method, in terms of the MSE and MAE metrics on 7 datasets
with identical sequence inputs (336 or 512). Specifically, our
method significantly outperforms PatchTST with a ratio of 21:3 at 336 sequence
length input and 29:10 at 512 sequence length input. Additionally, our approach
has a 10 $\times$ efficiency advantage at the unit level, taking into account
both training and inference times.
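As a concrete illustration of the mixing idea, below is a minimal PyTorch sketch of how a CI predictor, a CD predictor, a learned gate, and deep supervision could fit together. The abstract does not disclose the implementation, so the module layout, the gating design, and the auxiliary loss weight here are assumptions, not the authors' architecture.

```python
# Hedged sketch of the CI/CD mixing idea from the abstract (PyTorch).
# The gating design and loss weights are illustrative assumptions,
# not the authors' exact Mlinear implementation.
import torch
import torch.nn as nn

class MixLinear(nn.Module):
    def __init__(self, seq_len: int, pred_len: int, n_channels: int):
        super().__init__()
        # CI predictor: one linear map over time, shared across channels.
        self.ci = nn.Linear(seq_len, pred_len)
        # CD predictor: mixes channels first, then projects over time.
        self.mix = nn.Linear(n_channels, n_channels)
        self.cd = nn.Linear(seq_len, pred_len)
        # Gate weighing CI vs. CD per input window (hypothetical design).
        self.gate = nn.Sequential(nn.Linear(seq_len, 1), nn.Sigmoid())

    def forward(self, x):                       # x: (batch, channels, seq_len)
        y_ci = self.ci(x)                       # channel-independent forecast
        x_cd = self.mix(x.transpose(1, 2)).transpose(1, 2)
        y_cd = self.cd(x_cd)                    # channel-dependent forecast
        g = self.gate(x)                        # (batch, channels, 1) in [0, 1]
        return g * y_ci + (1 - g) * y_cd, y_ci, y_cd

def deep_supervised_loss(y_mix, y_ci, y_cd, target, aux=0.3):
    # Deep supervision: each head receives its own loss term; plain MSE
    # stands in here for the paper's custom loss function.
    mse = nn.functional.mse_loss
    return mse(y_mix, target) + aux * (mse(y_ci, target) + mse(y_cd, target))
```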
Related papers
- TS-TCD: Triplet-Level Cross-Modal Distillation for Time-Series Forecasting Using Large Language Models [15.266543423942617]
We present a novel framework, TS-TCD, which introduces a comprehensive three-tiered cross-modal knowledge distillation mechanism.
Unlike prior work that focuses on isolated alignment techniques, our framework systematically integrates alignment across all three tiers.
Experiments on benchmark time-series demonstrate that TS-TCD achieves state-of-the-art results, outperforming traditional methods in both accuracy and robustness.
arXiv Detail & Related papers (2024-09-23T12:57:24Z) - Test Time Learning for Time Series Forecasting [1.4605709124065924]
Test-Time Training (TTT) modules consistently outperform state-of-the-art models, including the Mamba-based TimeMachine.
Our results show significant improvements in Mean Squared Error (MSE) and Mean Absolute Error (MAE).
This work sets a new benchmark for time-series forecasting and lays the groundwork for future research in scalable, high-performance forecasting models.
arXiv Detail & Related papers (2024-09-21T04:40:08Z) - MGCP: A Multi-Grained Correlation based Prediction Network for Multivariate Time Series [54.91026286579748]
We propose a Multi-Grained Correlations-based Prediction Network.
It simultaneously considers correlations at three levels to enhance prediction performance.
It employs adversarial training with an attention mechanism-based predictor and conditional discriminator to optimize prediction results at the coarse-grained level.
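The coarse-grained adversarial component described above can be pictured with a short PyTorch sketch: an attention-based predictor generates forecasts, and a conditional discriminator scores a forecast jointly with the history it extends. Both architectures below are illustrative assumptions, not MGCP's actual networks.

```python
# Hedged sketch of adversarial forecast refinement in the spirit of the
# MGCP summary; both architectures are illustrative assumptions.
import torch
import torch.nn as nn

class AttnPredictor(nn.Module):
    def __init__(self, n_channels, seq_len, pred_len, d=32):
        super().__init__()
        self.inp = nn.Linear(n_channels, d)
        self.attn = nn.MultiheadAttention(d, num_heads=4, batch_first=True)
        self.out = nn.Linear(seq_len * d, pred_len * n_channels)
        self.pred_len, self.n_channels = pred_len, n_channels

    def forward(self, x):                        # x: (batch, seq_len, channels)
        h = self.inp(x)
        h, _ = self.attn(h, h, h)                # self-attention over time
        y = self.out(h.flatten(1))
        return y.view(-1, self.pred_len, self.n_channels)

class CondDiscriminator(nn.Module):
    # Conditional: scores a forecast given the history it extends.
    def __init__(self, n_channels, seq_len, pred_len, d=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear((seq_len + pred_len) * n_channels, d), nn.ReLU(),
            nn.Linear(d, 1))

    def forward(self, history, forecast):        # both (batch, T, channels)
        pair = torch.cat([history.flatten(1), forecast.flatten(1)], dim=1)
        return self.net(pair)                    # real/fake logit
```

In a standard adversarial setup, the discriminator would learn to separate real continuations from predicted ones, while the predictor minimizes a forecasting loss plus the adversarial term.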
arXiv Detail & Related papers (2024-05-30T03:32:44Z) - TimeSiam: A Pre-Training Framework for Siamese Time-Series Modeling [67.02157180089573]
Time series pre-training has recently garnered wide attention for its potential to reduce labeling expenses and benefit various downstream tasks.
This paper proposes TimeSiam as a simple but effective self-supervised pre-training framework for time series based on Siamese networks.
arXiv Detail & Related papers (2024-02-04T13:10:51Z) - MPR-Net: Multi-Scale Pattern Reproduction Guided Universality Time Series
Interpretable Forecasting [13.790498420659636]
Time series forecasting has received wide research interest due to its broad applications and inherent challenges.
This paper proposes a forecasting model, MPR-Net. It first adaptively decomposes multi-scale historical series patterns using a convolution operation, then constructs a pattern extension forecasting method based on the prior knowledge of pattern reproduction, and finally reconstructs future patterns into future series using a deconvolution operation.
By leveraging the temporal dependencies present in the time series, MPR-Net not only achieves linear time complexity, but also makes the forecasting process interpretable.
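A rough sketch of the decompose/extend/reconstruct pipeline that summary describes is given below; the kernel sizes, channel widths, and the averaging across scales are assumptions, and the sketch requires the input and forecast lengths to be divisible by each scale.

```python
# Hedged sketch of a conv-decompose / pattern-extend / deconv-reconstruct
# pipeline in the spirit of the MPR-Net summary; all shapes are assumptions.
import torch
import torch.nn as nn

class PatternNet(nn.Module):
    def __init__(self, seq_len=96, pred_len=24, scales=(4, 8), hidden=16):
        super().__init__()
        # One convolution per scale decomposes the history into patterns.
        self.enc = nn.ModuleList(
            nn.Conv1d(1, hidden, kernel_size=s, stride=s) for s in scales)
        # Linear "pattern extension" from history to forecast horizon,
        # applied in the downsampled pattern space.
        self.ext = nn.ModuleList(
            nn.Linear(seq_len // s, pred_len // s) for s in scales)
        # Deconvolution reconstructs each extended pattern into a series.
        self.dec = nn.ModuleList(
            nn.ConvTranspose1d(hidden, 1, kernel_size=s, stride=s) for s in scales)

    def forward(self, x):                        # x: (batch, 1, seq_len)
        outs = []
        for enc, ext, dec in zip(self.enc, self.ext, self.dec):
            p = enc(x)                           # (batch, hidden, seq_len // s)
            p = ext(p)                           # extend to forecast horizon
            outs.append(dec(p))                  # (batch, 1, pred_len)
        return torch.stack(outs).mean(0)         # average multi-scale forecasts
```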
arXiv Detail & Related papers (2023-07-13T13:16:01Z) - The Capacity and Robustness Trade-off: Revisiting the Channel
Independent Strategy for Multivariate Time Series Forecasting [50.48888534815361]
We show that models trained with the Channel Independent (CI) strategy outperform those trained with the Channel Dependent (CD) strategy.
Our results show that the CD approach has higher capacity but often lacks the robustness to accurately predict distributionally drifted time series.
We propose a modified CD method called Predict Residuals with Regularization (PRReg) that can surpass the CI strategy.
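A hedged sketch of what predicting residuals with regularization might look like is shown below: a channel-dependent linear model forecasts the residual against a simple baseline (the last observed value here), with an L2 penalty added to the objective. The baseline choice and penalty form are assumptions, not the paper's exact recipe.

```python
# Hedged sketch of "predict residuals with regularization": the baseline
# choice and penalty form below are assumptions, not the paper's recipe.
import torch
import torch.nn as nn

class PRRegSketch(nn.Module):
    def __init__(self, seq_len: int, pred_len: int, n_channels: int):
        super().__init__()
        # Channel-dependent: one linear map over all flattened channels.
        self.proj = nn.Linear(seq_len * n_channels, pred_len * n_channels)
        self.pred_len, self.n_channels = pred_len, n_channels

    def forward(self, x):                        # x: (batch, channels, seq_len)
        baseline = x[:, :, -1:].expand(-1, -1, self.pred_len)
        res = self.proj(x.flatten(1)).view(-1, self.n_channels, self.pred_len)
        return baseline + res                    # forecast = baseline + residual

def prreg_loss(model, y_hat, y, weight_decay=1e-3):
    # MSE on the forecast plus an L2 penalty on the model weights.
    l2 = sum(p.pow(2).sum() for p in model.parameters())
    return nn.functional.mse_loss(y_hat, y) + weight_decay * l2
```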
arXiv Detail & Related papers (2023-04-11T13:15:33Z) - FormerTime: Hierarchical Multi-Scale Representations for Multivariate
Time Series Classification [53.55504611255664]
FormerTime is a hierarchical representation model for improving classification capacity on the multivariate time series classification task.
It exhibits three merits: (1) learning hierarchical multi-scale representations from time series data, (2) inheriting the strengths of both transformers and convolutional networks, and (3) tackling the efficiency challenges incurred by the self-attention mechanism.
arXiv Detail & Related papers (2023-02-20T07:46:14Z) - Generative Time Series Forecasting with Diffusion, Denoise, and
Disentanglement [51.55157852647306]
Time series forecasting has been a widely explored task of great importance in many applications.
Real-world time series data are often recorded over a short period, which creates a large gap between the capacity of deep models and the limited, noisy data available.
We propose to address the time series forecasting problem with generative modeling, introducing a bidirectional variational auto-encoder equipped with diffusion, denoise, and disentanglement.
arXiv Detail & Related papers (2023-01-08T12:20:46Z) - Enhancing Transformer Efficiency for Multivariate Time Series
Classification [12.128991867050487]
We propose a methodology to investigate the relationship between model efficiency and accuracy, as well as model complexity.
Comprehensive experiments on benchmark MTS datasets illustrate the effectiveness of our method.
arXiv Detail & Related papers (2022-03-28T03:25:19Z) - TACTiS: Transformer-Attentional Copulas for Time Series [76.71406465526454]
Estimation of time-varying quantities is a fundamental component of decision making in fields such as healthcare and finance.
We propose a versatile method that estimates joint distributions using an attention-based decoder.
We show that our model produces state-of-the-art predictions on several real-world datasets.
arXiv Detail & Related papers (2022-02-07T21:37:29Z) - Time Series Forecasting with Ensembled Stochastic Differential Equations
Driven by Lévy Noise [2.3076895420652965]
We use a collection of SDEs equipped with neural networks to predict the long-term trend of noisy time series.
Our contributions are, first, we use the phase space reconstruction method to extract the intrinsic dimension of the time series data.
Second, we explore SDEs driven by $\alpha$-stable Lévy motion to model the time series data and solve the problem through neural network approximation.
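For intuition, a small numerical sketch follows: an Euler-type scheme for an SDE driven by symmetric α-stable Lévy noise, with increments drawn via the Chambers-Mallows-Stuck sampler. The toy drift is an assumption standing in for the paper's learned networks.

```python
# Hedged numerical sketch: Euler-type simulation of an SDE driven by
# symmetric alpha-stable Levy noise (CMS sampler, valid for alpha != 1).
# The drift is a toy stand-in for the paper's neural-network approximation.
import numpy as np

def alpha_stable(alpha, size, rng):
    # Chambers-Mallows-Stuck sampler for symmetric alpha-stable variates.
    u = rng.uniform(-np.pi / 2, np.pi / 2, size)
    w = rng.exponential(1.0, size)
    return (np.sin(alpha * u) / np.cos(u) ** (1 / alpha)
            * (np.cos(u - alpha * u) / w) ** ((1 - alpha) / alpha))

def simulate(drift, x0=0.0, alpha=1.5, dt=1e-3, steps=1000, seed=0):
    rng = np.random.default_rng(seed)
    x = np.empty(steps + 1)
    x[0] = x0
    # Stable increments scale as dt**(1/alpha) rather than sqrt(dt).
    jumps = dt ** (1 / alpha) * alpha_stable(alpha, steps, rng)
    for t in range(steps):
        x[t + 1] = x[t] + drift(x[t]) * dt + jumps[t]
    return x

path = simulate(drift=lambda x: -x)  # mean-reverting toy drift
```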
arXiv Detail & Related papers (2021-11-25T16:49:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences.