Mixture of Low Rank Adaptation with Partial Parameter Sharing for Time Series Forecasting
- URL: http://arxiv.org/abs/2505.17872v2
- Date: Tue, 27 May 2025 07:23:28 GMT
- Title: Mixture of Low Rank Adaptation with Partial Parameter Sharing for Time Series Forecasting
- Authors: Licheng Pan, Zhichao Chen, Haoxuan Li, Guangyi Liu, Zhijian Xu, Zhaoran Liu, Hao Wang, Ying Wei
- Abstract summary: We show that multi-task forecasting suffers from an Expressiveness Bottleneck, where predictions at different time steps share the same representation. We propose a two-stage framework: first, pre-train a foundation model for one-step-ahead prediction; then, adapt it using step-specific LoRA modules. Experiments show that MoLA significantly improves model expressiveness and outperforms state-of-the-art time-series forecasting methods.
- Score: 20.505925622104964
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-task forecasting has become the standard approach for time-series forecasting (TSF). However, we show that it suffers from an Expressiveness Bottleneck, where predictions at different time steps share the same representation, leading to unavoidable errors even with optimal representations. To address this issue, we propose a two-stage framework: first, pre-train a foundation model for one-step-ahead prediction; then, adapt it using step-specific LoRA modules. This design enables the foundation model to handle any number of forecast steps while avoiding the expressiveness bottleneck. We further introduce the Mixture-of-LoRA (MoLA) model, which employs adaptively weighted LoRA experts to achieve partial parameter sharing across steps. This approach enhances both efficiency and forecasting performance by exploiting interdependencies between forecast steps. Experiments show that MoLA significantly improves model expressiveness and outperforms state-of-the-art time-series forecasting methods. Code is available at https://anonymous.4open.science/r/MoLA-BC92.
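To make the two-stage design concrete, below is a minimal PyTorch sketch of a mixture-of-LoRA linear layer: a frozen pre-trained weight shared across all forecast steps, plus a set of low-rank experts mixed by step-specific gate weights. All names, shapes, and the softmax gating are illustrative assumptions, not the authors' implementation (which is at the repository linked above).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoLALinear(nn.Module):
    """Frozen base linear layer adapted by a mixture of LoRA experts (sketch)."""

    def __init__(self, d_in, d_out, n_experts=4, rank=8, n_steps=96):
        super().__init__()
        self.base = nn.Linear(d_in, d_out)
        for p in self.base.parameters():      # stage-1 weights stay frozen
            p.requires_grad_(False)
        # K low-rank experts: delta_k = B_k @ A_k
        self.A = nn.Parameter(torch.randn(n_experts, rank, d_in) * 0.01)
        self.B = nn.Parameter(torch.zeros(n_experts, d_out, rank))
        # one gate vector per forecast step, softmaxed over experts
        self.gate = nn.Parameter(torch.zeros(n_steps, n_experts))

    def forward(self, x, step):
        # x: (batch, d_in); step: int index of the forecast horizon
        g = F.softmax(self.gate[step], dim=-1)                  # (n_experts,)
        delta = torch.einsum("e,eor,eri,bi->bo", g, self.B, self.A, x)
        return self.base(x) + delta
```

In this reading, each forecast step gets its own effective weight W + sum_k g_k(step) * B_k A_k, so all steps share the frozen backbone and the expert pool while keeping step-specific mixtures, matching the partial parameter sharing the abstract describes.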
Related papers
- WSM: Decay-Free Learning Rate Schedule via Checkpoint Merging for LLM Pre-training [64.0932926819307]
We present Warmup-Stable and Merge (WSM), a framework that establishes a formal connection between learning rate decay and model merging. WSM provides a unified theoretical foundation for emulating various decay strategies. Our framework consistently outperforms the widely-adopted Warmup-Stable-Decay (WSD) approach across multiple benchmarks.
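As a rough illustration of the merging idea (not WSM's actual procedure, which the paper derives formally), merging checkpoints saved along a constant-learning-rate run can be as simple as a weighted average of their state dicts:

```python
import torch

def merge_checkpoints(paths, weights=None):
    """Weighted average of saved state_dicts (illustrative sketch only)."""
    weights = weights or [1.0 / len(paths)] * len(paths)
    merged = None
    for w, path in zip(weights, paths):
        sd = torch.load(path, map_location="cpu")
        if merged is None:
            merged = {k: w * v.float() for k, v in sd.items()}
        else:
            for k, v in sd.items():
                merged[k] += w * v.float()
    return merged
```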
arXiv Detail & Related papers (2025-07-23T16:02:06Z)
- Elucidated Rolling Diffusion Models for Probabilistic Weather Forecasting [52.6508222408558]
We introduce Elucidated Rolling Diffusion Models (ERDM), the first framework to unify a rolling forecast structure with the principled, performant design of Elucidated Diffusion Models (EDM). On 2D Navier-Stokes simulations and ERA5 global weather forecasting at 1.5° resolution, ERDM consistently outperforms key diffusion-based baselines.
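The rolling-forecast structure can be pictured as giving each lead time in the window its own noise level, growing with distance into the future. The geometric ramp below is an illustrative stand-in for ERDM's actual EDM-style schedule, which the paper designs more carefully:

```python
import torch

def rolling_noise_levels(window, sigma_min=0.002, sigma_max=80.0):
    """Per-position noise levels for a rolling forecast window (sketch).

    Positions further into the future get more noise, so denoising the
    window front-to-back yields a forecast; the endpoint values follow
    common EDM defaults but are assumptions here.
    """
    t = torch.linspace(0.0, 1.0, window)
    return sigma_min * (sigma_max / sigma_min) ** t   # geometric ramp
```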
arXiv Detail & Related papers (2025-06-24T21:44:31Z)
- ADiff4TPP: Asynchronous Diffusion Models for Temporal Point Processes [30.928368603673285]
This work introduces a novel approach to modeling temporal point processes using diffusion models with an asynchronous noise schedule. We derive an objective to effectively train these models for a general family of noise schedules based on conditional flow matching. Our method models the joint distribution of the latent representations of events in a sequence and achieves state-of-the-art results in predicting both the next inter-event time and event type on benchmark datasets.
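For reference, the generic conditional flow-matching objective the summary alludes to looks like the sketch below; ADiff4TPP's contribution is adapting it to event sequences with an asynchronous, per-event noise schedule, which this sketch does not model:

```python
import torch

def cfm_loss(v_net, x0, x1):
    """Generic conditional flow matching with a linear path (illustrative).

    v_net is an assumed callable v_net(x_t, t) predicting the path velocity;
    x0 is noise, x1 is data, both of shape (batch, dim).
    """
    t = torch.rand(x0.shape[0], 1)       # one time per sample in [0, 1)
    xt = (1 - t) * x0 + t * x1           # point on the straight path
    target = x1 - x0                     # velocity of that path
    return ((v_net(xt, t) - target) ** 2).mean()
```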
arXiv Detail & Related papers (2025-04-29T04:17:39Z)
- Sundial: A Family of Highly Capable Time Series Foundation Models [64.6322079384575]
We introduce Sundial, a family of native, flexible, and scalable time series foundation models. Our model is pre-trained without specifying any prior distribution and can generate multiple probable predictions. By mitigating mode collapse through TimeFlow Loss, we pre-train a family of Sundial models on TimeBench, which exhibit unprecedented model capacity and generalization performance.
arXiv Detail & Related papers (2025-02-02T14:52:50Z)
- xLSTM-Mixer: Multivariate Time Series Forecasting by Mixing via Scalar Memories [20.773694998061707]
Time series data is prevalent across numerous fields, necessitating the development of robust and accurate forecasting models.
We introduce xLSTM-Mixer, a model designed to effectively integrate temporal sequences, joint time-variable information, and multiple perspectives for robust forecasting.
Our evaluations demonstrate xLSTM-Mixer's superior long-term forecasting performance compared to recent state-of-the-art methods.
arXiv Detail & Related papers (2024-10-22T11:59:36Z)
- Loss Shaping Constraints for Long-Term Time Series Forecasting [79.3533114027664]
We present a Constrained Learning approach for long-term time series forecasting that respects a user-defined upper bound on the loss at each time-step.
We propose a practical Primal-Dual algorithm to tackle it, and demonstrate that it exhibits competitive average performance on time series benchmarks while shaping the errors across the predicted window.
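A minimal sketch of such a primal-dual update, assuming squared error and one multiplier per forecast step (the paper's exact algorithm and constraint form may differ):

```python
import torch

def primal_dual_step(model, opt, x, y, lam, eps, dual_lr=0.01):
    """One primal-dual update enforcing per-step loss <= eps (sketch).

    lam: non-negative multipliers, shape (horizon,), e.g. torch.zeros(horizon).
    """
    pred = model(x)                               # (batch, horizon)
    step_loss = ((pred - y) ** 2).mean(dim=0)     # per-step MSE, (horizon,)
    lagrangian = step_loss.mean() + (lam * (step_loss - eps)).sum()
    opt.zero_grad()
    lagrangian.backward()                         # primal descent
    opt.step()
    with torch.no_grad():                         # dual ascent, kept >= 0
        lam += dual_lr * (step_loss - eps)
        lam.clamp_(min=0.0)
    return step_loss.detach()
```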
arXiv Detail & Related papers (2024-02-14T18:20:44Z)
- Supervised Contrastive Learning based Dual-Mixer Model for Remaining Useful Life Prediction [3.081898819471624]
Remaining Useful Life (RUL) prediction aims to provide an accurate estimate of the time remaining from the current moment until the complete failure of a device.
To overcome the rigid combination of temporal and spatial features in most existing RUL prediction approaches, a spatial-temporal homogeneous feature extractor, the Dual-Mixer model, is proposed.
The effectiveness of the proposed method is validated through comparisons with other latest research works on the C-MAPSS dataset.
arXiv Detail & Related papers (2024-01-29T14:38:44Z)
- Interacting Diffusion Processes for Event Sequence Forecasting [20.380620709345898]
We introduce a novel approach that incorporates a diffusion generative model.
The model facilitates sequence-to-sequence prediction, allowing multi-step predictions based on historical event sequences.
We demonstrate that our proposal outperforms state-of-the-art baselines for long-horizon forecasting of temporal point processes.
arXiv Detail & Related papers (2023-10-26T22:17:25Z)
- Attention-Based Ensemble Pooling for Time Series Forecasting [55.2480439325792]
We propose a method for pooling that performs a weighted average over candidate model forecasts.
We test this method on two time-series forecasting problems: multi-step forecasting of the dynamics of the non-stationary Lorenz 63 equation, and one-step forecasting of the weekly incident deaths due to COVID-19.
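One way such attention-based pooling can look, with the scoring function and shapes as assumptions rather than the paper's design:

```python
import torch
import torch.nn as nn

class AttentionPool(nn.Module):
    """Pool K candidate forecasts into one via learned attention (sketch)."""

    def __init__(self, d_ctx, d_score=32):
        super().__init__()
        self.query = nn.Linear(d_ctx, d_score)  # scores recent observations
        self.key = nn.Linear(1, d_score)        # scores each model's forecast

    def forward(self, forecasts, context):
        # forecasts: (batch, K, horizon); context: (batch, d_ctx)
        q = self.query(context).unsqueeze(1)                   # (batch, 1, d_score)
        k = self.key(forecasts.mean(dim=-1, keepdim=True))     # (batch, K, d_score)
        attn = torch.softmax((q * k).sum(-1), dim=-1)          # (batch, K)
        return (attn.unsqueeze(-1) * forecasts).sum(dim=1)     # (batch, horizon)
```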
arXiv Detail & Related papers (2023-10-24T22:59:56Z)
- Lag-Llama: Towards Foundation Models for Probabilistic Time Series Forecasting [54.04430089029033]
We present Lag-Llama, a general-purpose foundation model for time series forecasting based on a decoder-only transformer architecture.
Lag-Llama is pretrained on a large corpus of diverse time series data from several domains, and demonstrates strong zero-shot generalization capabilities.
When fine-tuned on relatively small fractions of such previously unseen datasets, Lag-Llama achieves state-of-the-art performance.
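The model's name points to its use of lagged values as input covariates; a simple way to build such features is sketched below, with the particular lag set an arbitrary assumption (the lags Lag-Llama actually uses are not reproduced here):

```python
import torch

def lag_features(series, lags=(1, 7, 30)):
    """Stack each time step with lagged copies of itself (illustrative).

    series: (batch, T) -> (batch, T - max(lags), len(lags) + 1).
    """
    T, max_lag = series.shape[1], max(lags)
    cols = [series[:, max_lag:]]                            # current value
    cols += [series[:, max_lag - l : T - l] for l in lags]  # lagged copies
    return torch.stack(cols, dim=-1)
```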
arXiv Detail & Related papers (2023-10-12T12:29:32Z)
- Generative Time Series Forecasting with Diffusion, Denoise, and Disentanglement [51.55157852647306]
Time series forecasting has been a widely explored task of great importance in many applications.
Real-world time series are often recorded over short periods, leaving a large gap between the capacity of deep models and the limited, noisy data available.
We address the time series forecasting problem with generative modeling, proposing a bidirectional variational auto-encoder equipped with diffusion, denoising, and disentanglement.
arXiv Detail & Related papers (2023-01-08T12:20:46Z)
- Joint Forecasting of Panoptic Segmentations with Difference Attention [72.03470153917189]
We study a new panoptic segmentation forecasting model that jointly forecasts all object instances in a scene.
We evaluate the proposed model on the Cityscapes and AIODrive datasets.
arXiv Detail & Related papers (2022-04-14T17:59:32Z)
- Meta-Forecasting by Combining Global Deep Representations with Local Adaptation [12.747008878068314]
We introduce a novel forecasting method called Meta Global-Local Auto-Regression (Meta-GLAR).
It adapts to each time series by learning, in closed form, the mapping from the representations produced by a recurrent neural network (RNN) to one-step-ahead forecasts.
Our method is competitive with the state-of-the-art in out-of-sample forecasting accuracy reported in earlier work.
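The closed-form local adaptation can be read as ridge regression from frozen RNN features to next-step targets; a sketch, with the regularizer and shapes as assumptions:

```python
import torch

def closed_form_adapt(H, y, reg=1e-2):
    """Per-series readout fitted in closed form (ridge regression sketch).

    H: (T, d) RNN representations for one series; y: (T,) one-step targets.
    The global RNN stays fixed; only this local mapping is re-estimated,
    then new forecasts are H_new @ w.
    """
    d = H.shape[1]
    A = H.T @ H + reg * torch.eye(d)
    return torch.linalg.solve(A, H.T @ y)   # w = (H^T H + reg I)^-1 H^T y
```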
arXiv Detail & Related papers (2021-11-05T11:45:02Z)