Empowering Time Series Analysis with Large-Scale Multimodal Pretraining
- URL: http://arxiv.org/abs/2602.05646v1
- Date: Thu, 05 Feb 2026 13:26:35 GMT
- Title: Empowering Time Series Analysis with Large-Scale Multimodal Pretraining
- Authors: Peng Chen, Siyuan Wang, Shiyan Hu, Xingjian Wu, Yang Shu, Zhongwen Rao, Meng Wang, Yijie Li, Bin Yang, Chenjuan Guo
- Abstract summary: Building multimodal foundation models is a natural next step, but it faces key challenges: the lack of a unified multimodal pretraining paradigm and of large-scale multimodal corpora for time series analysis, and the difficulty of effectively integrating heterogeneous modalities while enhancing model generalization.
- Score: 34.22079919081765
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While existing time series foundation models primarily rely on large-scale unimodal pretraining, they lack complementary modalities to enhance time series understanding. Building multimodal foundation models is a natural next step, but it faces key challenges: 1) lack of a unified multimodal pretraining paradigm and large-scale multimodal corpora for time series analysis; 2) how to effectively integrate heterogeneous modalities and enhance model generalization. To address these challenges, we take an early step toward multimodal foundation models for time series analysis. We first propose a multimodal pretraining paradigm that leverages time series with endogenous modalities (derived images and text) and exogenous knowledge (real-world news), providing a comprehensive multi-view perspective for time series analysis. To support this, we develop an automated data construction pipeline to curate MM-TS, the first large-scale multimodal time series dataset spanning six domains, with up to one billion points. Then we propose HORAI, a frequency-enhanced multimodal foundation model. It integrates two core components: the Frequency-enhanced Cross-Modality Encoder and the Time-Frequency Decoder, designed to effectively fuse multimodal features and enhance model generalization across modalities and domains. After pretraining on MM-TS, HORAI achieves state-of-the-art zero-shot performance on time series forecasting and anomaly detection tasks, demonstrating strong generalization.
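The abstract describes HORAI's core components (the Frequency-enhanced Cross-Modality Encoder and the Time-Frequency Decoder) only at a high level. As a rough illustration of the general idea, the sketch below combines frequency-domain tokens (from an FFT of the series) with cross-attention over pre-computed embeddings of the auxiliary modalities. Every module name, shape, and design choice here is an assumption made for illustration, not HORAI's actual architecture.

```python
# A minimal, hypothetical sketch of frequency-enhanced cross-modality fusion:
# tokenize the raw series and its rFFT spectrum, then let those tokens attend
# to pre-computed image/text/news embeddings. Names and shapes are assumptions.
import torch
import torch.nn as nn

class FrequencyEnhancedCrossModalEncoder(nn.Module):
    def __init__(self, d_model=128, n_heads=4):
        super().__init__()
        self.time_proj = nn.Linear(1, d_model)   # raw points -> time tokens
        self.freq_proj = nn.Linear(2, d_model)   # (real, imag) of rFFT bins -> frequency tokens
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, series, modal_emb):
        # series: (B, L) univariate series; modal_emb: (B, M, d_model)
        # embeddings of auxiliary modalities from frozen encoders (assumed given).
        time_tok = self.time_proj(series.unsqueeze(-1))           # (B, L, d)
        spec = torch.fft.rfft(series, dim=-1)                     # (B, L//2+1) complex
        freq_tok = self.freq_proj(torch.stack([spec.real, spec.imag], dim=-1))
        tokens = torch.cat([time_tok, freq_tok], dim=1)           # time + frequency tokens
        fused, _ = self.cross_attn(tokens, modal_emb, modal_emb)  # series queries the modalities
        return self.norm(tokens + fused)                          # residual fusion

# toy usage: batch of 8 series of length 96, 3 modality embeddings each
enc = FrequencyEnhancedCrossModalEncoder()
out = enc(torch.randn(8, 96), torch.randn(8, 3, 128))
print(out.shape)  # torch.Size([8, 145, 128]): 96 time + 49 frequency tokens
```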
Related papers
- TimeOmni-VL: Unified Models for Time Series Understanding and Generation [66.55423802406078]
TimeOmni-VL is a vision-centric framework that unifies time series understanding and generation. It is the first to leverage time series understanding as an explicit control signal for high-fidelity generation. Experiments confirm that this unified approach significantly improves both semantic understanding and numerical precision.
arXiv Detail & Related papers (2026-02-19T07:50:11Z)
- FusAD: Time-Frequency Fusion with Adaptive Denoising for General Time Series Analysis [92.23551599659186]
Time series analysis plays a vital role in fields such as finance, healthcare, industry, and meteorology. FusAD is a unified analysis framework designed for diverse time series tasks.
arXiv Detail & Related papers (2025-12-16T04:34:27Z)
- UniCast: A Unified Multimodal Prompting Framework for Time Series Forecasting [9.836278124939453]
Time series forecasting is a foundational task across domains such as finance, healthcare, and environmental monitoring. Existing models operate predominantly in a unimodal setting, ignoring the rich multimodal context, such as visual and textual signals, that often accompanies time series data in real-world scenarios. This paper introduces UniCast, a parameter-efficient multimodal framework that extends time series foundation models (TSFMs) to jointly leverage time series, vision, and text modalities for enhanced forecasting performance.
arXiv Detail & Related papers (2025-08-16T07:33:27Z)
- Time Tracker: Mixture-of-Experts-Enhanced Foundation Time Series Forecasting Model with Decoupled Training Pipelines [5.543238821368548]
Time series often exhibit significant diversity in their temporal patterns across different time spans and domains. Time Tracker achieves state-of-the-art performance in prediction accuracy, model generalization, and adaptability.
arXiv Detail & Related papers (2025-05-21T06:18:41Z)
- Multi-modal Time Series Analysis: A Tutorial and Survey [36.93906365779472]
Multi-modal time series analysis has emerged as a prominent research area in data mining. However, effective analysis of multi-modal time series is hindered by data heterogeneity, the modality gap, misalignment, and inherent noise. Recent advancements in multi-modal time series methods have exploited the multi-modal context via cross-modal interactions.
arXiv Detail & Related papers (2025-03-17T20:30:02Z)
- General Time-series Model for Universal Knowledge Representation of Multivariate Time-Series data [61.163542597764796]
We show that time series with different time granularities (or corresponding frequency resolutions) exhibit distinct joint distributions in the frequency domain. A novel Fourier knowledge attention mechanism is proposed to enable learning time-aware representations from both the temporal and frequency domains. An autoregressive blank-infilling pre-training framework is incorporated into time series analysis for the first time, leading to a generative, task-agnostic pre-training strategy.
arXiv Detail & Related papers (2025-02-05T15:20:04Z)
- Moirai-MoE: Empowering Time Series Foundation Models with Sparse Mixture of Experts [103.725112190618]
This paper introduces Moirai-MoE, which uses a single input/output projection layer while delegating the modeling of diverse time series patterns to a sparse mixture of experts (see the sketch after this list).
Extensive experiments on 39 datasets demonstrate the superiority of Moirai-MoE over existing foundation models in both in-distribution and zero-shot scenarios.
arXiv Detail & Related papers (2024-10-14T13:01:11Z)
- Unified Training of Universal Time Series Forecasting Transformers [104.56318980466742]
We present a Masked Encoder-based Universal Time Series Forecasting Transformer (Moirai).
Moirai is trained on our newly introduced Large-scale Open Time Series Archive (LOTSA) featuring over 27B observations across nine domains.
Moirai achieves competitive or superior performance as a zero-shot forecaster when compared to full-shot models.
arXiv Detail & Related papers (2024-02-04T20:00:45Z)
- Timer: Generative Pre-trained Transformers Are Large Time Series Models [83.03091523806668]
This paper aims at the early development of large time series models (LTSM).
During pre-training, we curate large-scale datasets with up to 1 billion time points.
To meet diverse application needs, we convert forecasting, imputation, and anomaly detection of time series into a unified generative task.
arXiv Detail & Related papers (2024-02-04T06:55:55Z)
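As referenced in the Moirai-MoE entry above, the sketch below shows one common way to implement a top-k sparse mixture-of-experts feed-forward layer, the mechanism that paper uses in place of a single dense FFN. The expert count, k, hidden sizes, and routing details here are illustrative assumptions, not the paper's actual configuration.

```python
# A minimal, hypothetical sketch of a top-k sparse MoE feed-forward layer:
# a learned gate routes each token to its k highest-scoring experts, and the
# outputs are combined with the softmax-normalized gate weights.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoEFFN(nn.Module):
    def __init__(self, d_model=128, d_ff=256, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):
        # x: (B, T, d_model); route each token to its top-k experts.
        logits = self.gate(x)                               # (B, T, n_experts)
        weights, idx = logits.topk(self.k, dim=-1)          # (B, T, k)
        weights = F.softmax(weights, dim=-1)                # normalize over chosen experts
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = (idx == e)                               # (B, T, k): slots routed to expert e
            if mask.any():
                tok_mask = mask.any(dim=-1)                 # (B, T): tokens using expert e
                w = (weights * mask).sum(dim=-1)[tok_mask]  # per-token weight for expert e
                out[tok_mask] += w.unsqueeze(-1) * expert(x[tok_mask])
        return out

# toy usage: 4 sequences of 32 tokens
moe = SparseMoEFFN()
print(moe(torch.randn(4, 32, 128)).shape)  # torch.Size([4, 32, 128])
```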