Teaching Time Series to See and Speak: Forecasting with Aligned Visual and Textual Perspectives
- URL: http://arxiv.org/abs/2506.24124v2
- Date: Tue, 01 Jul 2025 03:40:22 GMT
- Title: Teaching Time Series to See and Speak: Forecasting with Aligned Visual and Textual Perspectives
- Authors: Sixun Dong, Wei Fan, Teresa Wu, Yanjie Fu
- Abstract summary: Time series forecasting traditionally relies on unimodal numerical inputs. We propose a multimodal contrastive learning framework that transforms raw time series into structured visual and textual perspectives.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Time series forecasting traditionally relies on unimodal numerical inputs, which often struggle to capture high-level semantic patterns due to their dense and unstructured nature. While recent approaches have explored representing time series as text using large language models (LLMs), these methods remain limited by the discrete nature of token sequences and lack the perceptual intuition humans typically apply, such as interpreting visual patterns. In this paper, we propose a multimodal contrastive learning framework that transforms raw time series into structured visual and textual perspectives. Rather than using natural language or real-world images, we construct both modalities directly from numerical sequences. We then align these views in a shared semantic space via contrastive learning, enabling the model to capture richer and more complementary representations. Furthermore, we introduce a variate selection module that leverages the aligned representations to identify the most informative variables for multivariate forecasting. Extensive experiments on fifteen short-term and six long-term forecasting benchmarks demonstrate that our approach consistently outperforms strong unimodal and cross-modal baselines, highlighting the effectiveness of multimodal alignment in enhancing time series forecasting. Code is available at: https://github.com/Ironieser/TimesCLIP.
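The central mechanism described above, constructing a visual view and a textual view directly from the numerical sequence and aligning them with a contrastive objective, can be sketched as follows. This is a minimal illustration under assumed components (placeholder encoders, a crude numeric-summary "text" view, and a CLIP-style InfoNCE loss with an assumed temperature); it is not the authors' implementation.

```python
# Minimal sketch: align a "visual" view and a "textual" view of the same
# time series with a symmetric InfoNCE loss (CLIP-style). All module
# choices below are illustrative assumptions, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyEncoder(nn.Module):
    """Placeholder encoder mapping a flattened view to a shared embedding space."""
    def __init__(self, in_dim, emb_dim=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, emb_dim))
    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)

def info_nce(z_a, z_b, temperature=0.07):
    """Symmetric contrastive loss between two batches of aligned embeddings."""
    logits = z_a @ z_b.t() / temperature          # (B, B) similarity matrix
    targets = torch.arange(z_a.size(0))           # matching pairs lie on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

# Toy batch: 8 univariate series of length 96.
series = torch.randn(8, 96)

# "Visual" view: here just the raw curve; in practice this could be a rendered
# line plot or another image-like transform of the sequence.
visual_view = series

# "Textual" view: a crude numeric summary standing in for structured text features.
textual_view = torch.stack([series.mean(1), series.std(1), series.min(1).values,
                            series.max(1).values], dim=1)

vis_enc, txt_enc = TinyEncoder(96), TinyEncoder(4)
loss = info_nce(vis_enc(visual_view), txt_enc(textual_view))
loss.backward()
```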
Related papers
- DP-GPT4MTS: Dual-Prompt Large Language Model for Textual-Numerical Time Series Forecasting [2.359557447960552]
We introduce DP-GPT4MTS (Dual-Prompt GPT2-base for Multimodal Time Series), a novel dual-prompt large language model framework. It combines two complementary prompts: an explicit prompt for clear task instructions and a textual prompt for context-aware embeddings from time-stamped data. Experiments conducted on diverse textual-numerical time series datasets demonstrate that this approach outperforms state-of-the-art algorithms in time series forecasting.
arXiv Detail & Related papers (2025-08-06T09:25:05Z)
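The DP-GPT4MTS entry above pairs an explicit task-instruction prompt with a textual prompt built from time-stamped observations. Below is a minimal sketch of that dual-prompt construction; the prompt wording, input format, and helper function are illustrative assumptions, not the paper's templates.

```python
# Hedged sketch of a dual-prompt input: an explicit instruction prompt plus a
# textual prompt derived from time-stamped observations. Templates are illustrative.
from datetime import date, timedelta

def build_dual_prompt(values, start, freq_days=1, horizon=24):
    timestamps = [start + timedelta(days=i * freq_days) for i in range(len(values))]
    explicit_prompt = (
        f"Task: forecast the next {horizon} values of the series given the history below."
    )
    textual_prompt = "; ".join(
        f"{ts.isoformat()}: {v:.2f}" for ts, v in zip(timestamps, values)
    )
    return explicit_prompt, textual_prompt

explicit, textual = build_dual_prompt([101.2, 103.5, 99.8, 104.1], date(2024, 1, 1))
print(explicit)
print(textual)
```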
- Does Multimodality Lead to Better Time Series Forecasting? [84.74978289870155]
It remains unclear whether and under what conditions such multimodal integration consistently yields gains. We evaluate two popular multimodal forecasting paradigms: aligning-based methods, which align time series and text representations; and prompting-based methods, which directly prompt large language models for forecasting. Our findings highlight that on the modeling side, incorporating text information is most helpful given (1) high-capacity text models, (2) comparatively weaker time series models, and (3) appropriate aligning strategies.
arXiv Detail & Related papers (2025-06-20T23:55:56Z)
- Enhancing Time Series Forecasting via Multi-Level Text Alignment with LLMs [6.612196783595362]
We propose a multi-level text alignment framework for time series forecasting using large language models (LLMs). Our method decomposes time series into trend, seasonal, and residual components, which are then reprogrammed into component-specific text representations. Experiments on multiple datasets demonstrate that our method outperforms state-of-the-art models in accuracy while providing good interpretability.
arXiv Detail & Related papers (2025-04-10T01:02:37Z)
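The multi-level alignment entry above starts from a trend/seasonal/residual decomposition that is then turned into component-specific text. A minimal additive decomposition with a naive per-component text rendering is sketched below; the moving-average trend, the period, and the description templates are assumptions, not the paper's pipeline.

```python
# Hedged sketch: additive decomposition into trend, seasonal, and residual parts,
# followed by a naive per-component text description.
import numpy as np

def decompose(x, period=12):
    # Trend: centered moving average over one period (edges padded by reflection).
    kernel = np.ones(period) / period
    padded = np.pad(x, period // 2, mode="reflect")
    trend = np.convolve(padded, kernel, mode="same")[period // 2 : period // 2 + len(x)]
    detrended = x - trend
    # Seasonal: mean of the detrended series at each position within the period.
    seasonal = np.array([detrended[i::period].mean() for i in range(period)])
    seasonal = np.tile(seasonal, len(x) // period + 1)[: len(x)]
    residual = x - trend - seasonal
    return trend, seasonal, residual

def describe(name, comp):
    direction = "rising" if comp[-1] > comp[0] else "falling"
    return f"{name}: {direction}, mean {comp.mean():.2f}, std {comp.std():.2f}"

t = np.arange(120)
x = 0.05 * t + np.sin(2 * np.pi * t / 12) + 0.1 * np.random.randn(120)
for name, comp in zip(["trend", "seasonal", "residual"], decompose(x)):
    print(describe(name, comp))
```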
- TimesBERT: A BERT-Style Foundation Model for Time Series Understanding [72.64824086839631]
GPT-style models have been positioned as foundation models for time series forecasting, while the BERT-style architecture has not been fully unlocked for time series understanding. We design TimesBERT to learn generic representations of time series. Our model is pre-trained on 260 billion time points across diverse domains.
arXiv Detail & Related papers (2025-02-28T17:14:44Z)
- Language in the Flow of Time: Time-Series-Paired Texts Weaved into a Unified Temporal Narrative [65.84249211767921]
Texts as Time Series (TaTS) can be plugged into any existing numerical-only time series model. We show that TaTS can enhance predictive performance without modifying model architectures.
arXiv Detail & Related papers (2025-02-13T03:43:27Z)
- Time-VLM: Exploring Multimodal Vision-Language Models for Augmented Time Series Forecasting [26.4608782425897]
Time-VLM is a novel framework that bridges temporal, visual, and textual modalities for enhanced forecasting. Our framework comprises three key components: (1) a Retrieval-Augmented Learner, which extracts enriched temporal features through memory bank interactions; (2) a Vision-Augmented Learner, which encodes time series as informative images; and (3) a Text-Augmented Learner, which generates contextual textual descriptions.
arXiv Detail & Related papers (2025-02-06T05:59:45Z)
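Time-VLM's Vision-Augmented Learner, described above, encodes time series as images. One common series-to-image transform is the Gramian Angular Summation Field, sketched below purely as an illustration of that step; the paper may use a different imaging scheme.

```python
# Hedged sketch: turn a 1-D series into a 2-D image via a Gramian Angular
# Summation Field (GASF), one standard series-to-image transform.
import numpy as np

def gasf(x, eps=1e-8):
    # Rescale to [-1, 1] so the values can be read as cosines of angles.
    x = 2 * (x - x.min()) / (x.max() - x.min() + eps) - 1
    phi = np.arccos(np.clip(x, -1.0, 1.0))
    # GASF(i, j) = cos(phi_i + phi_j)
    return np.cos(phi[:, None] + phi[None, :])

t = np.linspace(0, 4 * np.pi, 64)
image = gasf(np.sin(t) + 0.1 * np.random.randn(64))
print(image.shape)   # (64, 64) single-channel "image" ready for a vision encoder
```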
- Unveiling the Potential of Text in High-Dimensional Time Series Forecasting [12.707274099874384]
We propose a novel framework that integrates time series models with Large Language Models. Inspired by multimodal models, our method combines time series and textual data in a dual-tower structure. Experiments demonstrate that incorporating text enhances high-dimensional time series forecasting performance.
arXiv Detail & Related papers (2025-01-13T04:10:45Z)
- VITRO: Vocabulary Inversion for Time-series Representation Optimization [21.338428379212704]
We propose VITRO to bridge the gap between the discrete, semantic nature of natural language and the continuous, numerical nature of time series data. We show that learnable time series-specific pseudo-word embeddings represent time series data better than existing general language model vocabularies.
arXiv Detail & Related papers (2024-12-23T19:24:51Z)
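VITRO, described above, learns time-series-specific pseudo-word embeddings instead of relying on the language model's general vocabulary. The sketch below shows only the basic mechanics of prepending learnable pseudo-token embeddings to a frozen model's input embeddings (textual-inversion style); module names, dimensions, and the stand-in backbone are assumptions.

```python
# Hedged sketch: learnable pseudo-word embeddings prepended to a frozen language
# model's token embeddings. Only the pseudo embeddings receive gradients.
import torch
import torch.nn as nn

class PseudoVocab(nn.Module):
    def __init__(self, num_pseudo_tokens=8, d_model=768):
        super().__init__()
        # Only these embeddings are trained; the backbone stays frozen.
        self.pseudo = nn.Parameter(torch.randn(num_pseudo_tokens, d_model) * 0.02)

    def forward(self, token_embeds):
        # token_embeds: (batch, seq_len, d_model) from the frozen LM's embedding layer.
        batch = token_embeds.size(0)
        prefix = self.pseudo.unsqueeze(0).expand(batch, -1, -1)
        return torch.cat([prefix, token_embeds], dim=1)

frozen_embeds = torch.randn(4, 16, 768)          # stand-in for frozen LM token embeddings
pseudo_vocab = PseudoVocab()
augmented = pseudo_vocab(frozen_embeds)          # (4, 24, 768), fed to the frozen LM
optimizer = torch.optim.Adam(pseudo_vocab.parameters(), lr=1e-3)  # updates pseudo tokens only
```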
- Enhancing Foundation Models for Time Series Forecasting via Wavelet-based Tokenization [74.3339999119713]
We develop a wavelet-based tokenizer that allows models to learn complex representations directly in the space of time-localized frequencies. Our method first scales and decomposes the input time series, then thresholds and quantizes the wavelet coefficients, and finally pre-trains an autoregressive model to forecast coefficients for the forecast horizon.
arXiv Detail & Related papers (2024-12-06T18:22:59Z)
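The wavelet-tokenization entry above scales and decomposes the series, then thresholds and quantizes the wavelet coefficients before autoregressive pre-training. A minimal coefficient-to-token sketch using PyWavelets follows; the wavelet family, threshold value, and codebook size are assumptions.

```python
# Hedged sketch: decompose a series with a discrete wavelet transform, soft-threshold
# the detail coefficients, and quantize everything into a small discrete vocabulary.
import numpy as np
import pywt

def wavelet_tokenize(x, wavelet="db4", level=3, threshold=0.1, num_bins=256):
    x = (x - x.mean()) / (x.std() + 1e-8)                 # scale the input series
    coeffs = pywt.wavedec(x, wavelet, level=level)        # [approx, detail_L, ..., detail_1]
    coeffs = [coeffs[0]] + [pywt.threshold(c, threshold, mode="soft") for c in coeffs[1:]]
    flat = np.concatenate(coeffs)
    bins = np.linspace(flat.min(), flat.max(), num_bins - 1)
    return np.digitize(flat, bins)                        # integer token ids in [0, num_bins)

tokens = wavelet_tokenize(np.sin(np.linspace(0, 20, 256)) + 0.05 * np.random.randn(256))
print(tokens[:16])   # discrete tokens an autoregressive model could be trained on
```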
- TimeSiam: A Pre-Training Framework for Siamese Time-Series Modeling [67.02157180089573]
Time series pre-training has recently garnered wide attention for its potential to reduce labeling expenses and benefit various downstream tasks.
This paper proposes TimeSiam as a simple but effective self-supervised pre-training framework for time series based on Siamese networks.
arXiv Detail & Related papers (2024-02-04T13:10:51Z)
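The TimeSiam entry above pre-trains with Siamese subseries sampled from the same sequence. The sketch below shows a loose version of that idea, encoding a past and a current window with a shared encoder and reconstructing the masked current window; the architecture sizes, sampling, and masking scheme are assumptions rather than the paper's design.

```python
# Hedged sketch of Siamese-style time-series pre-training: sample a past and a
# current subseries from the same sequence, encode both with a shared encoder,
# and reconstruct the (masked) current subseries with help from the past view.
import torch
import torch.nn as nn

win = 48
encoder = nn.Sequential(nn.Linear(win, 128), nn.ReLU(), nn.Linear(128, 64))   # shared weights
decoder = nn.Sequential(nn.Linear(64 * 2, 128), nn.ReLU(), nn.Linear(128, win))

series = torch.randn(16, 512)                          # toy batch of long sequences
past_start = torch.randint(0, 200, (1,)).item()
curr_start = torch.randint(300, 512 - win, (1,)).item()
past = series[:, past_start:past_start + win]
curr = series[:, curr_start:curr_start + win]

mask = (torch.rand_like(curr) > 0.25).float()          # randomly zero out ~25% of points
recon = decoder(torch.cat([encoder(past), encoder(curr * mask)], dim=-1))
loss = ((recon - curr) ** 2).mean()                    # reconstruct the current subseries
loss.backward()
```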
- Time-LLM: Time Series Forecasting by Reprogramming Large Language Models [110.20279343734548]
Time series forecasting holds significant importance in many real-world dynamic systems.
We present Time-LLM, a reprogramming framework to repurpose large language models for time series forecasting.
Time-LLM is a powerful time series learner that outperforms state-of-the-art, specialized forecasting models.
arXiv Detail & Related papers (2023-10-03T01:31:25Z)
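The Time-LLM entry reprograms time-series patches into a frozen language model's embedding space by cross-attending over a small set of text prototypes derived from the model's vocabulary. The sketch below illustrates only that reprogramming step with placeholder dimensions and a random stand-in vocabulary; it is not the released implementation.

```python
# Hedged sketch of the "reprogramming" idea: map time-series patches into a frozen
# LLM's embedding space by cross-attending over a small set of text prototypes.
import torch
import torch.nn as nn

d_llm, n_prototypes, patch_len = 768, 100, 16

vocab_embeds = torch.randn(32000, d_llm)                       # stand-in for frozen LLM vocab
prototypes = nn.Linear(32000, n_prototypes, bias=False)        # learn prototypes as vocab mixtures
patch_proj = nn.Linear(patch_len, d_llm)                       # lift raw patches to d_llm
cross_attn = nn.MultiheadAttention(d_llm, num_heads=8, batch_first=True)

series = torch.randn(4, 96)                                    # (batch, length)
patches = series.unfold(-1, patch_len, patch_len)              # (batch, num_patches, patch_len)
queries = patch_proj(patches)                                  # (batch, num_patches, d_llm)
protos = prototypes(vocab_embeds.t()).t().unsqueeze(0).expand(4, -1, -1)  # (batch, n_prototypes, d_llm)

reprogrammed, _ = cross_attn(queries, protos, protos)          # tokens to feed the frozen LLM
print(reprogrammed.shape)                                      # torch.Size([4, 6, 768])
```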