From Values to Tokens: An LLM-Driven Framework for Context-aware Time Series Forecasting via Symbolic Discretization
- URL: http://arxiv.org/abs/2508.09191v1
- Date: Fri, 08 Aug 2025 03:51:08 GMT
- Title: From Values to Tokens: An LLM-Driven Framework for Context-aware Time Series Forecasting via Symbolic Discretization
- Authors: Xiaoyu Tao, Shilong Zhang, Mingyue Cheng, Daoyu Wang, Tingyue Pan, Bokai Pan, Changqing Zhang, Shijin Wang
- Abstract summary: Time series forecasting plays a vital role in supporting decision-making across a wide range of critical applications, including energy, healthcare, and finance. We propose TokenCast, an LLM-driven framework that leverages language-based symbolic representations as a unified intermediary for context-aware time series forecasting. Specifically, TokenCast employs a discrete tokenizer to transform continuous numerical sequences into temporal tokens, enabling structural alignment with language-based inputs.
- Score: 21.8427780153806
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Time series forecasting plays a vital role in supporting decision-making across a wide range of critical applications, including energy, healthcare, and finance. Despite recent advances, forecasting accuracy remains limited due to the challenge of integrating historical numerical sequences with contextual features, which often comprise unstructured textual data. To address this challenge, we propose TokenCast, an LLM-driven framework that leverages language-based symbolic representations as a unified intermediary for context-aware time series forecasting. Specifically, TokenCast employs a discrete tokenizer to transform continuous numerical sequences into temporal tokens, enabling structural alignment with language-based inputs. To bridge the semantic gap between modalities, both temporal and contextual tokens are embedded into a shared representation space via a pre-trained large language model (LLM), further optimized with autoregressive generative objectives. Building upon this unified semantic space, the aligned LLM is subsequently fine-tuned in a supervised manner to predict future temporal tokens, which are then decoded back into the original numerical space. Extensive experiments on diverse real-world datasets enriched with contextual features demonstrate the effectiveness and generalizability of TokenCast.
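The core of the pipeline is the discrete tokenizer that maps continuous values to symbols and back. As a rough illustration of that idea only (the paper uses a learned tokenizer; the quantile-binning scheme and class below are assumptions, not the authors' implementation), a minimal sketch:

```python
# Minimal sketch of the value-to-token idea behind TokenCast, using simple
# quantile binning in place of the paper's learned discrete tokenizer.
# Bin edges, vocabulary size, and round-trip decoding are illustrative
# assumptions, not the authors' implementation.
import numpy as np

class QuantileTokenizer:
    def __init__(self, num_bins: int = 16):
        self.num_bins = num_bins
        self.edges = None
        self.centers = None

    def fit(self, series: np.ndarray) -> "QuantileTokenizer":
        # Place bin edges at empirical quantiles so each symbol is equally likely.
        qs = np.linspace(0.0, 1.0, self.num_bins + 1)
        self.edges = np.quantile(series, qs)
        # Decode each symbol back to the midpoint of its bin.
        self.centers = 0.5 * (self.edges[:-1] + self.edges[1:])
        return self

    def encode(self, series: np.ndarray) -> np.ndarray:
        # Map each continuous value to a discrete token id in [0, num_bins).
        return np.searchsorted(self.edges[1:-1], series, side="right")

    def decode(self, ids: np.ndarray) -> np.ndarray:
        return self.centers[ids]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    history = np.sin(np.linspace(0, 8 * np.pi, 512)) + 0.1 * rng.standard_normal(512)
    tok = QuantileTokenizer(num_bins=16).fit(history)
    tokens = tok.encode(history)  # discrete "temporal tokens" for the LLM
    recon = tok.decode(tokens)    # back to the original numerical space
    print("round-trip MAE:", np.abs(history - recon).mean())
```

A learned vector-quantized tokenizer would replace the fixed quantile bins, but the round trip (encode to token ids, predict future ids with the fine-tuned LLM, decode back to values) follows the same shape.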
Related papers
- InstructTime++: Time Series Classification with Multimodal Language Modeling via Implicit Feature Enhancement [52.17579028504616]
InstructTime is a novel framework that reformulates time series classification as a multimodal generative task. InstructTime++ extends InstructTime by incorporating implicit feature modeling to compensate for the limited inductive bias of language models.
arXiv Detail & Related papers (2026-01-21T13:12:23Z)
- Augmenting LLMs for General Time Series Understanding and Prediction [2.426309874608745]
Time series data is fundamental to decision-making in many crucial domains, including healthcare, finance, and environmental science. We train this Time Series-augmented LLM (TsLLM) on a large corpus of over 2 million interleaved time series and text examples. This training enables TsLLM to leverage both its language understanding and newly acquired temporal reasoning capabilities.
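How such interleaved examples might be serialized is not specified in the summary; the tags and formatting in this sketch (serialize_example, the `<series>` markers) are purely hypothetical:

```python
# Hypothetical sketch of an interleaved time-series/text training example.
# The exact serialization used for TsLLM is not given in the summary, so the
# delimiters and digit formatting below are assumptions.
def serialize_example(context: str, series: list[float], question: str) -> str:
    values = ", ".join(f"{v:.2f}" for v in series)
    return f"{context}\n<series> {values} </series>\n{question}"

example = serialize_example(
    "Hourly energy demand for a mid-size city:",
    [312.4, 298.7, 305.1, 330.9],
    "How will demand evolve over the next four hours?",
)
print(example)
```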
arXiv Detail & Related papers (2025-10-01T16:54:46Z)
- Time-Prompt: Integrated Heterogeneous Prompts for Unlocking LLMs in Time Series Forecasting [4.881217428928315]
Time series forecasting aims to model temporal dependencies among variables for future state inference. Recent research demonstrates that large language models (LLMs) achieve promising performance in time series forecasting. We propose LLM-Prompt, an LLM-based time series forecasting framework integrating multi-prompt information and cross-modal semantic alignment.
arXiv Detail & Related papers (2025-06-21T08:22:25Z)
- Enhancing Time Series Forecasting via Multi-Level Text Alignment with LLMs [6.612196783595362]
We propose a multi-level text alignment framework for time series forecasting using large language models (LLMs). Our method decomposes time series into trend, seasonal, and residual components, which are then reprogrammed into component-specific text representations. Experiments on multiple datasets demonstrate that our method outperforms state-of-the-art models in accuracy while providing good interpretability.
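As a minimal sketch of the trend/seasonal/residual split the method starts from, here is a moving-average decomposition; the paper's own decomposition procedure may differ:

```python
# Classical additive decomposition via a centered moving average; a sketch of
# the trend/seasonal/residual split, not the paper's exact procedure.
import numpy as np

def decompose(series: np.ndarray, period: int):
    # Trend: centered moving average over one seasonal period.
    kernel = np.ones(period) / period
    trend = np.convolve(series, kernel, mode="same")
    # Seasonal: average detrended value at each phase of the period.
    detrended = series - trend
    seasonal = np.array([detrended[i::period].mean() for i in range(period)])
    seasonal = np.tile(seasonal, len(series) // period + 1)[: len(series)]
    residual = series - trend - seasonal
    return trend, seasonal, residual

t = np.arange(200)
y = 0.05 * t + np.sin(2 * np.pi * t / 24) + 0.1 * np.random.default_rng(1).standard_normal(200)
trend, seasonal, residual = decompose(y, period=24)
```

Each of the three components would then be mapped to its own text representation before being handed to the LLM.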
arXiv Detail & Related papers (2025-04-10T01:02:37Z)
- LLM-PS: Empowering Large Language Models for Time Series Forecasting with Temporal Patterns and Semantics [56.99021951927683]
Time Series Forecasting (TSF) is critical in many real-world domains like financial planning and health monitoring. Existing Large Language Models (LLMs) usually perform suboptimally because they neglect the inherent characteristics of time series data. We propose LLM-PS to empower the LLM for TSF by learning the fundamental *Patterns* and meaningful *Semantics* from time series data.
arXiv Detail & Related papers (2025-03-12T11:45:11Z)
- TimeCAP: Learning to Contextualize, Augment, and Predict Time Series Events with Large Language Model Agents [52.13094810313054]
TimeCAP is a time-series processing framework that creatively employs Large Language Models (LLMs) as contextualizers of time series data. TimeCAP incorporates two independent LLM agents: one generates a textual summary capturing the context of the time series, while the other uses this enriched summary to make more informed predictions. Experimental results on real-world datasets demonstrate that TimeCAP outperforms state-of-the-art methods for time series event prediction.
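A hypothetical sketch of the two-agent flow described above; ask_llm stands in for any chat-completion client and is an assumption, not the paper's API:

```python
# Hypothetical two-agent pipeline in the spirit of TimeCAP's summary.
# `ask_llm` is a placeholder for any LLM client, not the paper's interface.
def ask_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client here")

def timecap_predict(series_text: str) -> str:
    # Agent 1: contextualize the raw series into a textual summary.
    summary = ask_llm(f"Summarize the context of this time series:\n{series_text}")
    # Agent 2: predict the event using the enriched summary.
    return ask_llm(f"Given this context, will the event occur?\n{summary}")
```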
arXiv Detail & Related papers (2025-02-17T04:17:27Z)
- TempoGPT: Enhancing Time Series Reasoning via Quantizing Embedding [13.996105878417204]
We propose a multi-modal time series data construction approach and a multi-modal time series language model (TLM), TempoGPT. We construct multi-modal data for complex reasoning tasks by analyzing the variable-system relationships within a white-box system. Extensive experiments demonstrate that TempoGPT accurately perceives temporal information, logically infers conclusions, and achieves state-of-the-art results on the constructed complex time series reasoning tasks.
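The quantizing-embedding step named in the title can be pictured as snapping each continuous embedding to its nearest codebook entry; the codebook size and dimensions below are illustrative assumptions, not TempoGPT's actual design:

```python
# Minimal vector-quantization sketch: map continuous time-series embeddings to
# discrete token ids via nearest-neighbor codebook lookup. Sizes are illustrative.
import numpy as np

def quantize(embeddings: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    # embeddings: (n, d); codebook: (k, d). Returns token ids of shape (n,).
    dists = ((embeddings[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return dists.argmin(axis=1)

rng = np.random.default_rng(0)
codebook = rng.standard_normal((256, 64))   # k = 256 discrete "temporal tokens"
embeddings = rng.standard_normal((10, 64))  # continuous time-series embeddings
token_ids = quantize(embeddings, codebook)
```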
arXiv Detail & Related papers (2025-01-13T13:47:05Z)
- Rethinking Time Series Forecasting with LLMs via Nearest Neighbor Contrastive Learning [1.7892194562398749]
We propose NNCL-TLLM: Nearest Neighbor Contrastive Learning for Time series forecasting via Large Language Models. First, we generate time series compatible text prototypes such that each text prototype represents both word token embeddings in its neighborhood and time series characteristics. We then fine-tune the layer normalization and positional embeddings of the LLM, keeping the other layers intact, reducing the trainable parameters and decreasing the computational cost.
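The parameter-efficient recipe (train only layer norms and positional embeddings, freeze everything else) can be sketched as follows, using GPT-2 for illustration; the paper's backbone and exact layer names may differ:

```python
# Freeze a pre-trained backbone except for layer norms and positional
# embeddings. GPT-2 is an illustrative choice; NNCL-TLLM's backbone may differ.
from transformers import GPT2Model

model = GPT2Model.from_pretrained("gpt2")
for name, param in model.named_parameters():
    # GPT-2 names: "wpe" = positional embeddings, "ln_" = layer norms.
    param.requires_grad = name.startswith("wpe") or "ln_" in name

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable: {trainable / total:.2%} of {total} parameters")
```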
arXiv Detail & Related papers (2024-12-06T06:32:47Z)
- CosyVoice: A Scalable Multilingual Zero-shot Text-to-speech Synthesizer based on Supervised Semantic Tokens [49.569695524535454]
We propose to represent speech with supervised semantic tokens, which are derived from a multilingual speech recognition model by inserting vector quantization into the encoder.
Based on the tokens, we further propose a scalable zero-shot TTS synthesizer, CosyVoice, which consists of an LLM for text-to-token generation and a conditional flow matching model for token-to-speech synthesis.
arXiv Detail & Related papers (2024-07-07T15:16:19Z)
- $\textbf{S}^2$IP-LLM: Semantic Space Informed Prompt Learning with LLM for Time Series Forecasting [21.921303835714628]
We propose Semantic Space Informed Prompt learning with LLM ($S^2$IP-LLM) to align the pre-trained semantic space with the time series embedding space.
We show that the proposed $S2$IP-LLM can achieve superior forecasting performance over state-of-the-art baselines.
arXiv Detail & Related papers (2024-03-09T05:20:48Z)
- AutoTimes: Autoregressive Time Series Forecasters via Large Language Models [67.83502953961505]
AutoTimes projects time series into the embedding space of language tokens and autoregressively generates future predictions with arbitrary lengths.
We formulate time series as prompts, extending the context for prediction beyond the lookback window.
AutoTimes achieves state-of-the-art performance with 0.1% trainable parameters and over $5\times$ training/inference speedup.
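A minimal sketch of the projection idea (embedding fixed-length series segments into the LLM's token-embedding dimension so they can be consumed autoregressively); the linear layer and sizes are illustrative stand-ins, not AutoTimes' actual modules:

```python
# Project time-series segments into a language model's embedding space, one
# "token" per segment. Dimensions and the linear projection are illustrative.
import torch
import torch.nn as nn

class SegmentEmbedder(nn.Module):
    def __init__(self, segment_len: int = 96, d_model: int = 768):
        super().__init__()
        self.proj = nn.Linear(segment_len, d_model)

    def forward(self, series: torch.Tensor) -> torch.Tensor:
        # series: (batch, n_segments, segment_len) -> (batch, n_segments, d_model)
        return self.proj(series)

emb = SegmentEmbedder()
segments = torch.randn(2, 8, 96)
tokens = emb(segments)   # feed these as inputs_embeds to a frozen LLM
print(tokens.shape)      # torch.Size([2, 8, 768])
```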
arXiv Detail & Related papers (2024-02-04T06:59:21Z)
- Time-LLM: Time Series Forecasting by Reprogramming Large Language Models [110.20279343734548]
Time series forecasting holds significant importance in many real-world dynamic systems.
We present Time-LLM, a reprogramming framework to repurpose large language models for time series forecasting.
Time-LLM is a powerful time series learner that outperforms state-of-the-art, specialized forecasting models.
arXiv Detail & Related papers (2023-10-03T01:31:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.