Related papers: Conversational Time Series Foundation Models: Towards Explainable and Effective Forecasting

URL: http://arxiv.org/abs/2512.16022v1
Date: Wed, 17 Dec 2025 23:14:38 GMT
Title: Conversational Time Series Foundation Models: Towards Explainable and Effective Forecasting
Authors: Defu Cao, Michael Gee, Jinbo Liu, Hengxuan Wang, Wei Yang, Rui Wang, Yan Liu
Abstract summary: Large Language Models (LLMs) offer powerful reasoning capabilities, but their direct application to time series forecasting has proven ineffective. We introduce an R1-style finetuning process, guided by SHAP-based faithfulness scores, which teaches the model to interpret ensemble weights as meaningful causal statements. Our approach significantly outperforms leading time series foundation models on both CRPS and MASE metrics.
Score: 13.958506262265871
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: The proliferation of time series foundation models has created a landscape where no single method achieves consistent superiority, framing the central challenge not as finding the best model, but as orchestrating an optimal ensemble with interpretability. While Large Language Models (LLMs) offer powerful reasoning capabilities, their direct application to time series forecasting has proven ineffective. We address this gap by repositioning the LLM as an intelligent judge that evaluates, explains, and strategically coordinates an ensemble of foundation models. To overcome the LLM's inherent lack of domain-specific knowledge on time series, we introduce an R1-style finetuning process, guided by SHAP-based faithfulness scores, which teaches the model to interpret ensemble weights as meaningful causal statements about temporal dynamics. The trained agent then engages in iterative, multi-turn conversations to perform forward-looking assessments, provide causally-grounded explanations for its weighting decisions, and adaptively refine the optimization strategy. Validated on the GIFT-Eval benchmark on 23 datasets across 97 settings, our approach significantly outperforms leading time series foundation models on both CRPS and MASE metrics, establishing new state-of-the-art results.
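
For readers unfamiliar with the quantities named in the abstract, the following is a minimal sketch of a weighted forecast ensemble together with the MASE and sample-based CRPS metrics. It is not the paper's method: the paper has an LLM agent choose and explain the ensemble weights, whereas here the weights are fixed by hand. The function names (`combine_forecasts`, `mase`, `crps_samples`) and the toy numbers are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def combine_forecasts(forecasts: Sequence[Sequence[float]],
                      weights: Sequence[float]) -> list:
    """Weighted average of point forecasts (one row per model).

    Weights are normalized to sum to 1 before combining.
    """
    if len(forecasts) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    norm = [w / total for w in weights]
    horizon = len(forecasts[0])
    return [sum(w * f[t] for w, f in zip(norm, forecasts))
            for t in range(horizon)]


def mase(history: Sequence[float], actual: Sequence[float],
         forecast: Sequence[float], season: int = 1) -> float:
    """Mean absolute scaled error.

    Forecast MAE divided by the in-sample MAE of the seasonal-naive
    forecast (lag `season`) computed on `history`.
    """
    naive_errors = [abs(history[i] - history[i - season])
                    for i in range(season, len(history))]
    scale = sum(naive_errors) / len(naive_errors)
    mae = sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)
    return mae / scale


def crps_samples(samples: Sequence[float], observed: float) -> float:
    """Sample-based CRPS estimate: E|X - y| - 0.5 * E|X - X'|."""
    n = len(samples)
    term_obs = sum(abs(x - observed) for x in samples) / n
    term_spread = sum(abs(a - b) for a in samples for b in samples) / (n * n)
    return term_obs - 0.5 * term_spread


# Toy data (hypothetical): two models, a two-step forecast horizon.
history = [10.0, 12.0, 14.0, 16.0]
actual = [18.0, 20.0]
model_a = [17.0, 19.0]   # under-forecasts by 1
model_b = [20.0, 22.0]   # over-forecasts by 2

ensemble = combine_forecasts([model_a, model_b], [0.5, 0.5])
print(ensemble)                          # [18.5, 20.5]
print(mase(history, actual, model_a))    # 0.5
print(mase(history, actual, model_b))    # 1.0
print(mase(history, actual, ensemble))   # 0.25
print(crps_samples([1.0, 3.0], 2.0))     # 0.5
```

In this toy case the equally weighted ensemble scores a lower MASE than either model alone because their errors have opposite signs. That is the kind of complementarity an ensemble weighting scheme tries to exploit.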

Related papers

Characteristic Root Analysis and Regularization for Linear Time Series Forecasting [9.254995889539716]
Time series forecasting remains a critical challenge across numerous domains. Recent studies highlight the surprising competitiveness of simple linear models. This paper focuses on the role of characteristic roots in temporal dynamics.
arXiv Detail & Related papers (2025-09-28T03:06:30Z)
ARIES: Relation Assessment and Model Recommendation for Deep Time Series Forecasting [54.57031153712623]
ARIES is a framework for assessing the relation between time series properties and modeling strategies. We propose the first deep forecasting model recommender, capable of providing interpretable suggestions for real-world time series.
arXiv Detail & Related papers (2025-09-07T13:57:14Z)
WSM: Decay-Free Learning Rate Schedule via Checkpoint Merging for LLM Pre-training [64.0932926819307]
We present Warmup-Stable and Merge (WSM), a framework that establishes a formal connection between learning rate decay and model merging. WSM provides a unified theoretical foundation for emulating various decay strategies. Our framework consistently outperforms the widely-adopted Warmup-Stable-Decay (WSD) approach across multiple benchmarks.
arXiv Detail & Related papers (2025-07-23T16:02:06Z)
Enhancing LLM Reasoning for Time Series Classification by Tailored Thinking and Fused Decision [8.256998757769322]
ReasonTSC is a framework designed to leverage LLM reasoning for time series classification. It steers the model to think over the essential characteristics of time series data. It integrates predictions and confidence scores from plug-in classifiers, e.g., domain-specific time series models, as in-context examples.
arXiv Detail & Related papers (2025-06-01T03:15:54Z)
Relative Overfitting and Accept-Reject Framework [5.465098504510676]
We propose an ensemble framework that governs how models are segmented to ensure performance improvement. We detail the patterns of this framework within the domain of NLP and briefly describe its application to other fields, such as computer vision (CV) and AI for science.
arXiv Detail & Related papers (2025-05-12T17:36:14Z)
Disentangling Length Bias In Preference Learning Via Response-Conditioned Modeling [87.17041933863041]
Reinforcement Learning from Human Feedback (RLHF) has achieved considerable success in aligning large language models (LLMs). We introduce a Response-conditioned Bradley-Terry (Rc-BT) model that enhances the model's capability in mitigating length bias and following length instructions. We also propose the Rc-RM and Rc-DPO algorithms to leverage the Rc-BT model for reward modeling and direct policy optimization.
arXiv Detail & Related papers (2025-02-02T14:50:25Z)
Implicit Reasoning in Deep Time Series Forecasting [16.750280337155647]
This work takes an initial step toward assessing the reasoning abilities of deep time series forecasting models. We find that certain linear, patch-based Transformer models generalize effectively in systematically orchestrated out-of-distribution scenarios.
arXiv Detail & Related papers (2024-09-17T02:11:19Z)
Low-Rank Adaptation of Time Series Foundational Models for Out-of-Domain Modality Forecasting [5.354055742467354]
Low-Rank Adaptation (LoRA) is a technique for fine-tuning large pre-trained or foundational models across different modalities and tasks. This paper examines the impact of LoRA on contemporary time series foundational models: Lag-Llama, MOIRAI, and Chronos.
arXiv Detail & Related papers (2024-05-16T16:05:33Z)
A Large-Scale Evaluation of Speech Foundation Models [110.95827399522204]
We establish the Speech processing Universal PERformance Benchmark (SUPERB) to study the effectiveness of the foundation model paradigm for speech. We propose a unified multi-tasking framework to address speech processing tasks in SUPERB using a frozen foundation model followed by task-specialized, lightweight prediction heads.
arXiv Detail & Related papers (2024-04-15T00:03:16Z)
Embedded feature selection in LSTM networks with multi-objective evolutionary ensemble learning for time series forecasting [49.1574468325115]
We present a novel feature selection method embedded in Long Short-Term Memory networks. Our approach optimizes the weights and biases of the LSTM in a partitioned manner. Experimental evaluations on air quality time series data from Italy and southeast Spain demonstrate that our method substantially improves the generalization ability of conventional LSTMs.
arXiv Detail & Related papers (2023-12-29T08:42:10Z)
Lag-Llama: Towards Foundation Models for Probabilistic Time Series Forecasting [54.04430089029033]
We present Lag-Llama, a general-purpose foundation model for time series forecasting based on a decoder-only transformer architecture. Lag-Llama is pretrained on a large corpus of diverse time series data from several domains, and demonstrates strong zero-shot generalization capabilities. When fine-tuned on relatively small fractions of such previously unseen datasets, Lag-Llama achieves state-of-the-art performance.
arXiv Detail & Related papers (2023-10-12T12:29:32Z)
Faithful Explanations of Black-box NLP Models Using LLM-generated Counterfactuals [67.64770842323966]
Causal explanations of predictions of NLP systems are essential to ensure safety and establish trust. Existing methods often fall short of explaining model predictions effectively or efficiently. We propose two approaches for counterfactual (CF) approximation.
arXiv Detail & Related papers (2023-10-01T07:31:04Z)

This list is automatically generated from the titles and abstracts of the papers on this site.