Related papers: Language Models Still Struggle to Zero-shot Reason about Time Series

URL: http://arxiv.org/abs/2404.11757v1
Date: Wed, 17 Apr 2024 21:27:33 GMT
Title: Language Models Still Struggle to Zero-shot Reason about Time Series
Authors: Mike A. Merrill, Mingtian Tan, Vinayak Gupta, Tom Hartvigsen, Tim Althoff
Abstract summary: Time series are critical for decision-making in fields like finance and healthcare. It remains unknown whether non-trivial forecasting implies that language models can reason about time series. We generate a first-of-its-kind evaluation framework for time series reasoning.
Score: 11.764833497297493
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Time series are critical for decision-making in fields like finance and healthcare. Their importance has driven a recent influx of works passing time series into language models, leading to non-trivial forecasting on some datasets. But it remains unknown whether non-trivial forecasting implies that language models can reason about time series. To address this gap, we generate a first-of-its-kind evaluation framework for time series reasoning, including formal tasks and a corresponding dataset of multi-scale time series paired with text captions across ten domains. Using these data, we probe whether language models achieve three forms of reasoning: (1) Etiological Reasoning - given an input time series, can the language model identify the scenario that most likely created it? (2) Question Answering - can a language model answer factual questions about time series? (3) Context-Aided Forecasting - does highly relevant textual context improve a language model's time series forecasts? We find that otherwise highly-capable language models demonstrate surprisingly limited time series reasoning: they score marginally above random on etiological and question answering tasks (up to 30 percentage points worse than humans) and show modest success in using context to improve forecasting. These weaknesses showcase that time series reasoning is an impactful, yet deeply underdeveloped direction for language model research. We also make our datasets and code public to support further research in this direction at https://github.com/behavioral-data/TSandLanguage

Related papers

Augmenting LLMs for General Time Series Understanding and Prediction [2.426309874608745]
Time series data is fundamental to decision-making in many crucial domains including healthcare, finance, and environmental science. We train this Time Series-augmented LLM (TsLLM) on a large corpus of over 2 million interleaved time series and text examples. This training enables TsLLM to leverage both its language understanding and newly acquired temporal reasoning capabilities.
arXiv Detail & Related papers (2025-10-01T16:54:46Z)
When Does Multimodality Lead to Better Time Series Forecasting? [96.26052272121615]
We investigate whether and under what conditions such multimodal integration consistently yields gains. Our findings reveal that the benefits of multimodality are highly condition-dependent. Our study offers a rigorous, quantitative foundation for understanding when multimodality can be expected to aid forecasting tasks.
arXiv Detail & Related papers (2025-06-20T23:55:56Z)
Inferring Event Descriptions from Time Series with Language Models [13.414101942484582]
Time series data measure how environments change over time and drive decision-making in critical domains like finance and healthcare. We conduct the first study of whether Large Language Models (LLMs) can infer natural language events from time series. We evaluate 16 LLMs and find that they demonstrate promising abilities to infer events from time series data.
arXiv Detail & Related papers (2025-03-18T12:07:33Z)
Language in the Flow of Time: Time-Series-Paired Texts Weaved into a Unified Temporal Narrative [65.84249211767921]
Texts as Time Series (TaTS) can be plugged into any existing numerical-only time series models. We show that TaTS can enhance predictive performance without modifying model architectures.
arXiv Detail & Related papers (2025-02-13T03:43:27Z)
ChatTime: A Unified Multimodal Time Series Foundation Model Bridging Numerical and Textual Data [26.300515935897415]
ChatTime is a unified framework for time series and text processing. As an out-of-the-box multimodal time series foundation model, ChatTime provides zero-shot forecasting capability. We design a series of experiments to verify the superior performance of ChatTime across multiple tasks and scenarios.
arXiv Detail & Related papers (2024-12-16T02:04:06Z)
Large language models can be zero-shot anomaly detectors for time series? [9.249657468385779]
SigLLM is a framework for time series anomaly detection using large language models. We present a prompt-based detection method that directly asks a language model to indicate which elements of the input are anomalies. We show that the forecasting method significantly outperformed the prompting method in all 11 datasets with respect to the F1 score.
arXiv Detail & Related papers (2024-05-23T16:21:57Z)
A Survey of Time Series Foundation Models: Generalizing Time Series Representation with Large Language Model [33.17908422599714]
Large language foundation models have unveiled their capabilities for cross-task transferability, zero-shot/few-shot learning, and decision-making explainability. There are two main research lines, namely pre-training foundation models from scratch for time series and adapting large language foundation models for time series. This survey offers a 3E analytical framework for comprehensive examination of related research.
arXiv Detail & Related papers (2024-05-03T03:12:55Z)
Large Language Models Are Zero-Shot Time Series Forecasters [48.73953666153385]
By encoding time series as a string of numerical digits, we can frame time series forecasting as next-token prediction in text. We find that large language models (LLMs) such as GPT-3 and LLaMA-2 can surprisingly zero-shot extrapolate time series at a level comparable to or exceeding the performance of purpose-built time series models trained on the downstream tasks.
arXiv Detail & Related papers (2023-10-11T19:01:28Z)
Pushing the Limits of Pre-training for Time Series Forecasting in the CloudOps Domain [54.67888148566323]
We introduce three large-scale time series forecasting datasets from the cloud operations domain. We show that our pre-trained method is a strong zero-shot baseline and benefits from further scaling, both in model and dataset size. Accompanying these datasets and results is a suite of comprehensive benchmark results comparing classical and deep learning baselines to our pre-trained method.
arXiv Detail & Related papers (2023-10-08T08:09:51Z)
Time-LLM: Time Series Forecasting by Reprogramming Large Language Models [110.20279343734548]
Time series forecasting holds significant importance in many real-world dynamic systems. We present Time-LLM, a reprogramming framework to repurpose large language models for time series forecasting. Time-LLM is a powerful time series learner that outperforms state-of-the-art, specialized forecasting models.
arXiv Detail & Related papers (2023-10-03T01:31:25Z)
Jamp: Controlled Japanese Temporal Inference Dataset for Evaluating Generalization Capacity of Language Models [18.874880342410876]
We present Jamp, a Japanese benchmark focused on temporal inference. Our dataset includes a range of temporal inference patterns, which enables us to conduct fine-grained analysis. We evaluate the generalization capacities of monolingual/multilingual LMs by splitting our dataset based on tense fragments.
arXiv Detail & Related papers (2023-06-19T07:00:14Z)
Towards Benchmarking and Improving the Temporal Reasoning Capability of Large Language Models [44.670550143705746]
We introduce a comprehensive probing dataset, TempReason, to evaluate the temporal reasoning capability of large language models. Our dataset includes questions of three temporal reasoning levels. We also propose a novel learning framework to improve the temporal reasoning capability of large language models.
arXiv Detail & Related papers (2023-06-15T08:44:41Z)
A Generative Language Model for Few-shot Aspect-Based Sentiment Analysis [90.24921443175514]
We focus on aspect-based sentiment analysis, which involves extracting aspect terms and categories and predicting their corresponding polarities. We propose to reformulate the extraction and prediction tasks into the sequence generation task, using a generative language model with unidirectional attention. Our approach outperforms the previous state-of-the-art (based on BERT) on average performance by a large margin in few-shot and full-shot settings.
arXiv Detail & Related papers (2022-04-11T18:31:53Z)
A Dataset for Answering Time-Sensitive Questions [88.95075983560331]
Time is an important dimension in our physical world. Lots of facts can evolve with respect to time. It is important to consider the time dimension and empower the existing QA models to reason over time. The existing QA datasets contain rather few time-sensitive questions and are hence not suitable for diagnosing or benchmarking the model's temporal reasoning capability.
arXiv Detail & Related papers (2021-08-13T16:42:25Z)
Towards Zero-shot Language Modeling [90.80124496312274]
We construct a neural model that is inductively biased towards learning human languages. We infer this distribution from a sample of typologically diverse training languages. We harness additional language-specific side information as distant supervision for held-out languages.
arXiv Detail & Related papers (2021-08-06T23:49:18Z)
TIMEDIAL: Temporal Commonsense Reasoning in Dialog [43.24596551545824]
We present the first study to investigate pre-trained language models for their temporal reasoning capabilities in dialogs. We formulate TIME-DIAL as a multiple-choice cloze task with over 1.1K carefully curated dialogs. Empirical results demonstrate that even the best performing models struggle on this task compared to humans.
arXiv Detail & Related papers (2021-06-08T17:59:21Z)

This list is automatically generated from the titles and abstracts of the papers on this site.