TSAQA: Time Series Analysis Question And Answering Benchmark
- URL: http://arxiv.org/abs/2601.23204v1
- Date: Fri, 30 Jan 2026 17:28:56 GMT
- Title: TSAQA: Time Series Analysis Question And Answering Benchmark
- Authors: Baoyu Jing, Sanhorn Chen, Lecheng Zheng, Boyu Liu, Zihao Li, Jiaru Zou, Tianxin Wei, Zhining Liu, Zhichen Zeng, Ruizhong Qiu, Xiao Lin, Yuchen Yan, Dongqi Fu, Jingchao Ni, Jingrui He, Hanghang Tong
- Abstract summary: Time series data are integral to critical applications across domains such as finance, healthcare, transportation, and environmental science. We introduce TSAQA, a novel unified benchmark designed to broaden task coverage and evaluate diverse temporal analysis capabilities.
- Score: 85.35545785252309
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Time series data are integral to critical applications across domains such as finance, healthcare, transportation, and environmental science. While recent work has begun to explore multi-task time series question answering (QA), current benchmarks remain limited to forecasting and anomaly detection tasks. We introduce TSAQA, a novel unified benchmark designed to broaden task coverage and evaluate diverse temporal analysis capabilities. TSAQA integrates six diverse tasks under a single framework, ranging from conventional analysis, including anomaly detection and classification, to advanced analysis, such as characterization, comparison, data transformation, and temporal relationship analysis. Spanning 210k samples across 13 domains, the dataset employs diverse question formats, including true-or-false (TF), multiple-choice (MC), and a novel puzzling (PZ) format, to comprehensively assess time series analysis. Zero-shot evaluation demonstrates that these tasks are challenging for current Large Language Models (LLMs): the best-performing commercial LLM, Gemini-2.5-Flash, achieves an average score of only 65.08. Although instruction tuning boosts open-source performance, the best-performing open-source model, LLaMA-3.1-8B, still shows significant room for improvement, highlighting the complexity of temporal analysis for LLMs.
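The abstract describes a QA benchmark whose answers come in fixed formats (TF and MC) and are scored for accuracy. As a rough illustration only, the sketch below shows how such per-format scoring might work; the sample structure, field names, and string-matching rule are assumptions for illustration, not the paper's actual evaluation protocol.

```python
# Hypothetical sketch of per-format accuracy scoring for a
# TSAQA-style benchmark. The sample schema ("format", "answer", etc.)
# is an assumption, not taken from the paper.

def score_answers(samples, predictions):
    """Return per-format accuracy for true-or-false (TF) and
    multiple-choice (MC) questions, via case-insensitive matching."""
    totals, correct = {}, {}
    for sample, pred in zip(samples, predictions):
        fmt = sample["format"]  # e.g. "TF" or "MC"
        totals[fmt] = totals.get(fmt, 0) + 1
        if pred.strip().upper() == sample["answer"].upper():
            correct[fmt] = correct.get(fmt, 0) + 1
    return {fmt: correct.get(fmt, 0) / totals[fmt] for fmt in totals}

# Two toy samples: a TF anomaly question and an MC trend question.
samples = [
    {"format": "TF", "series": [1, 2, 9, 2, 1],
     "question": "Does this series contain a spike anomaly?",
     "answer": "TRUE"},
    {"format": "MC", "series": [1, 2, 3, 4, 5],
     "question": "What is the overall trend?",
     "choices": ["A. increasing", "B. decreasing"],
     "answer": "A"},
]
predictions = ["true", "B"]
print(score_answers(samples, predictions))  # {'TF': 1.0, 'MC': 0.0}
```

A real harness would also need an answer-extraction step (parsing the model's free-text response down to "TRUE" or "A"), which is where much of the practical difficulty of LLM benchmarking lies.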
Related papers
- TimeOmni-1: Incentivizing Complex Reasoning with Time Series in Large Language Models [105.47481207029047]
We introduce the Time Series Reasoning Suite (TSR-Suite), which formalizes four atomic tasks that span three fundamental capabilities for reasoning with time series. We also introduce TimeOmni-1, the first unified reasoning model designed to address diverse real-world problems demanding time series reasoning.
arXiv Detail & Related papers (2025-09-29T13:54:34Z) - When LLM Meets Time Series: Can LLMs Perform Multi-Step Time Series Reasoning and Inference [12.867006554196358]
We introduce the TSAIA Benchmark, a first attempt to evaluate Large Language Models as time-series AI assistants. The benchmark encompasses a broad spectrum of challenges, ranging from constraint-aware forecasting to anomaly detection with threshold calibration. We apply this benchmark to assess eight state-of-the-art LLMs under a unified evaluation protocol.
arXiv Detail & Related papers (2025-09-01T22:58:57Z) - Time-RA: Towards Time Series Reasoning for Anomaly with LLM Feedback [55.284574165467525]
Time-series Reasoning for Anomaly (Time-RA) transforms classical time series anomaly detection into a generative, reasoning-intensive task. We also introduce the first real-world multimodal benchmark dataset, RATs40K, explicitly annotated for anomaly reasoning.
arXiv Detail & Related papers (2025-07-20T18:02:50Z) - Towards Cross-Modality Modeling for Time Series Analytics: A Survey in the LLM Era [24.980206999214552]
Large Language Models (LLMs) have emerged as a new paradigm for time series analytics. LLMs are pre-trained on textual corpora and are not inherently optimized for time series. This survey is designed for a range of professionals, researchers, and practitioners interested in LLM-based time series modeling.
arXiv Detail & Related papers (2025-05-05T11:35:33Z) - Empowering Time Series Analysis with Synthetic Data: A Survey and Outlook in the Era of Foundation Models [104.17057231661371]
Time series analysis is crucial for understanding the dynamics of complex systems. Recent advances in foundation models have led to task-agnostic Time Series Foundation Models (TSFMs) and Large Language Model-based Time Series Models (TSLLMs). Their success depends on large, diverse, and high-quality datasets, which are challenging to build due to regulatory, diversity, quality, and quantity constraints. This survey provides a comprehensive review of synthetic data for TSFMs and TSLLMs, analyzing data generation strategies, their role in model pretraining, fine-tuning, and evaluation, and identifying future research directions.
arXiv Detail & Related papers (2025-03-14T13:53:46Z) - Time-MQA: Time Series Multi-Task Question Answering with Context Enhancement [55.2439260314328]
Time Series Multi-Task Question Answering (Time-MQA) is a unified framework that enables natural language queries across multiple time series tasks. Central to Time-MQA is the TSQA dataset, a large-scale dataset containing ~200k question-answer pairs.
arXiv Detail & Related papers (2025-02-26T13:47:13Z) - Are Large Language Models Useful for Time Series Data Analysis? [3.44393516559102]
Time series data plays a critical role across diverse domains such as healthcare, energy, and finance. This study investigates whether large language models (LLMs) are effective for time series data analysis.
arXiv Detail & Related papers (2024-12-16T02:47:44Z) - Foundation Models for Time Series Analysis: A Tutorial and Survey [70.43311272903334]
Foundation Models (FMs) have fundamentally reshaped the paradigm of model design for time series analysis.
This survey aims to furnish a comprehensive and up-to-date overview of FMs for time series analysis.
arXiv Detail & Related papers (2024-03-21T10:08:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.