STEB: In Search of the Best Evaluation Approach for Synthetic Time Series
- URL: http://arxiv.org/abs/2505.21160v1
- Date: Tue, 27 May 2025 13:15:35 GMT
- Title: STEB: In Search of the Best Evaluation Approach for Synthetic Time Series
- Authors: Michael Stenger, Robert Leppich, André Bauer, Samuel Kounev
- Abstract summary: We propose the Synthetic Time series Evaluation Benchmark (STEB). STEB computes indicators for measure reliability and score consistency. It tracks running time and test errors, and offers sequential and parallel modes of operation.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The growing need for synthetic time series, due to data augmentation or privacy regulations, has led to numerous generative models, frameworks, and evaluation measures alike. Objectively comparing these measures on a large scale remains an open challenge. We propose the Synthetic Time series Evaluation Benchmark (STEB) -- the first benchmark framework that enables comprehensive and interpretable automated comparisons of synthetic time series evaluation measures. Using 10 diverse datasets, randomness injection, and 13 configurable data transformations, STEB computes indicators for measure reliability and score consistency. It tracks running time, test errors, and features sequential and parallel modes of operation. In our experiments, we determine a ranking of 41 measures from literature and confirm that the choice of upstream time series embedding heavily impacts the final score.
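The score-consistency idea can be illustrated in miniature: perturb a reference dataset with increasing amounts of noise and check that an evaluation measure's score degrades monotonically. The sketch below is a toy illustration only, not STEB's actual implementation; the RBF-kernel MMD is an assumed stand-in for the 41 measures the benchmark compares.

```python
import numpy as np

def mmd_rbf(x, y, gamma=0.125):
    """Squared maximum mean discrepancy with an RBF kernel,
    used here as a stand-in synthetic-data evaluation measure."""
    def k(a, b):
        d = a[:, None, :] - b[None, :, :]
        return np.exp(-gamma * (d ** 2).sum(-1))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

rng = np.random.default_rng(0)
real = rng.standard_normal((200, 8))  # 200 "time series windows" of length 8

# Score-consistency check: injecting more noise into the "synthetic"
# data should make the score monotonically worse (larger).
noise_levels = [0.0, 0.5, 1.0, 2.0]
scores = [mmd_rbf(real, real + s * rng.standard_normal(real.shape))
          for s in noise_levels]
print([round(s, 4) for s in scores])
```

A measure that fails such a monotonicity check under controlled perturbations would rank low on consistency.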
Related papers
- Breaking Silos: Adaptive Model Fusion Unlocks Better Time Series Forecasting [64.45587649141842]
Time-series forecasting plays a critical role in many real-world applications. No single model consistently outperforms others across different test samples; instead, each model excels in specific cases. We introduce TimeFuse, a framework for collective time-series forecasting with sample-level adaptive fusion of heterogeneous models.
arXiv Detail & Related papers (2025-05-24T00:45:07Z)
- Empowering Time Series Analysis with Synthetic Data: A Survey and Outlook in the Era of Foundation Models [104.17057231661371]
Time series analysis is crucial for understanding dynamics of complex systems. Recent advances in foundation models have led to task-agnostic Time Series Foundation Models (TSFMs) and Large Language Model-based Time Series Models (TSLLMs). Their success depends on large, diverse, and high-quality datasets, which are challenging to build due to regulatory, diversity, quality, and quantity constraints. This survey provides a comprehensive review of synthetic data for TSFMs and TSLLMs, analyzing data generation strategies, their role in model pretraining, fine-tuning, and evaluation, and identifying future research directions.
arXiv Detail & Related papers (2025-03-14T13:53:46Z)
- Evaluating Time Series Foundation Models on Noisy Periodic Time Series [0.0]
This paper presents an empirical study evaluating the performance of time series foundation models (TSFMs) on two datasets of noisy periodic time series. Our findings demonstrate that while TSFMs can match or outperform statistical approaches for time series with bounded periods, their forecasting ability deteriorates with longer periods, higher noise levels, lower sampling rates, and more complex shapes of the time series.
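This kind of evaluation setup can be reproduced in miniature with a classical baseline rather than a TSFM. The sketch below (all parameters illustrative, not the paper's datasets or models) measures how a seasonal-naive forecast's error on a noisy sine wave grows with the noise level:

```python
import numpy as np

rng = np.random.default_rng(1)

def seasonal_naive_mae(period=50, noise=0.1, n_cycles=20):
    """MAE of a seasonal-naive forecast (repeat the last observed cycle)
    on a sine wave corrupted by Gaussian noise."""
    t = np.arange(period * n_cycles)
    y = np.sin(2 * np.pi * t / period) + noise * rng.standard_normal(t.size)
    history, future = y[:-period], y[-period:]
    forecast = history[-period:]  # repeat the previous cycle
    return float(np.abs(future - forecast).mean())

for noise in (0.0, 0.2, 0.5):
    print(f"noise={noise}: MAE={seasonal_naive_mae(noise=noise):.3f}")
```

Sweeping period length, noise level, and sampling rate in the same way yields the degradation curves against which foundation models can be compared.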
arXiv Detail & Related papers (2025-01-01T16:36:21Z)
- Recurrent Neural Goodness-of-Fit Test for Time Series [8.22915954499148]
Time series data are crucial across diverse domains such as finance and healthcare. Traditional evaluation metrics fall short due to the temporal dependencies and potential high dimensionality of the features. We propose the REcurrent NeurAL (RENAL) Goodness-of-Fit test, a novel and statistically rigorous framework for evaluating generative time series models.
arXiv Detail & Related papers (2024-10-17T19:32:25Z)
- EBES: Easy Benchmarking for Event Sequences [17.277513178760348]
Event Sequences (EvS) refer to sequential data characterized by irregular sampling intervals and a mix of categorical and numerical features. EBES is a comprehensive benchmark for EvS classification with sequence-level targets. It features standardized evaluation scenarios and protocols, along with an open-source PyTorch library that implements 9 modern models.
arXiv Detail & Related papers (2024-10-04T13:03:43Z)
- Seq-to-Final: A Benchmark for Tuning from Sequential Distributions to a Final Time Point [18.843395348612553]
When only limited data is available in the final period, leveraging historical data is necessary to learn a model for the last time point.
We construct a benchmark with different sequences of synthetic shifts to evaluate the effectiveness of 3 classes of methods.
Our results suggest that, for the sequences in our benchmark, methods that disregard the sequential structure and adapt to the final time point tend to perform well.
arXiv Detail & Related papers (2024-07-12T19:03:42Z)
- TSI-Bench: Benchmarking Time Series Imputation [52.27004336123575]
TSI-Bench is a comprehensive benchmark suite for time series imputation utilizing deep learning techniques.
The TSI-Bench pipeline standardizes experimental settings to enable fair evaluation of imputation algorithms.
TSI-Bench innovatively provides a systematic paradigm to tailor time series forecasting algorithms for imputation purposes.
arXiv Detail & Related papers (2024-06-18T16:07:33Z)
- Evaluating DTW Measures via a Synthesis Framework for Time-Series Data [3.4437947384641037]
Time-series data originate from various applications that describe specific observations or quantities of interest over time.
Dynamic Time Warping (DTW) is the standard approach to achieve an optimal alignment between two temporal signals.
Most DTW measures perform well on certain types of time-series data without a clear explanation of the reason.
The proposed synthesis framework yields the first guideline for selecting a proper DTW measure.
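For reference, the standard dynamic-programming formulation of DTW that such measures build on can be sketched as follows. This is a minimal textbook implementation with an absolute-difference local cost, not any of the paper's specific DTW variants:

```python
import numpy as np

def dtw_distance(a, b):
    """Classic dynamic-programming DTW distance between two 1-D series."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible warping steps.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# A phase-shifted sine aligns much better under DTW than under a
# pointwise (Euclidean-style) comparison.
t = np.linspace(0, 2 * np.pi, 100)
x, y = np.sin(t), np.sin(t + 0.5)
print(dtw_distance(x, y), np.abs(x - y).sum())
```

The gap between the two printed values is exactly the effect warping-based measures exploit, and the reason different DTW variants can rank series pairs differently.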
arXiv Detail & Related papers (2024-02-14T05:08:47Z)
- TimeSiam: A Pre-Training Framework for Siamese Time-Series Modeling [67.02157180089573]
Time series pre-training has recently garnered wide attention for its potential to reduce labeling expenses and benefit various downstream tasks.
This paper proposes TimeSiam as a simple but effective self-supervised pre-training framework for Time series based on Siamese networks.
arXiv Detail & Related papers (2024-02-04T13:10:51Z)
- OrionBench: Benchmarking Time Series Generative Models in the Service of the End-User [8.05635934199494]
OrionBench is a continuous benchmarking framework for unsupervised time series anomaly detection models.
We show how to use OrionBench and report the performance of pipelines across 17 releases published over the course of four years.
arXiv Detail & Related papers (2023-10-26T19:43:16Z)
- Exogenous Data in Forecasting: FARM -- A New Measure for Relevance Evaluation [62.997667081978825]
We introduce a new approach named FARM - Forward Relevance Aligned Metric.
Our forward method relies on an angular measure that compares changes in subsequent data points to align time-warped series.
As a first validation step, we present the application of our FARM approach to synthetic but representative signals.
arXiv Detail & Related papers (2023-04-21T15:22:33Z)
- Revisiting the Evaluation of Image Synthesis with GANs [55.72247435112475]
This study presents an empirical investigation into the evaluation of synthesis performance, with generative adversarial networks (GANs) as a representative of generative models.
In particular, we make in-depth analyses of various factors, including how to represent a data point in the representation space, how to calculate a fair distance using selected samples, and how many instances to use from each set.
arXiv Detail & Related papers (2023-04-04T17:54:32Z)
- Cluster-and-Conquer: A Framework For Time-Series Forecasting [94.63501563413725]
We propose a three-stage framework for forecasting high-dimensional time-series data.
Our framework is highly general, allowing for any time-series forecasting and clustering method to be used in each step.
When instantiated with simple linear autoregressive models, we are able to achieve state-of-the-art results on several benchmark datasets.
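A minimal sketch of how such a three-stage pipeline might be instantiated with simple linear AR(1) models. The clustering rule, the toy data, and the per-series adjustment below are assumptions chosen for illustration, not the paper's exact design:

```python
import numpy as np

rng = np.random.default_rng(2)

def gen_ar1(phi, T):
    """Simulate a simple AR(1) process."""
    x = np.zeros(T)
    for t in range(1, T):
        x[t] = phi * x[t - 1] + 0.5 * rng.standard_normal()
    return x

def ar1_forecast(y):
    """One-step linear AR(1) forecast fitted by least squares."""
    phi = (y[:-1] @ y[1:]) / (y[:-1] @ y[:-1] + 1e-12)
    return phi * y[-1]

# Toy panel: two latent cluster signals, ten noisy copies of each.
T = 300
latent = [gen_ar1(0.9, T), gen_ar1(-0.8, T)]
panel = np.array([latent[g] + 0.2 * rng.standard_normal(T)
                  for g in range(2) for _ in range(10)])

# Stage 1 (cluster): stand-in rule -- assign each series to whichever of
# two seed series it correlates with more; any clustering method fits here.
seeds = panel[[0, 10]]
labels = np.array([np.argmax([np.corrcoef(s, c)[0, 1] for c in seeds])
                   for s in panel])

# Stage 2 (cluster-level forecast): AR(1) on each cluster's mean series.
means = {k: panel[labels == k].mean(axis=0) for k in (0, 1)}
cluster_fc = {k: ar1_forecast(m) for k, m in means.items()}

# Stage 3 (per-series adjustment): cluster forecast plus each series'
# last deviation from its cluster mean.
forecasts = np.array([cluster_fc[k] + (s[-1] - means[k][-1])
                      for s, k in zip(panel, labels)])
print(forecasts.shape)
```

The appeal of the framework is that each stage is swappable: k-means, deep forecasters, or richer residual models can replace the simple choices above without changing the pipeline's shape.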
arXiv Detail & Related papers (2021-10-26T20:41:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.