TSGBench: Time Series Generation Benchmark
- URL: http://arxiv.org/abs/2309.03755v2
- Date: Thu, 7 Dec 2023 13:42:53 GMT
- Title: TSGBench: Time Series Generation Benchmark
- Authors: Yihao Ang, Qiang Huang, Yifan Bao, Anthony K. H. Tung, Zhiyong Huang
- Abstract summary: TSGBench provides a unified and comprehensive assessment of Synthetic Time Series Generation (TSG) methods.
It comprises three modules: (1) a curated collection of publicly available, real-world datasets tailored for TSG, together with a standardized preprocessing pipeline; (2) a comprehensive suite of evaluation measures, including vanilla measures, new distance-based assessments, and visualization tools; and (3) a pioneering generalization test rooted in Domain Adaptation (DA).
We have conducted experiments using TSGBench across ten real-world datasets from diverse domains, utilizing ten advanced TSG methods and twelve evaluation measures.
- Score: 11.199605025284185
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Synthetic Time Series Generation (TSG) is crucial in a range of applications,
including data augmentation, anomaly detection, and privacy preservation.
Although significant strides have been made in this field, existing methods
exhibit three key limitations: (1) They often benchmark against similar model
types, constraining a holistic view of performance capabilities. (2) The use of
specialized synthetic and private datasets introduces biases and hampers
generalizability. (3) Ambiguous evaluation measures, often tied to custom
networks or downstream tasks, hinder consistent and fair comparison.
To overcome these limitations, we introduce \textsf{TSGBench}, the inaugural
Time Series Generation Benchmark, designed for a unified and comprehensive
assessment of TSG methods. It comprises three modules: (1) a curated collection
of publicly available, real-world datasets tailored for TSG, together with a
standardized preprocessing pipeline; (2) a comprehensive evaluation measures
suite including vanilla measures, new distance-based assessments, and
visualization tools; (3) a pioneering generalization test rooted in Domain
Adaptation (DA), compatible with all methods. We have conducted comprehensive
experiments using \textsf{TSGBench} across a spectrum of ten real-world
datasets from diverse domains, utilizing ten advanced TSG methods and twelve
evaluation measures. The results highlight the reliability and efficacy of
\textsf{TSGBench} in evaluating TSG methods. Crucially, \textsf{TSGBench}
delivers a statistical analysis of the performance rankings of these methods,
illuminating their varying performance across different datasets and measures
and offering nuanced insights into the effectiveness of each method.
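To make the idea of a distance-based evaluation measure concrete, the sketch below compares the marginal value distributions of a real and a synthetic collection of time series via an empirical 1-D Wasserstein distance. This is an illustrative sketch only: the function name and interface are hypothetical and do not represent TSGBench's actual API or its specific measures.

```python
# Hypothetical sketch of a distance-based evaluation measure in the spirit of
# TSGBench's measures suite: the empirical 1-D Wasserstein distance between
# the pooled (marginal) value distributions of real and synthetic series.
# Lower is better; 0 means the marginal distributions match exactly.

def marginal_wasserstein(real, synth):
    """Empirical 1-D Wasserstein distance between the pooled values of two
    collections of time series (assumes equal total sample counts)."""
    a = sorted(v for series in real for v in series)
    b = sorted(v for series in synth for v in series)
    assert len(a) == len(b), "resample to equal sizes first"
    # With equal counts, the distance is the mean absolute difference of
    # the sorted samples (matched empirical quantiles).
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

real = [[0.0, 1.0, 2.0], [1.0, 2.0, 3.0]]
print(marginal_wasserstein(real, real))  # identical data -> 0.0
shifted = [[v + 0.5 for v in s] for s in real]
print(marginal_wasserstein(real, shifted))  # constant shift of 0.5 -> 0.5
```

A measure like this is model-agnostic, which is what allows a benchmark to compare generators of very different architectures on equal footing; in practice, marginal distances are complemented by temporal measures, since two generators can match marginals while differing in autocorrelation structure.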
Related papers
- GenBench: A Benchmarking Suite for Systematic Evaluation of Genomic Foundation Models [56.63218531256961]
We introduce GenBench, a benchmarking suite specifically tailored for evaluating the efficacy of Genomic Foundation Models.
GenBench offers a modular and expandable framework that encapsulates a variety of state-of-the-art methodologies.
We provide a nuanced analysis of the interplay between model architecture and dataset characteristics on task-specific performance.
arXiv: 2024-06-01
- TFB: Towards Comprehensive and Fair Benchmarking of Time Series Forecasting Methods [27.473935782550388]
Time series are generated in diverse domains such as economics, traffic, health, and energy.
We propose TFB, an automated benchmark for Time Series Forecasting (TSF) methods.
arXiv: 2024-03-29
- Test-Time Domain Generalization for Face Anti-Spoofing [60.94384914275116]
Face Anti-Spoofing (FAS) is pivotal in safeguarding facial recognition systems against presentation attacks.
We introduce a novel Test-Time Domain Generalization framework for FAS, which leverages the testing data to boost the model's generalizability.
Our method, consisting of Test-Time Style Projection (TTSP) and Diverse Style Shifts Simulation (DSSS), effectively projects the unseen data to the seen domain space.
arXiv Detail & Related papers (2024-03-28T11:50:23Z) - Hyperspectral Benchmark: Bridging the Gap between HSI Applications
through Comprehensive Dataset and Pretraining [11.935879491267634]
Hyperspectral Imaging (HSI) serves as a non-destructive spatial spectroscopy technique with a multitude of potential applications.
A recurring challenge lies in the limited size of the target datasets, impeding exhaustive architecture search.
This study introduces an innovative benchmark dataset encompassing three markedly distinct HSI applications.
arXiv Detail & Related papers (2023-09-20T08:08:34Z) - Benchmarking Test-Time Adaptation against Distribution Shifts in Image
Classification [77.0114672086012]
Test-time adaptation (TTA) is a technique aimed at enhancing the generalization performance of models by leveraging unlabeled samples solely during prediction.
We present a benchmark that systematically evaluates 13 prominent TTA methods and their variants on five widely used image classification datasets.
arXiv Detail & Related papers (2023-07-06T16:59:53Z) - DeepfakeBench: A Comprehensive Benchmark of Deepfake Detection [55.70982767084996]
A critical yet frequently overlooked challenge in the field of deepfake detection is the lack of a standardized, unified, comprehensive benchmark.
We present the first comprehensive benchmark for deepfake detection, called DeepfakeBench, which offers three key contributions.
DeepfakeBench contains 15 state-of-the-art detection methods, 9 deepfake datasets, a series of deepfake detection evaluation protocols and analysis tools, as well as comprehensive evaluations.
arXiv Detail & Related papers (2023-07-04T01:34:41Z) - TSI-GAN: Unsupervised Time Series Anomaly Detection using Convolutional
Cycle-Consistent Generative Adversarial Networks [2.4469484645516837]
Anomaly detection is widely used in network intrusion detection, autonomous driving, medical diagnosis, credit card fraud, etc.
This paper proposes TSI-GAN, an unsupervised anomaly detection model for time-series that can learn complex temporal patterns automatically.
We evaluate TSI-GAN using 250 well-curated and harder-than-usual datasets and compare with 8 state-of-the-art baseline methods.
arXiv Detail & Related papers (2023-03-22T23:24:47Z) - News Summarization and Evaluation in the Era of GPT-3 [73.48220043216087]
We study how GPT-3 compares against fine-tuned models trained on large summarization datasets.
We show that not only do humans overwhelmingly prefer GPT-3 summaries, prompted using only a task description, but these also do not suffer from common dataset-specific issues such as poor factuality.
arXiv Detail & Related papers (2022-09-26T01:04:52Z) - TRUE: Re-evaluating Factual Consistency Evaluation [29.888885917330327]
We introduce TRUE: a comprehensive study of factual consistency metrics on a standardized collection of existing texts from diverse tasks.
Our standardization enables an example-level meta-evaluation protocol that is more actionable and interpretable than previously reported correlations.
Across diverse state-of-the-art metrics and 11 datasets we find that large-scale NLI and question generation-and-answering-based approaches achieve strong and complementary results.
arXiv Detail & Related papers (2022-04-11T10:14:35Z) - WRENCH: A Comprehensive Benchmark for Weak Supervision [66.82046201714766]
The benchmark consists of 22 varied real-world datasets for classification and sequence tagging.
We use the benchmark to conduct extensive comparisons over more than 100 method variants to demonstrate its efficacy as a benchmark platform.
arXiv: 2021-09-23
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.