TSRating: Rating Quality of Diverse Time Series Data by Meta-learning from LLM Judgment
- URL: http://arxiv.org/abs/2506.01290v1
- Date: Mon, 02 Jun 2025 03:52:55 GMT
- Title: TSRating: Rating Quality of Diverse Time Series Data by Meta-learning from LLM Judgment
- Authors: Shunyu Wu, Dan Li, Haozheng Ye, Zhuomin Chen, Jiahui Zhou, Jian Lou, Zibin Zheng, See-Kiong Ng
- Abstract summary: We propose TSRating, a framework for rating the quality of time series data crawled from diverse domains. We show that TSRating outperforms baselines in terms of estimation accuracy, efficiency, and domain adaptability.
- Score: 35.012553346034395
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: High-quality time series (TS) data are essential for ensuring TS model performance, rendering research on rating TS data quality indispensable. Existing methods have shown promising rating accuracy within individual domains, primarily by extending data quality rating techniques such as influence functions and Shapley values to account for temporal characteristics. However, they neglect the fact that real-world TS data can span vastly different domains and exhibit distinct properties, hampering the accurate and efficient rating of diverse TS data. In this paper, we propose TSRating, a novel and unified framework for rating the quality of time series data crawled from diverse domains. TSRating is built on the assumption that LLMs inherit ample knowledge, acquired during their extensive pretraining, enabling them to comprehend and discern quality differences in diverse TS data. We verify this assumption by devising a series of prompts to elicit quality comparisons from LLMs for pairs of TS samples. We then fit a dedicated rating model, termed TSRater, to convert the LLMs' judgments into efficient quality predictions via TSRater's inference on future TS samples. To ensure cross-domain adaptability, we develop a meta-learning scheme to train TSRater on quality comparisons collected from nine distinct domains. To improve training efficiency, we employ signSGD for inner-loop updates, thus circumventing the demanding computation of hypergradients. Extensive experimental results on eleven benchmark datasets across three time series tasks, each using both conventional TS models and TS foundation models, demonstrate that TSRating outperforms baselines in terms of estimation accuracy, efficiency, and domain adaptability.
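The pipeline the abstract describes — collect pairwise quality judgments from an LLM, fit a rater to them with a Bradley-Terry-style likelihood, and update with signSGD — can be sketched in miniature. This is an illustration under loudly hypothetical assumptions, not the paper's implementation: the hand-crafted features, the linear rater, and the premise that the LLM always preferred the cleaner series are all stand-ins (the real TSRater is a learned model meta-trained across nine domains).

```python
import numpy as np

def featurize(x):
    """Hypothetical hand-crafted features (mean, std, lag-1 autocorrelation);
    the actual TSRater learns its own representation."""
    x = np.asarray(x, dtype=float)
    ac = np.corrcoef(x[:-1], x[1:])[0, 1] if x.std() > 0 else 0.0
    return np.array([x.mean(), x.std(), ac])

def quality_score(x, w):
    """Scalar quality prediction for one series under a linear rater."""
    return float(featurize(x) @ w)

def bt_loss_and_grad(w, pairs):
    """Bradley-Terry likelihood over pairwise judgments: each (winner, loser)
    pair records that the LLM judged `winner` the higher-quality series."""
    loss, grad = 0.0, np.zeros_like(w)
    for win, lose in pairs:
        d = featurize(win) - featurize(lose)
        p = 1.0 / (1.0 + np.exp(-(d @ w)))   # P(winner beats loser)
        loss += -np.log(p + 1e-12)
        grad += -(1.0 - p) * d
    return loss / len(pairs), grad / len(pairs)

# Synthetic comparisons: pretend the LLM always preferred the cleaner series.
rng = np.random.default_rng(0)
t = np.linspace(0, 4 * np.pi, 200)
clean = [np.sin(t + rng.normal()) for _ in range(20)]
noisy = [np.sin(t) + rng.normal(0.0, 1.0, t.size) for _ in range(20)]
pairs = list(zip(clean, noisy))

w = np.zeros(3)
for _ in range(200):
    _, g = bt_loss_and_grad(w, pairs)
    w -= 0.05 * np.sign(g)                   # signSGD: sign-only update
```

Once fitted, `quality_score` rates unseen series without further LLM calls, which is the efficiency gain the paper targets; the sign-only update mirrors the role signSGD plays in the inner loop, sidestepping hypergradient computation.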
Related papers
- TS-Inverse: A Gradient Inversion Attack Tailored for Federated Time Series Forecasting Models [3.7324240104250728]
Federated learning enables clients with privacy-sensitive time series data to collaboratively learn forecasting models. Servers can reconstruct clients' training data through gradient inversion attacks (GIA). While GIA has been demonstrated for image classification tasks, little is known about time series regression tasks. We propose TS-Inverse, a novel GIA that improves the inversion of TS data by learning a gradient inversion model.
arXiv Detail & Related papers (2025-03-26T19:35:49Z)
- Call for Rigor in Reporting Quality of Instruction Tuning Data [7.284192559306471]
Studies emphasize the significance of the quality of instruction tuning (IT) data. We demonstrate the potential problems arising from this practice and emphasize the need for careful consideration in verifying data quality.
arXiv Detail & Related papers (2025-03-04T02:04:58Z)
- Evaluating Time Series Foundation Models on Noisy Periodic Time Series [0.0]
This paper presents an empirical study evaluating the performance of time series foundation models (TSFMs) over two datasets constituting noisy periodic time series. Our findings demonstrate that while for time series with bounded periods, TSFMs can match or outperform the statistical approaches, their forecasting abilities deteriorate with longer periods, higher noise levels, lower sampling rates and more complex shapes of the time series.
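The evaluation setting above — periodic signals corrupted by noise, compared against simple statistical baselines — can be reproduced in miniature. The construction below is a hypothetical sketch (the paper's datasets and foundation models are not reproduced): a sinusoid plus Gaussian noise, and a seasonal-naive baseline that repeats the last observed period.

```python
import numpy as np

def noisy_periodic(n_points=512, period=64, noise_std=0.2, seed=0):
    """Hypothetical test signal: a sinusoid of the given period plus
    i.i.d. Gaussian noise, mimicking the noisy periodic setting."""
    rng = np.random.default_rng(seed)
    t = np.arange(n_points)
    return np.sin(2 * np.pi * t / period) + rng.normal(0.0, noise_std, n_points)

def seasonal_naive_forecast(history, period, horizon):
    """Statistical baseline: repeat the last observed period of the history."""
    last = history[-period:]
    reps = int(np.ceil(horizon / period))
    return np.tile(last, reps)[:horizon]
```

Sweeping `period` and `noise_std` here reproduces the axes along which the abstract reports TSFM degradation (longer periods, higher noise, lower sampling rates).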
arXiv Detail & Related papers (2025-01-01T16:36:21Z)
- Star-Agents: Automatic Data Optimization with LLM Agents for Instruction Tuning [71.2981957820888]
We propose a novel Star-Agents framework, which automates the enhancement of data quality across datasets.
The framework initially generates diverse instruction data with multiple LLM agents through a bespoke sampling method.
The generated data undergo a rigorous evaluation using a dual-model method that assesses both difficulty and quality.
arXiv Detail & Related papers (2024-11-21T02:30:53Z)
- Bridging SFT and DPO for Diffusion Model Alignment with Self-Sampling Preference Optimization [67.8738082040299]
Self-Sampling Preference Optimization (SSPO) is a new alignment method for post-training reinforcement learning. SSPO eliminates the need for paired data and reward models while retaining the training stability of SFT. SSPO surpasses all previous approaches on the text-to-image benchmarks and demonstrates outstanding performance on the text-to-video benchmarks.
arXiv Detail & Related papers (2024-10-07T17:56:53Z)
- ReAugment: Model Zoo-Guided RL for Few-Shot Time Series Augmentation and Forecasting [74.00765474305288]
We present a pilot study on using reinforcement learning (RL) for time series data augmentation. Our method, ReAugment, tackles three critical questions: which parts of the training set should be augmented, how the augmentation should be performed, and what advantages RL brings to the process.
arXiv Detail & Related papers (2024-09-10T07:34:19Z)
- TSI-Bench: Benchmarking Time Series Imputation [52.27004336123575]
TSI-Bench is a comprehensive benchmark suite for time series imputation utilizing deep learning techniques.
The TSI-Bench pipeline standardizes experimental settings to enable fair evaluation of imputation algorithms.
TSI-Bench innovatively provides a systematic paradigm to tailor time series forecasting algorithms for imputation purposes.
arXiv Detail & Related papers (2024-06-18T16:07:33Z)
- Exploring Progress in Multivariate Time Series Forecasting: Comprehensive Benchmarking and Heterogeneity Analysis [70.78170766633039]
We address the need for means of assessing MTS forecasting proposals reliably and fairly.
BasicTS+ is a benchmark designed to enable fair, comprehensive, and reproducible comparison of MTS forecasting solutions.
We apply BasicTS+ along with rich datasets to assess the capabilities of more than 45 MTS forecasting solutions.
arXiv Detail & Related papers (2023-10-09T19:52:22Z)
- Revisiting the Evaluation of Image Synthesis with GANs [55.72247435112475]
This study presents an empirical investigation into the evaluation of synthesis performance, with generative adversarial networks (GANs) as a representative of generative models.
In particular, we make in-depth analyses of various factors, including how to represent a data point in the representation space, how to calculate a fair distance using selected samples, and how many instances to use from each set.
arXiv Detail & Related papers (2023-04-04T17:54:32Z)
- Quantifying Quality of Class-Conditional Generative Models in Time-Series Domain [4.219228636765818]
We introduce the InceptionTime Score (ITS) and the Fréchet InceptionTime Distance (FITD) to gauge the qualitative performance of class-conditional generative models in the time-series domain.
We conduct extensive experiments on 80 different datasets to study the discriminative capabilities of proposed metrics.
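FITD follows the same recipe as the Fréchet Inception Distance: fit Gaussians to real and generated feature sets and compute the Fréchet distance between them. Below is a sketch of that distance on generic feature matrices; the InceptionTime feature extractor itself is omitted, so the inputs are assumed to be precomputed embeddings.

```python
import numpy as np

def _sqrtm_psd(M):
    """Matrix square root of a symmetric PSD matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(M)
    vals = np.clip(vals, 0.0, None)
    return (vecs * np.sqrt(vals)) @ vecs.T

def frechet_distance(feats_a, feats_b):
    """Fréchet distance between Gaussians fitted to two feature sets:
    ||mu_a - mu_b||^2 + Tr(C_a + C_b - 2 (C_a C_b)^(1/2))."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    c_a = np.cov(feats_a, rowvar=False)
    c_b = np.cov(feats_b, rowvar=False)
    s_a = _sqrtm_psd(c_a)
    covmean = _sqrtm_psd(s_a @ c_b @ s_a)   # same trace as (C_a C_b)^(1/2)
    return float(np.sum((mu_a - mu_b) ** 2)
                 + np.trace(c_a + c_b - 2.0 * covmean))
```

Computing the cross term as Tr((S_a C_b S_a)^(1/2)) keeps everything symmetric PSD, which is the usual numerically stable route for FID-style metrics.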
arXiv Detail & Related papers (2022-10-14T08:13:20Z)
- Listen, Adapt, Better WER: Source-free Single-utterance Test-time Adaptation for Automatic Speech Recognition [65.84978547406753]
Test-time Adaptation aims to adapt the model trained on source domains to yield better predictions for test samples.
Single-Utterance Test-time Adaptation (SUTA) is, to the best of our knowledge, the first TTA study in the speech area.
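Entropy minimization over the model's own frame-level predictions is the kind of source-free adaptation signal SUTA applies to a single utterance. The sketch below shows that signal on a toy linear classifier rather than a real ASR model; the frame features, weights, and learning rate are all hypothetical stand-ins.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def frame_entropy(p):
    """Per-frame entropy of softmax class distributions."""
    return -np.sum(p * np.log(p + 1e-12), axis=-1)

def tta_entropy_step(W, b, X, lr=0.05):
    """One test-time adaptation step in the spirit of SUTA: descend the mean
    entropy of the model's softmax outputs on a single utterance X.
    (Toy linear 'model'; SUTA adapts selected parameters of a speech model.)"""
    Z = X @ W + b                                 # (frames, classes) logits
    P = softmax(Z)
    H = frame_entropy(P)
    # Gradient of entropy w.r.t. logits: dH/dz_j = -p_j (log p_j + H)
    dZ = -P * (np.log(P + 1e-12) + H[:, None])
    dZ /= X.shape[0]                              # mean over frames
    return W - lr * (X.T @ dZ), b - lr * dZ.sum(axis=0)
```

No labels or source data are touched; the model sharpens its own predictions, which is what makes the adaptation source-free and single-utterance.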
arXiv Detail & Related papers (2022-03-27T06:38:39Z)
- Investigating Text Simplification Evaluation [21.128143745540292]
Modern text simplification (TS) heavily relies on the availability of gold standard data to build machine learning models.
Existing studies show that parallel TS corpora contain inaccurate simplifications and incorrect alignments.
Evaluation is usually performed by using metrics such as BLEU or SARI to compare system output to the gold standard.
arXiv Detail & Related papers (2021-07-28T22:49:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.