Evaluating Time Series Models with Knowledge Discovery
- URL: http://arxiv.org/abs/2503.14869v1
- Date: Wed, 19 Mar 2025 03:48:56 GMT
- Title: Evaluating Time Series Models with Knowledge Discovery
- Authors: Li Zhang
- Abstract summary: Time series data is one of the most ubiquitous data modalities, arising in diverse critical domains such as healthcare, seismology, manufacturing, and energy. Model performance is often evaluated with metrics such as RMSE, Accuracy, and F1-score. We propose a knowledge-discovery-based evaluation framework that aims to effectively use domain-expert knowledge to evaluate a model.
- Score: 4.897267974042842
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Time series data is one of the most ubiquitous data modalities, arising in diverse critical domains such as healthcare, seismology, manufacturing, and energy. In recent years there has been growing interest in the data mining community in developing time series deep learning models that pursue better performance. Model performance is often evaluated with metrics such as RMSE, Accuracy, and F1-score. Yet time series data are often hard to interpret and are collected under unknown environmental factors, sensor configurations, latent physical mechanisms, and non-stationary, evolving behavior. As a result, a model that scores better under standard metric-based evaluation may not always perform better in real-world tasks. In this blue sky paper, we explore the challenges inherent in the metric-based evaluation framework for time series data mining and propose a potential blue-sky idea -- a knowledge-discovery-based evaluation framework that aims to effectively use domain-expert knowledge to evaluate a model. We demonstrate that an evidence-seeking explanation can potentially have stronger persuasive power than metric-based evaluation and obtain better generalization for time series data mining tasks.
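As a concrete illustration of the metric-based evaluation the paper critiques, here is a minimal sketch (plain Python, hypothetical toy data) of the two kinds of metrics the abstract names: RMSE for a forecast and F1 for a classification. The data values are invented for illustration only.

```python
import math

def rmse(y_true, y_pred):
    # Root mean squared error: square root of the mean squared residual.
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def f1_score(y_true, y_pred):
    # Binary F1: harmonic mean of precision and recall.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# Hypothetical toy data: a scalar score like this summarizes average error,
# but says nothing about which patterns a domain expert cares about.
forecast_true = [1.0, 2.0, 3.0, 4.0]
forecast_pred = [1.1, 1.9, 3.2, 3.8]
labels_true = [1, 0, 1, 1, 0]
labels_pred = [1, 0, 0, 1, 0]

print(round(rmse(forecast_true, forecast_pred), 4))   # ≈ 0.1581
print(round(f1_score(labels_true, labels_pred), 4))   # 0.8
```

The paper's point is precisely that two models can tie on such scalar scores while differing sharply in the domain-relevant evidence they capture.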
Related papers
- Deep Learning For Time Series Analysis With Application On Human Motion [0.0]
This thesis leverages deep learning to enhance classification with feature engineering, introduce foundation models, and develop a compact yet state-of-the-art architecture.
Our contributions apply to real-world tasks, including human motion analysis for action recognition and rehabilitation.
For prototyping, we propose a shape-based synthetic sample generation method to support regression models when data is scarce.
arXiv Detail & Related papers (2025-02-26T18:01:51Z) - TimeRAF: Retrieval-Augmented Foundation model for Zero-shot Time Series Forecasting [59.702504386429126]
TimeRAF is a Retrieval-Augmented Forecasting model that enhances zero-shot time series forecasting through retrieval-augmented techniques.
TimeRAF employs an end-to-end learnable retriever to extract valuable information from the knowledge base.
arXiv Detail & Related papers (2024-12-30T09:06:47Z) - Recurrent Neural Goodness-of-Fit Test for Time Series [8.22915954499148]
Time series data are crucial across diverse domains such as finance and healthcare.
Traditional evaluation metrics fall short due to the temporal dependencies and potential high dimensionality of the features.
We propose the REcurrent NeurAL (RENAL) Goodness-of-Fit test, a novel and statistically rigorous framework for evaluating generative time series models.
arXiv Detail & Related papers (2024-10-17T19:32:25Z) - GenBench: A Benchmarking Suite for Systematic Evaluation of Genomic Foundation Models [56.63218531256961]
We introduce GenBench, a benchmarking suite specifically tailored for evaluating the efficacy of Genomic Foundation Models.
GenBench offers a modular and expandable framework that encapsulates a variety of state-of-the-art methodologies.
We provide a nuanced analysis of the interplay between model architecture and dataset characteristics on task-specific performance.
arXiv Detail & Related papers (2024-06-01T08:01:05Z) - QualEval: Qualitative Evaluation for Model Improvement [82.73561470966658]
We propose QualEval, which augments quantitative scalar metrics with automated qualitative evaluation as a vehicle for model improvement.
QualEval uses a powerful LLM reasoner and our novel flexible linear programming solver to generate human-readable insights.
We demonstrate that leveraging its insights improves the absolute performance of the Llama 2 model by up to 15 percentage points.
arXiv Detail & Related papers (2023-11-06T00:21:44Z) - Large Models for Time Series and Spatio-Temporal Data: A Survey and Outlook [95.32949323258251]
Temporal data, notably time series and spatio-temporal data, are prevalent in real-world applications.
Recent advances in large language and other foundation models have spurred increased use in time series and spatio-temporal data mining.
arXiv Detail & Related papers (2023-10-16T09:06:00Z) - MADS: Modulated Auto-Decoding SIREN for time series imputation [9.673093148930874]
We propose MADS, a novel auto-decoding framework for time series imputation, built upon implicit neural representations.
We evaluate our model on two real-world datasets, and show that it outperforms state-of-the-art methods for time series imputation.
arXiv Detail & Related papers (2023-07-03T09:08:47Z) - Exploring validation metrics for offline model-based optimisation with diffusion models [50.404829846182764]
In model-based optimisation (MBO) we are interested in using machine learning to design candidates that maximise some measure of reward with respect to a black box function called the (ground truth) oracle.
While an approximation to the ground truth oracle can be trained and used in its place during model validation to measure the mean reward over generated candidates, the evaluation is approximate and vulnerable to adversarial examples.
This is encapsulated under our proposed evaluation framework which is also designed to measure extrapolation.
arXiv Detail & Related papers (2022-11-19T16:57:37Z) - Quantifying Quality of Class-Conditional Generative Models in Time-Series Domain [4.219228636765818]
We introduce the InceptionTime Score (ITS) and the Frechet InceptionTime Distance (FITD) to gauge the qualitative performance of class-conditional generative models in the time-series domain.
We conduct extensive experiments on 80 different datasets to study the discriminative capabilities of the proposed metrics.
arXiv Detail & Related papers (2022-10-14T08:13:20Z) - A Study on the Evaluation of Generative Models [19.18642459565609]
Implicit generative models, which do not return likelihood values, have become prevalent in recent years.
In this work, we study the evaluation metrics of generative models by generating a high-quality synthetic dataset.
Our study shows that while FID and IS do correlate to several f-divergences, their ranking of close models can vary considerably.
arXiv Detail & Related papers (2022-06-22T09:27:31Z) - Leveraging the structure of dynamical systems for data-driven modeling [111.45324708884813]
We consider the impact of the training set and its structure on the quality of the long-term prediction.
We show how an informed design of the training set, based on invariants of the system and the structure of the underlying attractor, significantly improves the resulting models.
arXiv Detail & Related papers (2021-12-15T20:09:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.