PredBench: Benchmarking Spatio-Temporal Prediction across Diverse Disciplines
- URL: http://arxiv.org/abs/2407.08418v2
- Date: Fri, 12 Jul 2024 02:55:16 GMT
- Title: PredBench: Benchmarking Spatio-Temporal Prediction across Diverse Disciplines
- Authors: ZiDong Wang, Zeyu Lu, Di Huang, Tong He, Xihui Liu, Wanli Ouyang, Lei Bai
- Abstract summary: We introduce PredBench, a benchmark tailored for the holistic evaluation of spatio-temporal prediction networks.
This benchmark integrates 12 widely adopted methods with 15 diverse datasets across multiple application domains.
Its multi-dimensional evaluation framework broadens the analysis with a comprehensive set of metrics.
- Score: 86.36060279469304
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we introduce PredBench, a benchmark tailored for the holistic evaluation of spatio-temporal prediction networks. Despite significant progress in this field, there remains a lack of a standardized framework for a detailed and comparative analysis of various prediction network architectures. PredBench addresses this gap by conducting large-scale experiments, upholding standardized and appropriate experimental settings, and implementing multi-dimensional evaluations. This benchmark integrates 12 widely adopted methods with 15 diverse datasets across multiple application domains, offering extensive evaluation of contemporary spatio-temporal prediction networks. Through meticulous calibration of prediction settings across various applications, PredBench ensures evaluations relevant to their intended use and enables fair comparisons. Moreover, its multi-dimensional evaluation framework broadens the analysis with a comprehensive set of metrics, providing deep insights into the capabilities of models. The findings from our research offer strategic directions for future developments in the field. Our codebase is available at https://github.com/OpenEarthLab/PredBench.
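To give a concrete sense of what a multi-dimensional evaluation of spatio-temporal predictions involves, the sketch below scores a predicted frame sequence with several standard metrics, reported per lead time. The function names here are hypothetical and not PredBench's actual API; see the linked repository for the real interface.

```python
# Illustrative multi-metric evaluation loop for spatio-temporal prediction.
# Names are hypothetical, not the PredBench API.
import numpy as np

def mse(pred: np.ndarray, target: np.ndarray) -> float:
    return float(np.mean((pred - target) ** 2))

def mae(pred: np.ndarray, target: np.ndarray) -> float:
    return float(np.mean(np.abs(pred - target)))

def psnr(pred: np.ndarray, target: np.ndarray, data_range: float = 1.0) -> float:
    err = mse(pred, target)
    return float("inf") if err == 0 else 10.0 * np.log10(data_range ** 2 / err)

def evaluate_sequence(pred_seq: np.ndarray, target_seq: np.ndarray) -> dict:
    """Score a predicted sequence of frames (T, H, W) against ground truth,
    reporting each metric per lead time, since errors typically grow with T."""
    metrics = {"mse": mse, "mae": mae, "psnr": psnr}
    return {
        name: [fn(p, t) for p, t in zip(pred_seq, target_seq)]
        for name, fn in metrics.items()
    }

# Example: a 10-step prediction of 64x64 frames.
rng = np.random.default_rng(0)
target = rng.random((10, 64, 64))
pred = target + 0.05 * rng.standard_normal((10, 64, 64))
scores = evaluate_sequence(pred, target)
print({k: round(v[0], 4) for k, v in scores.items()})
```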
Related papers
- GIFT-Eval: A Benchmark For General Time Series Forecasting Model Evaluation [90.53485251837235]
GIFT-Eval is a pioneering benchmark aimed at promoting evaluation across diverse datasets.
GIFT-Eval encompasses 28 datasets covering over 144,000 time series and 177 million data points.
We also provide a non-leaking pretraining dataset containing approximately 230 billion data points.
arXiv Detail & Related papers (2024-10-14T11:29:38Z)
- MIBench: A Comprehensive Benchmark for Model Inversion Attack and Defense [43.71365087852274]
Model Inversion (MI) attacks aim at leveraging the output information of target models to reconstruct privacy-sensitive training data.
The lack of a comprehensive, aligned, and reliable benchmark has emerged as a formidable challenge.
To address this critical gap, we introduce MIBench, the first practical benchmark for model inversion attacks and defenses.
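For context, model inversion attacks of this kind are often implemented as gradient-based optimization of an input against a trained classifier. The sketch below is a generic example of that idea, not MIBench's implementation; `invert_class` and its defaults are illustrative assumptions.

```python
# A generic gradient-based model-inversion sketch (not MIBench's code):
# optimize an input image so a trained classifier assigns high probability
# to a chosen class, approximating that class's training data.
import torch
import torch.nn.functional as F

def invert_class(model: torch.nn.Module, target_class: int,
                 shape=(1, 3, 32, 32), steps: int = 500, lr: float = 0.1):
    model.eval()
    x = torch.zeros(shape, requires_grad=True)   # start from a blank image
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        logits = model(x)
        # Maximize the target-class log-probability, with a small L2 prior
        # to keep the reconstruction in a plausible range.
        loss = -F.log_softmax(logits, dim=1)[0, target_class] + 1e-3 * x.pow(2).sum()
        loss.backward()
        opt.step()
    return x.detach()
```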
arXiv Detail & Related papers (2024-10-07T16:13:49Z)
- JANET: Joint Adaptive predictioN-region Estimation for Time-series [28.19630729432862]
We propose JANET (Joint Adaptive predictioN-region Estimation for Time-series), a novel framework for constructing conformal prediction regions.
JANET generalises the inductive conformal framework and efficiently produces joint prediction regions with controlled K-familywise error rates.
Our empirical evaluation demonstrates JANET's superior performance in multi-step prediction tasks across diverse time series datasets.
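As a rough sketch of how joint multi-step conformal regions can be built, the code below applies split conformal calibration per horizon with a Bonferroni correction. JANET's actual construction is more refined (it targets controlled K-familywise error rates); everything here is an illustrative assumption.

```python
# A split-conformal sketch for joint multi-step prediction intervals,
# using a simple Bonferroni correction across the H horizons.
import numpy as np

def joint_conformal_intervals(cal_preds, cal_targets, test_preds, alpha=0.1):
    """cal_preds, cal_targets: (n_cal, H); test_preds: (n_test, H).
    Returns lower/upper bounds with approximate joint coverage 1 - alpha."""
    n_cal, H = cal_preds.shape
    scores = np.abs(cal_preds - cal_targets)          # per-horizon residuals
    alpha_h = alpha / H                               # Bonferroni split
    q_level = min(1.0, np.ceil((n_cal + 1) * (1 - alpha_h)) / n_cal)
    q = np.quantile(scores, q_level, axis=0)          # one radius per horizon
    return test_preds - q, test_preds + q
```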
arXiv Detail & Related papers (2024-07-08T21:03:15Z)
- GenBench: A Benchmarking Suite for Systematic Evaluation of Genomic Foundation Models [56.63218531256961]
We introduce GenBench, a benchmarking suite specifically tailored for evaluating the efficacy of Genomic Foundation Models.
GenBench offers a modular and expandable framework that encapsulates a variety of state-of-the-art methodologies.
We provide a nuanced analysis of the interplay between model architecture and dataset characteristics on task-specific performance.
arXiv Detail & Related papers (2024-06-01T08:01:05Z)
- Bayesian Online Learning for Consensus Prediction [16.890828000688174]
We propose a family of methods that dynamically estimate expert consensus from partial feedback.
We demonstrate the efficacy of our framework against a variety of baselines on CIFAR-10H and ImageNet-16H.
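One simple way to make this concrete is to track a Beta posterior over each expert's accuracy and weight votes by the posterior mean, as sketched below. This is a generic Bayesian-updating toy, not the paper's family of methods; the class and its interface are hypothetical.

```python
# A toy online consensus estimator (not the paper's method): maintain a
# Beta posterior over each expert's accuracy, weight votes by the posterior
# mean, and update whenever ground-truth feedback arrives.
from collections import defaultdict

class BetaConsensus:
    def __init__(self, n_experts: int, prior=(1.0, 1.0)):
        self.ab = [list(prior) for _ in range(n_experts)]  # (alpha, beta) per expert

    def predict(self, votes: dict[int, int]) -> int:
        """votes maps expert index to predicted label; weight by E[accuracy]."""
        weight = defaultdict(float)
        for e, label in votes.items():
            a, b = self.ab[e]
            weight[label] += a / (a + b)
        return max(weight, key=weight.get)

    def update(self, votes: dict[int, int], true_label: int) -> None:
        for e, label in votes.items():
            # Increment alpha on a correct vote, beta on an incorrect one.
            self.ab[e][0 if label == true_label else 1] += 1
```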
arXiv Detail & Related papers (2023-12-12T19:18:04Z)
- Conformal Prediction in Multi-User Settings: An Evaluation [0.10231119246773925]
Machine learning models are typically trained and evaluated without making any distinction between users.
This produces inaccurate estimates of performance metrics in multi-user settings.
In this work, we evaluate the conformal prediction framework in several multi-user settings.
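A natural baseline for restoring per-user validity is group-conditional split conformal prediction, calibrated separately for each user. The sketch below illustrates that idea under simple assumptions; it is not the paper's exact protocol.

```python
# A sketch of user-conditional (group-wise) split conformal prediction:
# calibrate a separate score quantile per user rather than pooling everyone.
import numpy as np

def per_user_quantiles(scores: np.ndarray, users: np.ndarray, alpha: float = 0.1) -> dict:
    """scores: calibration nonconformity scores; users: matching user ids."""
    q = {}
    for u in np.unique(users):
        s = scores[users == u]
        level = min(1.0, np.ceil((len(s) + 1) * (1 - alpha)) / len(s))
        q[u] = float(np.quantile(s, level))
    return q

# At test time, the prediction set for user u contains every label whose
# nonconformity score is at most q[u].
```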
arXiv Detail & Related papers (2023-12-08T17:33:23Z)
- Regions of Reliability in the Evaluation of Multivariate Probabilistic Forecasts [73.33395097728128]
We provide the first systematic finite-sample study of proper scoring rules for time-series forecasting evaluation.
We carry out our analysis on a comprehensive synthetic benchmark, specifically designed to test several key discrepancies between ground-truth and forecast distributions.
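For readers unfamiliar with proper scoring rules, the CRPS is one of the most common ones for probabilistic forecasts; the sketch below estimates it from ensemble samples via the identity CRPS(F, y) = E|X - y| - 0.5 E|X - X'|. This is a standard sample-based approximation, not code from the paper.

```python
# Sample-based estimate of the Continuous Ranked Probability Score (CRPS),
# a proper scoring rule minimized in expectation by the true predictive
# distribution, which is the property such analyses stress.
import numpy as np

def crps_ensemble(samples: np.ndarray, observation: float) -> float:
    """samples: (n,) draws from the forecast distribution. Lower is better."""
    term1 = np.mean(np.abs(samples - observation))
    term2 = 0.5 * np.mean(np.abs(samples[:, None] - samples[None, :]))
    return float(term1 - term2)

rng = np.random.default_rng(0)
forecast = rng.normal(0.0, 1.0, size=1000)
print(crps_ensemble(forecast, 0.3))   # sharp and calibrated gives a low score
```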
arXiv Detail & Related papers (2023-04-19T17:38:42Z)
- A Closer Look at Debiased Temporal Sentence Grounding in Videos: Dataset, Metric, and Approach [53.727460222955266]
Temporal Sentence Grounding in Videos (TSGV) aims to ground a natural language sentence in an untrimmed video.
Recent studies have found that current benchmark datasets may have obvious moment annotation biases.
We introduce a new evaluation metric, "dR@n,IoU@m", that discounts the basic recall scores to alleviate the inflated evaluation caused by biased datasets.
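The sketch below shows the standard R@n,IoU@m recall for temporal grounding together with a placeholder boundary-based discount in the spirit of dR@n,IoU@m. The `boundary_discount` function is an assumption for illustration; the paper defines its own discount factor.

```python
# Standard R@n,IoU@m recall with a hypothetical discount: a query counts as
# a hit if a top-n proposal reaches IoU >= m, and the hit is down-weighted
# as predicted boundaries drift from the ground truth.
import numpy as np

def iou(a, b):
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0

def boundary_discount(pred, gt, duration):
    # Placeholder discount: 1 when boundaries match, shrinking with drift.
    d_start = abs(pred[0] - gt[0]) / duration
    d_end = abs(pred[1] - gt[1]) / duration
    return max(0.0, 1.0 - d_start) * max(0.0, 1.0 - d_end)

def discounted_recall(all_preds, all_gts, durations, n=1, m=0.5):
    """all_preds[i]: ranked (start, end) proposals for query i."""
    scores = []
    for preds, gt, dur in zip(all_preds, all_gts, durations):
        hits = [boundary_discount(p, gt, dur)
                for p in preds[:n] if iou(p, gt) >= m]
        scores.append(max(hits) if hits else 0.0)
    return float(np.mean(scores))
```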
arXiv Detail & Related papers (2022-03-10T08:58:18Z)
- Cluster-and-Conquer: A Framework For Time-Series Forecasting [94.63501563413725]
We propose a three-stage framework for forecasting high-dimensional time-series data.
Our framework is highly general, allowing for any time-series forecasting and clustering method to be used in each step.
When instantiated with simple linear autoregressive models, we are able to achieve state-of-the-art results on several benchmark datasets.
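A minimal instantiation of the three-stage recipe, assuming k-means clustering and simple AR(1) forecasters (both of which the framework lets you swap out), might look like the sketch below; it is illustrative, not the paper's code.

```python
# A minimal sketch of the cluster-then-forecast idea: (1) cluster the series,
# (2) forecast each cluster's mean series, (3) forecast each series' deviation
# from its cluster mean, then add the two forecasts.
import numpy as np
from sklearn.cluster import KMeans

def ar1_forecast(y: np.ndarray) -> float:
    """One-step-ahead forecast from a least-squares AR(1) fit."""
    x, t = y[:-1], y[1:]
    phi = (x @ t) / (x @ x) if x @ x > 0 else 0.0
    return phi * y[-1]

def cluster_and_conquer(series: np.ndarray, k: int = 3) -> np.ndarray:
    """series: (N, T) panel of time series; returns (N,) one-step forecasts."""
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(series)
    forecasts = np.zeros(len(series))
    for c in range(k):
        members = series[labels == c]
        center = members.mean(axis=0)             # stage 1: cluster-level series
        center_fc = ar1_forecast(center)          # stage 2: cluster forecast
        for i in np.flatnonzero(labels == c):
            residual = series[i] - center         # stage 3: per-series deviation
            forecasts[i] = center_fc + ar1_forecast(residual)
    return forecasts
```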
arXiv Detail & Related papers (2021-10-26T20:41:19Z)