Forecast Evaluation for Data Scientists: Common Pitfalls and Best Practices
- URL: http://arxiv.org/abs/2203.10716v1
- Date: Mon, 21 Mar 2022 03:24:46 GMT
- Title: Forecast Evaluation for Data Scientists: Common Pitfalls and Best Practices
- Authors: Hansika Hewamalage, Klaus Ackermann, Christoph Bergmeir
- Abstract summary: We provide a tutorial-like compilation of the details of one of the most important steps in the overall forecasting process, namely the evaluation.
We elaborate on the different problematic characteristics of time series such as non-normalities and non-stationarities.
Best practices in forecast evaluation are outlined with respect to the different steps such as data partitioning, error calculation, statistical testing, and others.
- Score: 4.2951168699706646
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine Learning (ML) and Deep Learning (DL) methods are increasingly
replacing traditional methods in many domains involved with important decision
making activities. DL techniques tailor-made for specific tasks such as image
recognition, signal processing, or speech analysis are being introduced at a
fast pace with many improvements. However, for the domain of forecasting, the
current state in the ML community is perhaps where other domains such as
Natural Language Processing and Computer Vision were at several years ago. The
field of forecasting has mainly been fostered by statisticians/econometricians;
consequently the related concepts are not the mainstream knowledge among
general ML practitioners. The different non-stationarities associated with time
series challenge the data-driven ML models. Nevertheless, recent trends in the
domain have shown that with the availability of massive amounts of time series,
ML techniques are quite competent in forecasting, when related pitfalls are
properly handled. Therefore, in this work we provide a tutorial-like
compilation of the details of one of the most important steps in the overall
forecasting process, namely the evaluation. This way, we intend to impart the
knowledge of forecast evaluation in a form that fits the context of ML, as a
means of bridging the knowledge gap between traditional forecasting methods and
state-of-the-art ML techniques. We elaborate on the different problematic
characteristics of time series such as non-normalities and non-stationarities
and how they are associated with common pitfalls in forecast evaluation. Best
practices in forecast evaluation are outlined with respect to the different
steps such as data partitioning, error calculation, statistical testing, and
others. Further guidelines are also provided on selecting valid and suitable
error measures depending on the specific characteristics of the dataset at
hand.
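Two of the best-practice steps the abstract names, temporal data partitioning and scale-free error calculation, can be sketched as follows. This is a minimal illustration, not code from the paper: the function names (`rolling_origin_splits`, `mase`), the toy series, and the naive last-value forecast (standing in for any ML model) are all assumptions.

```python
import numpy as np

def rolling_origin_splits(n_obs, min_train, horizon, step=1):
    """Yield (train, test) index arrays for rolling-origin evaluation.

    Unlike random k-fold cross-validation, every test window lies
    strictly after its training window, so no future information
    leaks into model fitting.
    """
    origin = min_train
    while origin + horizon <= n_obs:
        yield np.arange(origin), np.arange(origin, origin + horizon)
        origin += step

def mase(y_train, y_test, y_pred, m=1):
    """Mean Absolute Scaled Error: forecast MAE scaled by the in-sample
    MAE of an m-step naive forecast. Unit-free, and defined even when
    the series contains zeros (unlike percentage errors such as MAPE)."""
    naive_mae = np.mean(np.abs(y_train[m:] - y_train[:-m]))
    return np.mean(np.abs(y_test - y_pred)) / naive_mae

y = np.array([10., 12., 11., 13., 12., 14., 13., 15., 14., 16.])
scores = []
for train_idx, test_idx in rolling_origin_splits(len(y), min_train=6,
                                                 horizon=2, step=2):
    y_train, y_test = y[train_idx], y[test_idx]
    y_pred = np.full(len(test_idx), y_train[-1])  # naive: repeat last value
    scores.append(mase(y_train, y_test, y_pred))
```

Averaging the per-origin scores then gives a single evaluation figure; a MASE below 1 means the model beats the in-sample naive benchmark on that scale.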
Related papers
- Context is Key: A Benchmark for Forecasting with Essential Textual Information [87.3175915185287]
"Context is Key" (CiK) is a time series forecasting benchmark that pairs numerical data with diverse types of carefully crafted textual context.
We evaluate a range of approaches, including statistical models, time series foundation models, and LLM-based forecasters.
Our experiments highlight the importance of incorporating contextual information, demonstrate surprising performance when using LLM-based forecasting models, and also reveal some of their critical shortcomings.
arXiv Detail & Related papers (2024-10-24T17:56:08Z)
- XForecast: Evaluating Natural Language Explanations for Time Series Forecasting [72.57427992446698]
Time series forecasting aids decision-making, especially for stakeholders who rely on accurate predictions.
Traditional explainable AI (XAI) methods, which underline feature or temporal importance, often require expert knowledge.
Evaluating forecast natural language explanations (NLEs) is difficult due to the complex causal relationships in time series data.
arXiv Detail & Related papers (2024-10-18T05:16:39Z)
- Worst-Case Convergence Time of ML Algorithms via Extreme Value Theory [8.540426791244533]
This paper leverages the statistics of extreme values to predict the worst-case convergence times of machine learning algorithms.
Timing is a critical non-functional property of ML systems, and providing the worst-case convergence times is essential to guarantee the availability of ML and its services.
arXiv Detail & Related papers (2024-04-10T17:05:12Z)
- Machine Learning Algorithms for Time Series Analysis and Forecasting [0.0]
Time series data is being used everywhere, from sales records to patients' health evolution metrics.
Various statistical and deep learning models have been considered, notably, ARIMA, Prophet and LSTMs.
Our work can be used by anyone to develop a good understanding of the forecasting process, and to identify various state-of-the-art models in use today.
arXiv Detail & Related papers (2022-11-25T22:12:03Z)
- On Generalizing Beyond Domains in Cross-Domain Continual Learning [91.56748415975683]
Deep neural networks often suffer from catastrophic forgetting of previously learned knowledge after learning a new task.
Our proposed approach learns new tasks under domain shift with accuracy boosts up to 10% on challenging datasets such as DomainNet and OfficeHome.
arXiv Detail & Related papers (2022-03-08T09:57:48Z)
- TACTiS: Transformer-Attentional Copulas for Time Series [76.71406465526454]
Estimation of time-varying quantities is a fundamental component of decision making in fields such as healthcare and finance.
We propose a versatile method that estimates joint distributions using an attention-based decoder.
We show that our model produces state-of-the-art predictions on several real-world datasets.
arXiv Detail & Related papers (2022-02-07T21:37:29Z)
- Uncertainty Prediction for Machine Learning Models of Material Properties [0.0]
Uncertainty in AI-based predictions of material properties is of immense importance for the success and reliability of AI applications in material science.
We compare 3 different approaches to obtain such individual uncertainty, testing them on 12 ML-physical properties.
arXiv Detail & Related papers (2021-07-16T16:33:55Z)
- Quantifying Uncertainty in Deep Spatiotemporal Forecasting [67.77102283276409]
We describe two types of forecasting problems: regular grid-based and graph-based.
We analyze UQ methods from both the Bayesian and the frequentist points of view, casting them in a unified framework via statistical decision theory.
Through extensive experiments on real-world road network traffic, epidemics, and air quality forecasting tasks, we reveal the statistical and computational trade-offs for different UQ methods.
arXiv Detail & Related papers (2021-05-25T14:35:46Z)
- Spatiotemporal Attention for Multivariate Time Series Prediction and Interpretation [17.568599402858037]
A spatiotemporal attention mechanism (STAM) is proposed for simultaneous learning of the most important time steps and variables.
Results: STAM maintains state-of-the-art prediction accuracy while offering the benefit of accurate interpretability.
arXiv Detail & Related papers (2020-08-11T17:34:55Z)
- Transformer Hawkes Process [79.16290557505211]
We propose a Transformer Hawkes Process (THP) model, which leverages the self-attention mechanism to capture long-term dependencies.
THP outperforms existing models in terms of both likelihood and event prediction accuracy by a notable margin.
We provide a concrete example, where THP achieves improved prediction performance for learning multiple point processes when incorporating their relational information.
arXiv Detail & Related papers (2020-02-21T13:48:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.