Hierarchical Evaluation Function: A Multi-Metric Approach for Optimizing Demand Forecasting Models
- URL: http://arxiv.org/abs/2508.13057v5
- Date: Wed, 15 Oct 2025 14:39:52 GMT
- Title: Hierarchical Evaluation Function: A Multi-Metric Approach for Optimizing Demand Forecasting Models
- Authors: Adolfo González, Víctor Parada
- Abstract summary: The Hierarchical Evaluation Function (HEF) is proposed as a multi-metric framework for hyperparameter optimization. HEF integrates explanatory power (R2), sensitivity to extreme errors (RMSE), and average accuracy (MAE). The performance of HEF was assessed using four widely recognized benchmark datasets in the forecasting domain.
- Score: 0.479839492673697
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Demand forecasting in competitive and uncertain business environments requires models that can integrate multiple evaluation perspectives, rather than being restricted to hyperparameter optimization through a single metric. This traditional approach tends to prioritize one error indicator, which can bias results when metrics provide contradictory signals. In this context, the Hierarchical Evaluation Function (HEF) is proposed as a multi-metric framework for hyperparameter optimization that integrates explanatory power (R2), sensitivity to extreme errors (RMSE), and average accuracy (MAE). The performance of HEF was assessed using four widely recognized benchmark datasets in the forecasting domain: the Walmart, M3, M4, and M5 datasets. Prediction models were optimized through Grid Search, Particle Swarm Optimization (PSO), and Optuna, and statistical analyses based on difference-of-proportions tests confirmed that HEF delivers superior results compared to a unimetric reference function, regardless of the optimizer employed, with particular relevance for heterogeneous monthly time series (M3) and highly granular daily demand scenarios (M5). The findings demonstrate that HEF improves stability, generalization, and robustness at a low computational cost, consolidating its role as a reliable evaluation framework that enhances model selection, enables more accurate demand forecasts, and supports decision-making in dynamic and competitive business environments.
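As a concrete illustration, the hierarchical combination of R2, RMSE, and MAE described in the abstract can be sketched as a single scoring function. Note that the function name `hef_score`, the weights, and the normalization below are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def hef_score(y_true, y_pred, weights=(0.5, 0.3, 0.2)):
    """Hypothetical hierarchical score combining R2, RMSE, and MAE.

    The weighting and normalization are illustrative assumptions;
    the paper's exact HEF formulation may differ.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)

    # Explanatory power (higher is better)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot

    # Sensitivity to extreme errors and average accuracy (lower is better)
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    mae = np.mean(np.abs(y_true - y_pred))

    # Normalize error terms by the mean magnitude of the series so all
    # components lie on comparable scales, then combine: reward R2,
    # penalize errors.
    scale = np.mean(np.abs(y_true)) + 1e-12
    w_r2, w_rmse, w_mae = weights
    return w_r2 * r2 - w_rmse * (rmse / scale) - w_mae * (mae / scale)
```

A score of this shape can be handed directly to Grid Search, PSO, or an Optuna study as the objective to maximize during hyperparameter search, which is how a multi-metric criterion replaces a single-metric one without changing the optimizer itself.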
Related papers
- Echo State Networks for Time Series Forecasting: Hyperparameter Sweep and Benchmarking [51.56484100374058]
We evaluate whether a fully automatic, purely feedback-driven ESN can serve as a competitive alternative to widely used statistical forecasting methods. Forecast accuracy is measured using MASE and sMAPE and benchmarked against simple baselines, such as drift and seasonal naive, as well as statistical models.
arXiv Detail & Related papers (2026-02-03T16:01:22Z) - AWPO: Enhancing Tool-Use of Large Language Models through Explicit Integration of Reasoning Rewards [60.2998874976509]
We propose advantage-weighted policy optimization (AWPO) to integrate explicit reasoning rewards and enhance tool-use capability. AWPO incorporates variance-aware gating and difficulty-aware weighting to adaptively modulate advantages from reasoning signals. Experiments demonstrate that AWPO achieves state-of-the-art performance across standard tool-use benchmarks.
arXiv Detail & Related papers (2025-12-22T08:07:00Z) - Km-scale dynamical downscaling through conformalized latent diffusion models [45.94979929172337]
Dynamical downscaling is crucial for deriving high-resolution meteorological fields from coarse-scale simulations. Generative Diffusion models (DMs) have recently emerged as powerful data-driven tools for this task. However, DMs lack finite-sample guarantees against overconfident predictions, resulting in miscalibrated grid-point-level uncertainty estimates. We tackle this issue by augmenting the downscaling pipeline with a conformal prediction framework.
arXiv Detail & Related papers (2025-10-15T08:41:36Z) - CALM Before the STORM: Unlocking Native Reasoning for Optimization Modeling [60.55856973678002]
Large Reasoning Models (LRMs) have demonstrated strong capabilities in complex multi-step reasoning. Existing domain adaptation methods, originally designed for earlier instruction-tuned models, often fail to exploit the advanced reasoning patterns of modern LRMs. We propose CALM, a framework that progressively refines LRMs within their native reasoning modes for optimization modeling tasks.
arXiv Detail & Related papers (2025-10-05T13:38:31Z) - Locally Adaptive Conformal Inference for Operator Models [5.78532405664684]
We introduce Local Sliced Conformal Inference (LSCI), a distribution-free framework for generating function-valued, locally adaptive prediction sets for operator models. We prove finite-sample validity and derive a data-dependent upper bound on the coverage gap under local exchangeability. We empirically demonstrate robustness against biased predictions and certain out-of-distribution noise regimes.
arXiv Detail & Related papers (2025-07-28T16:37:56Z) - Divergence Minimization Preference Optimization for Diffusion Model Alignment [58.651951388346525]
Divergence Minimization Preference Optimization (DMPO) is a principled method for aligning diffusion models by minimizing reverse KL divergence. Our results show that diffusion models fine-tuned with DMPO can consistently outperform or match existing techniques. DMPO unlocks a robust and elegant pathway for preference alignment, bridging principled theory with practical performance in diffusion models.
arXiv Detail & Related papers (2025-07-10T07:57:30Z) - Can Time-Series Foundation Models Perform Building Energy Management Tasks? [5.450531952940644]
Building energy management tasks require processing and learning from a variety of time-series data. Existing solutions rely on bespoke task- and data-specific models to perform these tasks. Inspired by the transformative success of Large Language Models (LLMs), Time-Series Foundation Models (TSFMs) have the potential to change this.
arXiv Detail & Related papers (2025-06-12T19:45:10Z) - Feasible Learning [78.6167929413604]
We introduce Feasible Learning (FL), a sample-centric learning paradigm where models are trained by solving a feasibility problem that bounds the loss for each training sample. Our empirical analysis, spanning image classification, age regression, and preference optimization in large language models, demonstrates that models trained via FL can learn from data while displaying improved tail behavior compared to ERM, with only a marginal impact on average performance.
arXiv Detail & Related papers (2025-01-24T20:39:38Z) - QGAPHEnsemble : Combining Hybrid QLSTM Network Ensemble via Adaptive Weighting for Short Term Weather Forecasting [0.0]
This research highlights the practical efficacy of employing advanced machine learning techniques. Our model demonstrates a substantial improvement in the accuracy and reliability of meteorological predictions. The paper highlights the importance of optimized ensemble techniques for improving performance on the given weather forecasting task.
arXiv Detail & Related papers (2025-01-18T20:18:48Z) - Local vs. Global Models for Hierarchical Forecasting [0.0]
This study explores the influence of distinct information utilisation on the accuracy of hierarchical forecasts.
We develop Global Forecasting Models (GFMs) to exploit cross-series and cross-hierarchies information.
Two specific GFMs based on LightGBM are introduced, demonstrating superior accuracy and lower model complexity.
arXiv Detail & Related papers (2024-11-10T08:51:49Z) - MITA: Bridging the Gap between Model and Data for Test-time Adaptation [68.62509948690698]
Test-Time Adaptation (TTA) has emerged as a promising paradigm for enhancing the generalizability of models.
We propose Meet-In-The-Middle based MITA, which introduces energy-based optimization to encourage mutual adaptation of the model and data from opposing directions.
arXiv Detail & Related papers (2024-10-12T07:02:33Z) - On ADMM in Heterogeneous Federated Learning: Personalization, Robustness, and Fairness [16.595935469099306]
We propose FLAME, an optimization framework by utilizing the alternating direction method of multipliers (ADMM) to train personalized and global models.
Our theoretical analysis establishes the global convergence and two kinds of convergence rates for FLAME under mild assumptions.
Our experimental findings show that FLAME outperforms state-of-the-art methods in convergence and accuracy, and it achieves higher test accuracy under various attacks.
arXiv Detail & Related papers (2024-07-23T11:35:42Z) - Variational Inference of Parameters in Opinion Dynamics Models [9.51311391391997]
This work uses variational inference to estimate the parameters of an opinion dynamics ABM.
We transform the inference process into an optimization problem suitable for automatic differentiation.
Our approach estimates both macroscopic parameters (bounded confidence intervals and backfire thresholds) and microscopic parameters ($200$ categorical, agent-level roles) more accurately than simulation-based and MCMC methods.
arXiv Detail & Related papers (2024-03-08T14:45:18Z) - ExtremeCast: Boosting Extreme Value Prediction for Global Weather Forecast [57.6987191099507]
We introduce Exloss, a novel loss function that performs asymmetric optimization and highlights extreme values to obtain accurate extreme weather forecast.
We also introduce ExBooster, which captures the uncertainty in prediction outcomes by employing multiple random samples.
Our solution can achieve state-of-the-art performance in extreme weather prediction, while maintaining the overall forecast accuracy comparable to the top medium-range forecast models.
arXiv Detail & Related papers (2024-02-02T10:34:13Z) - Comparative Evaluation of Metaheuristic Algorithms for Hyperparameter Selection in Short-Term Weather Forecasting [0.0]
This paper explores the application of metaheuristic algorithms, namely Genetic Algorithm (GA), Differential Evolution (DE), and Particle Swarm Optimization (PSO), to hyperparameter selection for short-term weather forecasting.
We evaluate their performance in weather forecasting based on metrics such as Mean Squared Error (MSE) and Mean Absolute Percentage Error (MAPE).
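For reference, the two metrics named above are standard and straightforward to compute; a minimal sketch (the helper names are our own):

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean Squared Error: average of squared residuals."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.mean((y_true - y_pred) ** 2)

def mape(y_true, y_pred):
    """Mean Absolute Percentage Error, in percent.

    Undefined when y_true contains zeros; assumes nonzero actuals.
    """
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))
```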
arXiv Detail & Related papers (2023-09-05T22:13:35Z) - Optimization of Annealed Importance Sampling Hyperparameters [77.34726150561087]
Annealed Importance Sampling (AIS) is a popular algorithm used to estimate the intractable marginal likelihood of deep generative models.
We present a parametric AIS process with flexible intermediary distributions and optimize the bridging distributions to use fewer steps for sampling.
We assess the performance of our optimized AIS for marginal likelihood estimation of deep generative models and compare it to other estimators.
arXiv Detail & Related papers (2022-09-27T07:58:25Z) - Rectified Max-Value Entropy Search for Bayesian Optimization [54.26984662139516]
We develop a rectified MES acquisition function based on the notion of mutual information.
As a result, RMES shows a consistent improvement over MES in several synthetic function benchmarks and real-world optimization problems.
arXiv Detail & Related papers (2022-02-28T08:11:02Z) - Providing reliability in Recommender Systems through Bernoulli Matrix
Factorization [63.732639864601914]
This paper proposes Bernoulli Matrix Factorization (BeMF) to provide both prediction values and reliability values.
BeMF acts on model-based collaborative filtering rather than on memory-based filtering.
The more reliable a prediction is, the less liable it is to be wrong.
arXiv Detail & Related papers (2020-06-05T14:24:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the accuracy of the information presented and is not responsible for any consequences arising from its use.