Scalable Probabilistic Forecasting in Retail with Gradient Boosted
Trees: A Practitioner's Approach
- URL: http://arxiv.org/abs/2311.00993v1
- Date: Thu, 2 Nov 2023 04:46:32 GMT
- Title: Scalable Probabilistic Forecasting in Retail with Gradient Boosted
Trees: A Practitioner's Approach
- Authors: Xueying Long, Quang Bui, Grady Oktavian, Daniel F. Schmidt, Christoph
Bergmeir, Rakshitha Godahewa, Seong Per Lee, Kaifeng Zhao, Paul Condylis
- Abstract summary: We propose a top-down approach: forecasting at an aggregated level with fewer, less intermittent series.
Direct training at the lower level with subsamples can also be an alternative way of scaling.
We show how the characteristics of e-commerce and brick-and-mortar retail datasets differ.
- Score: 4.672665650064167
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The recent M5 competition has advanced the state-of-the-art in retail
forecasting. However, we notice important differences between the competition
challenge and the challenges we face in a large e-commerce company. The
datasets in our scenario are larger (hundreds of thousands of time series), and
e-commerce can afford to have a larger assortment than brick-and-mortar
retailers, leading to more intermittent data. To scale to larger dataset sizes
with feasible computational effort, we first investigate a two-layer
hierarchy and propose a top-down approach: forecasting at an aggregated level
with fewer, less intermittent series, then disaggregating to obtain
decision-level forecasts. Probabilistic forecasts are generated under
distributional assumptions. Second, direct training at the lower level on
subsamples offers an alternative way of scaling; the performance of models
trained on subsets is evaluated against the full dataset. Apart from a proprietary
dataset, the proposed scalable methods are evaluated using the Favorita dataset
and the M5 dataset. We show how the e-commerce and brick-and-mortar retail
datasets differ in their characteristics. Notably, our top-down
forecasting framework enters the top 50 of the original M5 competition, even
with models trained at a higher level under a much simpler setting.
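The top-down scheme described in the abstract can be sketched in a few lines: forecast at the aggregated level, split the forecast back to items by their historical proportions, and attach quantiles under a distributional assumption. Everything here is illustrative: the item names are invented, a naive mean stands in for the gradient boosted tree model, and the normal distribution is an assumed example rather than the paper's actual distributional choice.

```python
from statistics import NormalDist

# Hypothetical demand history at the decision (item) level.
# Note the intermittency (zeros) in item_b.
history = {
    "item_a": [3, 0, 5, 2, 4],
    "item_b": [0, 1, 0, 0, 1],
}

# Step 1: aggregate to the top level, where the series is denser
# (less intermittent) and cheaper to model.
top = [sum(step) for step in zip(*history.values())]

# Step 2: forecast at the aggregated level. A naive historical mean
# stands in here for the gradient boosted tree model of the paper.
top_forecast = sum(top) / len(top)

# Step 3: disaggregate with historical proportions; the item-level
# forecasts sum back to the top-level forecast by construction.
totals = {item: sum(vals) for item, vals in history.items()}
grand_total = sum(totals.values())
item_forecasts = {item: top_forecast * t / grand_total
                  for item, t in totals.items()}

# Step 4: probabilistic forecasts under a distributional assumption
# (a normal with an assumed error scale, purely for illustration).
sigma = 1.0
q90 = {item: NormalDist(mu, sigma).inv_cdf(0.9)
       for item, mu in item_forecasts.items()}
```

A useful sanity check of this scheme is coherence: the disaggregated forecasts always sum to the aggregated forecast, which is exactly why proportion-based top-down splitting is popular at scale.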
Related papers
- F-FOMAML: GNN-Enhanced Meta-Learning for Peak Period Demand Forecasting with Proxy Data [65.6499834212641]
We formulate the demand prediction as a meta-learning problem and develop the Feature-based First-Order Model-Agnostic Meta-Learning (F-FOMAML) algorithm.
By considering domain similarities through task-specific metadata, our model improves generalization, with the excess risk decreasing as the number of training tasks increases.
Compared to existing state-of-the-art models, our method demonstrates a notable improvement in demand prediction accuracy, reducing the Mean Absolute Error by 26.24% on an internal vending machine dataset and by 1.04% on the publicly accessible JD.com dataset.
arXiv Detail & Related papers (2024-06-23T21:28:50Z) - Task-customized Masked AutoEncoder via Mixture of Cluster-conditional
Experts [104.9871176044644]
Masked Autoencoder (MAE) is a prevalent self-supervised learning method that achieves promising results in model pre-training.
We propose a novel MAE-based pre-training paradigm, Mixture of Cluster-conditional Experts (MoCE).
MoCE trains each expert only on semantically relevant images by using cluster-conditional gates.
arXiv Detail & Related papers (2024-02-08T03:46:32Z) - Inverse Scaling: When Bigger Isn't Better [80.42834197416444]
Large language models (LMs) show predictable improvements to overall loss with increased scale.
We present evidence for the claim that LMs may show inverse scaling, or worse task performance with increased scale.
arXiv Detail & Related papers (2023-06-15T20:11:23Z) - An Empirical Study on Distribution Shift Robustness From the Perspective
of Pre-Training and Data Augmentation [91.62129090006745]
This paper studies the distribution shift problem from the perspective of pre-training and data augmentation.
We provide the first comprehensive empirical study focusing on pre-training and data augmentation.
arXiv Detail & Related papers (2022-05-25T13:04:53Z) - Approaching sales forecasting using recurrent neural networks and
transformers [57.43518732385863]
We develop three alternatives to tackle the problem of forecasting customer sales at the day/store/item level using deep learning techniques.
Our empirical results show that good performance can be achieved with a simple sequence-to-sequence architecture and minimal data preprocessing effort.
The proposed solution achieves an RMSLE of around 0.54, which is competitive with other, more problem-specific solutions proposed in the Kaggle competition.
arXiv Detail & Related papers (2022-04-16T12:03:52Z) - A Comparative Study on Forecasting of Retail Sales [0.0]
We benchmark forecasting models on historical sales data from Walmart to predict their future sales.
We apply these models on the forecasting challenge dataset (M5 forecasting by Kaggle)
arXiv Detail & Related papers (2022-03-14T04:24:29Z) - M5 Competition Uncertainty: Overdispersion, distributional forecasting,
GAMLSS and beyond [0.0]
We show that the M5 competition data exhibits strong overdispersion and sporadic demand, especially zero demand.
We discuss resulting modeling issues concerning adequate probabilistic forecasting of such count data processes.
arXiv Detail & Related papers (2021-07-14T13:05:55Z) - Comparing Test Sets with Item Response Theory [53.755064720563]
We evaluate 29 datasets using predictions from 18 pretrained Transformer models on individual test examples.
We find that Quoref, HellaSwag, and MC-TACO are best suited for distinguishing among state-of-the-art models.
We also observe that the span-selection task format, used for QA datasets such as QAMR and SQuAD 2.0, is effective in differentiating between strong and weak models.
arXiv Detail & Related papers (2021-06-01T22:33:53Z) - Hierarchical robust aggregation of sales forecasts at aggregated levels
in e-commerce, based on exponential smoothing and Holt's linear trend method [0.0]
We consider ensemble forecasts, given by several instances of classical techniques tuned with different (sets of) parameters.
We apply this methodology to a hierarchical data set of sales provided by the e-commerce company Cdiscount.
The performance is better than what would be obtained by optimally tuning the classical techniques on a train set.
arXiv Detail & Related papers (2020-06-05T11:20:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.