A Simulation Environment and Reinforcement Learning Method for Waste
Reduction
- URL: http://arxiv.org/abs/2205.15455v2
- Date: Fri, 26 May 2023 12:10:13 GMT
- Title: A Simulation Environment and Reinforcement Learning Method for Waste
Reduction
- Authors: Sami Jullien, Mozhdeh Ariannezhad, Paul Groth, Maarten de Rijke
- Abstract summary: We study the problem of restocking a grocery store's inventory with perishable items over time, from a distributional point of view.
The objective is to maximize sales while minimizing waste, with uncertainty about the actual consumption by customers.
We frame inventory restocking as a new reinforcement learning task that exhibits stochastic behavior conditioned on the agent's actions.
- Score: 50.545552995521774
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In retail (e.g., grocery stores, apparel shops, online retailers), inventory
managers have to balance short-term risk (no items to sell) with long-term risk
(over-ordering leading to product waste). This balancing task is made
especially hard due to the lack of information about future customer purchases.
In this paper, we study the problem of restocking a grocery store's inventory
with perishable items over time, from a distributional point of view. The
objective is to maximize sales while minimizing waste, with uncertainty about
the actual consumption by customers. This problem is of high relevance today,
given the growing demand for food and the impact of food waste on the
environment, the economy, and purchasing power. We frame inventory restocking
as a new reinforcement learning task that exhibits stochastic behavior
conditioned on the agent's actions, making the environment partially
observable. We make two main contributions. First, we introduce a new
reinforcement learning environment, RetaiL, based on real grocery store data
and expert knowledge. This environment is highly stochastic, and presents a
unique challenge for reinforcement learning practitioners. We show that
uncertainty about the future behavior of the environment is not handled well by
classical supply chain algorithms, and that distributional approaches are a
good way to account for the uncertainty. Second, we introduce GTDQN, a
distributional reinforcement learning algorithm that learns a generalized Tukey
Lambda distribution over the reward space. GTDQN provides a strong baseline for
our environment. It outperforms other distributional reinforcement learning
approaches in this partially observable setting, in both overall reward and
reduction of generated waste.
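
The trade-off described in the abstract (too little stock loses sales, too much stock expires) can be illustrated with a minimal sketch. This is not the RetaiL environment or GTDQN: the function `simulate`, the uniform demand model, the fixed daily order quantity, and all parameter values are assumptions made only for illustration.

```python
import random
from collections import namedtuple

Result = namedtuple("Result", "sold wasted lost leftover")


def simulate(order_quantity, shelf_life=3, days=30, mean_demand=10, seed=0):
    """Toy perishable-inventory loop with a fixed daily order (hypothetical model)."""
    rng = random.Random(seed)
    # stock[i] holds the units that have i + 1 days of shelf life remaining.
    stock = [0] * shelf_life
    sold = wasted = lost = 0
    for _ in range(days):
        # Restock: newly delivered items have the full shelf life.
        stock[-1] += order_quantity
        # Assumed demand model: uniform integer demand, unknown when ordering.
        demand = rng.randint(0, 2 * mean_demand)
        # Serve customers from the oldest items first.
        for i in range(shelf_life):
            take = min(stock[i], demand)
            stock[i] -= take
            demand -= take
            sold += take
        # Demand that could not be served is a lost sale.
        lost += demand
        # End of day: items on their last day expire, the rest age by one day.
        wasted += stock[0]
        stock = stock[1:] + [0]
    return Result(sold, wasted, lost, sum(stock))


if __name__ == "__main__":
    for quantity in (5, 10, 15):
        print(quantity, simulate(quantity))
```

Sweeping `order_quantity` shows the two risks pulling in opposite directions: small orders produce lost sales, large orders produce waste. A learned policy would choose the order from the observed stock instead of using a fixed quantity.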
Related papers
- An Optimistic-Robust Approach for Dynamic Positioning of Omnichannel
Inventories [10.353243563465124]
We introduce a new class of data-driven optimistic-robust bimodal inventory optimization (BIO) strategies.
Our experiments show that significant benefits can be achieved by rethinking traditional approaches to inventory management.
arXiv Detail & Related papers (2023-10-17T23:10:57Z)
- Value-Distributional Model-Based Reinforcement Learning [59.758009422067]
Quantifying uncertainty about a policy's long-term performance is important to solve sequential decision-making tasks.
We study the problem from a model-based Bayesian reinforcement learning perspective.
We propose Epistemic Quantile-Regression (EQR), a model-based algorithm that learns a value distribution function.
arXiv Detail & Related papers (2023-08-12T14:59:19Z)
- Online Learning with Costly Features in Non-stationary Environments [6.009759445555003]
In sequential decision-making problems, maximizing long-term rewards is the primary goal.
In real-world problems, collecting beneficial information is often costly.
We develop an algorithm that guarantees a sublinear regret in time.
arXiv Detail & Related papers (2023-07-18T16:13:35Z)
- Enhancing Supply Chain Resilience: A Machine Learning Approach for
Predicting Product Availability Dates Under Disruption [2.294014185517203]
The COVID-19 pandemic and ongoing political and regional conflicts have a highly detrimental impact on the global supply chain.
Accurately predicting availability dates plays a pivotal role in executing successful logistics operations.
We evaluate several regression models, including Simple Regression, Lasso Regression, Ridge Regression, Elastic Net, Random Forest (RF), Gradient Boosting Machine (GBM) and Neural Network models.
arXiv Detail & Related papers (2023-04-28T15:22:20Z)
- Improving Self-supervised Learning with Automated Unsupervised Outlier
Arbitration [83.29856873525674]
We introduce a lightweight latent variable model UOTA, targeting the view sampling issue for self-supervised learning.
Our method directly generalizes to many mainstream self-supervised learning approaches.
arXiv Detail & Related papers (2021-12-15T14:05:23Z)
- Bayesian Distributional Policy Gradients [2.28438857884398]
Distributional Reinforcement Learning maintains the entire probability distribution of the reward-to-go, i.e. the return.
Bayesian Distributional Policy Gradients (BDPG) uses adversarial training in joint-contrastive learning to estimate a variational posterior from the returns.
arXiv Detail & Related papers (2021-03-20T23:42:50Z)
- Coordinated Online Learning for Multi-Agent Systems with Coupled
Constraints and Perturbed Utility Observations [91.02019381927236]
We introduce a novel method to steer the agents toward a stable population state, fulfilling the given resource constraints.
The proposed method is a decentralized resource pricing method based on the resource loads resulting from the augmentation of the game's Lagrangian.
arXiv Detail & Related papers (2020-10-21T10:11:17Z)
- Ecological Reinforcement Learning [76.9893572776141]
We study the kinds of environment properties that can make learning under such conditions easier.
Understanding how properties of the environment impact the performance of reinforcement learning agents can help us to structure our tasks in ways that make learning tractable.
arXiv Detail & Related papers (2020-06-22T17:55:03Z)
- Reinforcement Learning for Multi-Product Multi-Node Inventory Management
in Supply Chains [17.260459603456745]
This paper describes the application of reinforcement learning (RL) to multi-product inventory management in supply chains.
Experiments show that the proposed approach is able to handle a multi-objective reward comprised of maximising product sales and minimising wastage of perishable products.
arXiv Detail & Related papers (2020-06-07T04:02:59Z)
- Maximizing Information Gain in Partially Observable Environments via
Prediction Reward [64.24528565312463]
This paper tackles the challenge of using belief-based rewards for a deep RL agent.
We derive the exact error between negative entropy and the expected prediction reward.
This insight provides theoretical motivation for several fields using prediction rewards.
arXiv Detail & Related papers (2020-05-11T08:13:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.