Interpretable Reinforcement Learning via Neural Additive Models for
Inventory Management
- URL: http://arxiv.org/abs/2303.10382v2
- Date: Wed, 22 Mar 2023 14:19:24 GMT
- Title: Interpretable Reinforcement Learning via Neural Additive Models for
Inventory Management
- Authors: Julien Siems, Maximilian Schambach, Sebastian Schulze, Johannes S.
Otterbach
- Abstract summary: We focus on developing dynamic inventory ordering policies for a multi-echelon, i.e. multi-stage, supply chain.
Traditional inventory optimization methods aim to determine a static reordering policy.
We propose an interpretable reinforcement learning approach that aims to be as interpretable as the traditional static policies.
- Score: 3.714118205123092
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The COVID-19 pandemic has highlighted the importance of supply chains and the
role of digital management to react to dynamic changes in the environment. In
this work, we focus on developing dynamic inventory ordering policies for a
multi-echelon, i.e. multi-stage, supply chain. Traditional inventory
optimization methods aim to determine a static reordering policy. Thus, these
policies are not able to adjust to dynamic changes such as those observed
during the COVID-19 crisis. On the other hand, conventional strategies offer
the advantage of being interpretable, which is a crucial feature for supply
chain managers in order to communicate decisions to their stakeholders. To
address this limitation, we propose an interpretable reinforcement learning
approach that aims to be as interpretable as the traditional static policies
while being as flexible and environment-agnostic as other deep learning-based
reinforcement learning solutions. We propose to use Neural Additive Models as
an interpretable dynamic policy of a reinforcement learning agent, showing that
this approach is competitive with a standard fully connected policy. Finally, we
use the interpretability property to gain insights into a complex ordering
strategy for a simple, linear three-echelon inventory supply chain.
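The Neural Additive Model policy decomposes the ordering decision into one small subnetwork per observation feature (e.g. per-echelon inventory level, backlog, or in-transit quantity) and sums their contributions per action dimension, so each feature's effect on the order quantity can be inspected in isolation. Below is a minimal PyTorch sketch of such an additive policy head; the feature set, layer sizes, and the FeatureNet/NAMPolicy names are illustrative assumptions, not the authors' exact implementation.

```python
# Minimal Neural Additive Model (NAM) policy sketch.
# Assumption: feature set, layer sizes, and class names are illustrative only.
import torch
import torch.nn as nn


class FeatureNet(nn.Module):
    """One small MLP per observation feature; maps a scalar input to per-action contributions."""

    def __init__(self, action_dim: int, hidden: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, x):  # x: (batch, 1)
        return self.net(x)


class NAMPolicy(nn.Module):
    """Additive policy: action logits are the sum of per-feature contributions plus a bias."""

    def __init__(self, obs_dim: int, action_dim: int):
        super().__init__()
        self.feature_nets = nn.ModuleList([FeatureNet(action_dim) for _ in range(obs_dim)])
        self.bias = nn.Parameter(torch.zeros(action_dim))

    def forward(self, obs):  # obs: (batch, obs_dim)
        # Contributions are kept separate per feature, which is what makes the policy inspectable.
        contributions = torch.stack(
            [net(obs[:, i:i + 1]) for i, net in enumerate(self.feature_nets)], dim=1
        )  # (batch, obs_dim, action_dim)
        logits = contributions.sum(dim=1) + self.bias
        return logits, contributions


# Example: three echelons with inventory, backlog, and in-transit quantity each (9 features),
# and discretized order quantities 0..9 for one echelon (10 actions).
policy = NAMPolicy(obs_dim=9, action_dim=10)
obs = torch.randn(4, 9)
logits, per_feature = policy(obs)
action = torch.distributions.Categorical(logits=logits).sample()
```

Plotting each feature subnetwork's output over its input range yields the shape functions from which the ordering strategy can be read off; this is the interpretability property the paper exploits when analyzing the three-echelon chain.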
Related papers
- SAMBO-RL: Shifts-aware Model-based Offline Reinforcement Learning [9.88109749688605]
Model-based Offline Reinforcement Learning trains policies based on offline datasets and model dynamics.
This paper disentangles the problem into two key components: model bias and policy shift.
We introduce Shifts-aware Model-based Offline Reinforcement Learning (SAMBO-RL).
arXiv Detail & Related papers (2024-08-23T04:25:09Z)
- Agent based modelling for continuously varying supply chains [4.163948606359882]
This paper seeks to address whether agents can control varying supply chain problems.
Two state-of-the-art Reinforcement Learning (RL) algorithms are compared.
Results show that the leaner strategies adopted in batch environments differ from those adopted in environments with varying products.
arXiv Detail & Related papers (2023-12-24T15:04:46Z)
- Conformal Policy Learning for Sensorimotor Control Under Distribution Shifts [61.929388479847525]
This paper focuses on the problem of detecting and reacting to changes in the distribution of a sensorimotor controller's observables.
The key idea is the design of switching policies that can take conformal quantiles as input.
We show how to design such policies by using conformal quantiles to switch between base policies with different characteristics.
arXiv Detail & Related papers (2023-11-02T17:59:30Z)
- Contextual Bandits for Evaluating and Improving Inventory Control Policies [2.2530496464901106]
We introduce the concept of an equilibrium policy, a desirable property of a policy that intuitively means that, in hindsight, changing only a small fraction of actions does not result in materially more reward.
We provide a lightweight contextual bandit-based algorithm to evaluate and occasionally tweak policies, and show that this method achieves favorable guarantees, both theoretically and in empirical studies.
arXiv Detail & Related papers (2023-10-24T18:00:40Z)
- Learning Control Policies for Variable Objectives from Offline Data [2.7174376960271154]
We introduce a conceptual extension for model-based policy search methods, called variable objective policy (VOP).
We demonstrate that by altering the objectives passed as input to the policy, users gain the freedom to adjust its behavior or re-balance optimization targets at runtime.
arXiv Detail & Related papers (2023-08-11T13:33:59Z) - Latent-Variable Advantage-Weighted Policy Optimization for Offline RL [70.01851346635637]
offline reinforcement learning methods hold the promise of learning policies from pre-collected datasets without the need to query the environment for new transitions.
In practice, offline datasets are often heterogeneous, i.e., collected in a variety of scenarios.
We propose to leverage latent-variable policies that can represent a broader class of policy distributions.
Our method improves the average performance of the next best-performing offline reinforcement learning methods by 49% on heterogeneous datasets.
arXiv Detail & Related papers (2022-03-16T21:17:03Z) - A Regularized Implicit Policy for Offline Reinforcement Learning [54.7427227775581]
offline reinforcement learning enables learning from a fixed dataset, without further interactions with the environment.
We propose a framework that supports learning a flexible yet well-regularized fully-implicit policy.
Experiments and an ablation study on the D4RL dataset validate our framework and the effectiveness of our algorithmic designs.
arXiv Detail & Related papers (2022-02-19T20:22:04Z) - Privacy-Constrained Policies via Mutual Information Regularized Policy Gradients [54.98496284653234]
We consider the task of training a policy that maximizes reward while minimizing disclosure of certain sensitive state variables through the actions.
We solve this problem by introducing a regularizer based on the mutual information between the sensitive state and the actions.
We develop a model-based estimator for optimization of privacy-constrained policies.
arXiv Detail & Related papers (2020-12-30T03:22:35Z) - Efficient Empowerment Estimation for Unsupervised Stabilization [75.32013242448151]
empowerment principle enables unsupervised stabilization of dynamical systems at upright positions.
We propose an alternative solution based on a trainable representation of a dynamical system as a Gaussian channel.
We show that our method has a lower sample complexity, is more stable in training, possesses the essential properties of the empowerment function, and allows estimation of empowerment from images.
arXiv Detail & Related papers (2020-07-14T21:10:16Z) - Strictly Batch Imitation Learning by Energy-based Distribution Matching [104.33286163090179]
Consider learning a policy purely on the basis of demonstrated behavior -- that is, with no access to reinforcement signals, no knowledge of transition dynamics, and no further interaction with the environment.
One solution is simply to retrofit existing algorithms for apprenticeship learning to work in the offline setting.
But such an approach leans heavily on off-policy evaluation or offline model estimation, and can be indirect and inefficient.
We argue that a good solution should be able to explicitly parameterize a policy, implicitly learn from rollout dynamics, and operate in an entirely offline fashion.
arXiv Detail & Related papers (2020-06-25T03:27:59Z) - Contextual Policy Transfer in Reinforcement Learning Domains via Deep
Mixtures-of-Experts [24.489002406693128]
We introduce a novel mixture-of-experts formulation for learning state-dependent beliefs over source task dynamics.
We show how this model can be incorporated into standard policy reuse frameworks.
arXiv Detail & Related papers (2020-02-29T07:58:36Z)
This list is automatically generated from the titles and abstracts of the papers indexed on this site.
The site does not guarantee the accuracy of this information and is not responsible for any consequences of its use.