Deep Inventory Management
- URL: http://arxiv.org/abs/2210.03137v1
- Date: Thu, 6 Oct 2022 18:00:25 GMT
- Title: Deep Inventory Management
- Authors: Dhruv Madeka, Kari Torkkola, Carson Eisenach, Dean Foster, Anna Luo
- Abstract summary: We present a Deep Reinforcement Learning approach to solving a periodic review inventory control system.
We show that several policy learning approaches are competitive with or outperform classical baseline approaches.
- Score: 3.578617477295742
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a Deep Reinforcement Learning approach to solving a periodic
review inventory control system with stochastic vendor lead times, lost sales,
correlated demand, and price matching. While this dynamic program has
historically been considered intractable, we show that several policy learning
approaches are competitive with or outperform classical baseline approaches. In
order to train these algorithms, we develop novel techniques to convert
historical data into a simulator. We also present a model-based reinforcement
learning procedure (Direct Backprop) to solve the dynamic periodic review
inventory control problem by constructing a differentiable simulator. Under a
variety of metrics Direct Backprop outperforms model-free RL and newsvendor
baselines, in both simulations and real-world deployments.
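As a rough illustration of the Direct Backprop idea, the sketch below trains an ordering policy by backpropagating holding and lost-sales costs through a toy differentiable simulator. Everything here is illustrative: the single-item, zero-lead-time dynamics, the cost coefficients, and the network are assumptions for the sketch, not the authors' simulator or policy class.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

class OrderPolicy(nn.Module):
    """Maps (current inventory, last demand) to a nonnegative order quantity."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1), nn.Softplus())

    def forward(self, state):
        return self.net(state).squeeze(-1)

def rollout_cost(policy, horizon=52, batch=256, holding=1.0, penalty=9.0):
    """Differentiable rollout of a toy lost-sales inventory simulator."""
    inv = torch.zeros(batch)                                   # on-hand inventory
    demand = torch.relu(10 + 3 * torch.randn(batch, horizon))  # exogenous demand trace
    total_cost = torch.zeros(())
    for t in range(horizon):
        last_demand = demand[:, t - 1] if t > 0 else torch.zeros(batch)
        order = policy(torch.stack([inv, last_demand], dim=-1))
        inv = inv + order                           # zero lead time in this sketch
        sales = torch.minimum(inv, demand[:, t])
        lost = demand[:, t] - sales                 # lost sales, no backorders
        inv = inv - sales
        total_cost = total_cost + holding * inv.mean() + penalty * lost.mean()
    return total_cost / horizon

policy = OrderPolicy()
opt = torch.optim.Adam(policy.parameters(), lr=1e-2)
for _ in range(200):
    opt.zero_grad()
    cost = rollout_cost(policy)   # gradients flow through the simulator dynamics
    cost.backward()
    opt.step()
```

The point of the sketch is the final loop: because the simulator itself is differentiable, the cost gradient reaches the policy parameters directly, with no model-free policy-gradient estimator in between.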
Related papers
- Optimal Execution with Reinforcement Learning [0.4972323953932129]
This study investigates the development of an optimal execution strategy through reinforcement learning.
We present a custom MDP formulation, report the results of our methodology, and benchmark its performance against standard execution strategies.
arXiv Detail & Related papers (2024-11-10T08:21:03Z)
- Autonomous Vehicle Controllers From End-to-End Differentiable Simulation [60.05963742334746]
We propose a differentiable simulator and design an analytic policy gradients (APG) approach to training AV controllers.
Our proposed framework brings the differentiable simulator into an end-to-end training loop, where gradients of environment dynamics serve as a useful prior to help the agent learn a more grounded policy.
We find significant improvements in performance and robustness to noise in the dynamics, as well as overall more intuitive human-like handling.
arXiv Detail & Related papers (2024-09-12T11:50:06Z)
- Q-value Regularized Transformer for Offline Reinforcement Learning [70.13643741130899]
We propose a Q-value regularized Transformer (QT) to enhance the state-of-the-art in offline reinforcement learning (RL).
QT learns an action-value function and integrates a term maximizing action-values into the training loss of Conditional Sequence Modeling (CSM).
Empirical evaluations on D4RL benchmark datasets demonstrate the superiority of QT over traditional DP and CSM methods.
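As described, the QT objective couples the Conditional Sequence Modeling loss with a term that pushes predicted actions toward high learned action-values. A minimal sketch of that combined loss follows; the MSE form of the CSM term, the `q_network` interface, and the weight `alpha` are illustrative assumptions, not the paper's exact formulation.

```python
import torch.nn.functional as F

def qt_loss(predicted_actions, dataset_actions, states, q_network, alpha=0.5):
    """Conditional Sequence Modeling loss regularized by an action-value term.

    predicted_actions: actions produced by the sequence model for each state
    dataset_actions:   actions stored in the offline dataset
    q_network:         learned action-value function, called as q_network(states, actions)
    alpha:             trade-off weight (illustrative value, not from the paper)
    """
    csm_term = F.mse_loss(predicted_actions, dataset_actions)   # stay close to the data
    q_term = -q_network(states, predicted_actions).mean()       # prefer high Q-value actions
    return csm_term + alpha * q_term
```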
arXiv Detail & Related papers (2024-05-27T12:12:39Z)
- Enhancing Polynomial Chaos Expansion Based Surrogate Modeling using a Novel Probabilistic Transfer Learning Strategy [2.980666177064344]
In black-box simulations, non-intrusive PCE allows the construction of surrogates using a set of simulation response evaluations.
We propose to leverage transfer learning whereby knowledge gained through similar PCE surrogate construction tasks is transferred to a new surrogate-construction task.
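For context, a non-intrusive PCE surrogate is fit by regressing black-box responses onto an orthogonal polynomial basis of the random inputs. The one-dimensional sketch below uses probabilists' Hermite polynomials for a standard-normal input; the toy model and degree are assumptions, and the paper's transfer-learning strategy is not reproduced here.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermevander

def fit_pce(xi, y, degree=4):
    """Fit a 1-D polynomial chaos expansion by least-squares regression.

    xi: samples of a standard-normal input variable
    y:  corresponding black-box simulation responses
    """
    basis = hermevander(xi, degree)                   # probabilists' Hermite basis at xi
    coeffs, *_ = np.linalg.lstsq(basis, y, rcond=None)
    return coeffs

def eval_pce(coeffs, xi):
    return hermevander(xi, len(coeffs) - 1) @ coeffs

# Toy black-box response: y = exp(0.3 * xi) plus a little noise.
rng = np.random.default_rng(0)
xi = rng.standard_normal(200)
y = np.exp(0.3 * xi) + 0.01 * rng.standard_normal(200)
print(eval_pce(fit_pce(xi, y), np.array([0.0, 1.0])))
```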
arXiv Detail & Related papers (2023-12-07T19:16:42Z)
- Statistically Efficient Variance Reduction with Double Policy Estimation for Off-Policy Evaluation in Sequence-Modeled Reinforcement Learning [53.97273491846883]
We propose DPE: an RL algorithm that blends offline sequence modeling and offline reinforcement learning with Double Policy Estimation.
We validate our method on multiple OpenAI Gym tasks from the D4RL benchmark.
arXiv Detail & Related papers (2023-08-28T20:46:07Z)
- Commodities Trading through Deep Policy Gradient Methods [0.0]
The paper formulates the commodities trading problem as a continuous, discrete-time dynamical system.
Two policy-gradient algorithms are introduced: an actor-based approach and an actor-critic-based approach.
Backtesting on front-month natural gas futures demonstrates that the DRL models increase the Sharpe ratio by 83% compared to a buy-and-hold baseline.
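As a rough sketch of the actor-based idea, the snippet below optimizes a position policy directly against a differentiable Sharpe-ratio objective on synthetic returns. The data, architecture, and objective details are illustrative assumptions; the paper's DRL formulation and its natural-gas futures backtest are not reproduced here.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic log-price series standing in for a futures contract (illustrative data only).
prices = torch.cumsum(0.01 * torch.randn(500), dim=0)
returns = prices[1:] - prices[:-1]

class PositionPolicy(nn.Module):
    """Maps a window of past returns to a position in [-1, 1]."""
    def __init__(self, window=10):
        super().__init__()
        self.window = window
        self.net = nn.Sequential(
            nn.Linear(window, 16), nn.Tanh(), nn.Linear(16, 1), nn.Tanh())

    def forward(self, ret_window):
        return self.net(ret_window).squeeze(-1)

policy = PositionPolicy()
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
w = policy.window
features = torch.stack([returns[t - w:t] for t in range(w, len(returns))])
next_returns = returns[w:]                      # aligned so positions never see the future
for _ in range(300):
    opt.zero_grad()
    strategy_returns = policy(features) * next_returns
    sharpe = strategy_returns.mean() / (strategy_returns.std() + 1e-8)
    (-sharpe).backward()                        # maximize the differentiable Sharpe ratio
    opt.step()
```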
arXiv Detail & Related papers (2023-08-10T17:21:12Z)
- Predictive Experience Replay for Continual Visual Control and Forecasting [62.06183102362871]
We present a new continual learning approach for visual dynamics modeling and explore its efficacy in visual control and forecasting.
We first propose the mixture world model that learns task-specific dynamics priors with a mixture of Gaussians, and then introduce a new training strategy to overcome catastrophic forgetting.
Our model remarkably outperforms the naive combinations of existing continual learning and visual RL algorithms on DeepMind Control and Meta-World benchmarks with continual visual control tasks.
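Below is a minimal sketch of a mixture-of-Gaussians dynamics head trained by negative log-likelihood on observed transitions; the dimensions, number of components, and architecture are illustrative assumptions rather than the paper's mixture world model.

```python
import torch
import torch.nn as nn

class MixtureDynamics(nn.Module):
    """Predicts a Gaussian-mixture distribution over the next state."""
    def __init__(self, state_dim=8, action_dim=2, n_components=4, hidden=64):
        super().__init__()
        self.k, self.d = n_components, state_dim
        self.trunk = nn.Sequential(nn.Linear(state_dim + action_dim, hidden), nn.ReLU())
        self.logits = nn.Linear(hidden, n_components)             # mixture weights
        self.means = nn.Linear(hidden, n_components * state_dim)  # component means
        self.log_std = nn.Linear(hidden, n_components * state_dim)

    def loss(self, state, action, next_state):
        h = self.trunk(torch.cat([state, action], dim=-1))
        mix = torch.distributions.Categorical(logits=self.logits(h))
        comp = torch.distributions.Independent(
            torch.distributions.Normal(
                self.means(h).view(-1, self.k, self.d),
                self.log_std(h).view(-1, self.k, self.d).exp()),
            1)
        gmm = torch.distributions.MixtureSameFamily(mix, comp)
        return -gmm.log_prob(next_state).mean()   # NLL of observed transitions
```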
arXiv Detail & Related papers (2023-03-12T05:08:03Z)
- Model Predictive Control via On-Policy Imitation Learning [28.96122879515294]
We develop new sample complexity results and performance guarantees for data-driven Model Predictive Control.
Our algorithm uses the structure of constrained linear MPC, and our analysis uses the properties of the explicit MPC solution to theoretically bound the number of online MPC trajectories needed to achieve optimal performance.
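The training loop can be pictured as DAgger-style on-policy imitation of an MPC expert: roll out the learner, relabel the visited states with the MPC action, and refit. The sketch below uses placeholder `env`, `mpc_expert`, and `policy` interfaces described in the docstring; it is a generic loop, not the paper's constrained linear MPC setup or its analysis.

```python
import numpy as np

def on_policy_imitation(env, mpc_expert, policy, n_iters=10, horizon=100):
    """DAgger-style imitation of an MPC expert (placeholder interfaces).

    env:        environment with reset() -> state and step(action) -> next_state
    mpc_expert: callable mapping a state to the MPC-optimal action
    policy:     learner with predict(state) and fit(states, actions)
    """
    states, actions = [], []
    for it in range(n_iters):
        s = env.reset()
        for _ in range(horizon):
            states.append(s)
            actions.append(mpc_expert(s))            # relabel visited states with the expert
            # Roll out the *learner* so the data covers the states it actually induces.
            a = policy.predict(s) if it > 0 else mpc_expert(s)
            s = env.step(a)
        policy.fit(np.array(states), np.array(actions))
    return policy
```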
arXiv Detail & Related papers (2022-10-17T16:06:06Z)
- A Unified Framework for Alternating Offline Model Training and Policy Learning [62.19209005400561]
In offline model-based reinforcement learning, we learn a dynamics model from historically collected data, and use the learned model together with the fixed dataset for policy learning.
We develop an iterative offline MBRL framework, where we maximize a lower bound of the true expected return.
With the proposed unified model-policy learning framework, we achieve competitive performance on a wide range of continuous-control offline reinforcement learning datasets.
arXiv Detail & Related papers (2022-10-12T04:58:51Z)
- MOPO: Model-based Offline Policy Optimization [183.6449600580806]
Offline reinforcement learning (RL) refers to the problem of learning policies entirely from a large batch of previously collected data.
We show that an existing model-based RL algorithm already produces significant gains in the offline setting.
We propose to modify the existing model-based RL methods by applying them with rewards artificially penalized by the uncertainty of the dynamics.
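Concretely, the modification amounts to training on model rollouts with a reward of the form r(s, a) − λ·u(s, a), where u estimates how unreliable the learned dynamics are at (s, a). The sketch below uses ensemble disagreement as one common proxy for u; the ensemble interface and λ are illustrative assumptions.

```python
import numpy as np

def penalized_reward(reward, state, action, ensemble, lam=1.0):
    """Uncertainty-penalized reward for model-based offline policy optimization.

    ensemble: list of models, each with predict(state, action) -> predicted next state
    lam:      penalty coefficient (illustrative value)
    """
    preds = np.stack([m.predict(state, action) for m in ensemble])  # (n_models, state_dim)
    # Use ensemble disagreement (std across models) as the uncertainty estimate u(s, a).
    uncertainty = preds.std(axis=0).max()
    return reward - lam * uncertainty
```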
arXiv Detail & Related papers (2020-05-27T08:46:41Z)
- Counterfactual Learning of Stochastic Policies with Continuous Actions: from Models to Offline Evaluation [41.21447375318793]
We introduce a modelling strategy based on a joint kernel embedding of contexts and actions.
We empirically show that the optimization aspect of counterfactual learning is important.
We propose an evaluation protocol for offline policies in real-world logged systems.
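One way to picture a joint kernel embedding of contexts and actions is kernel ridge regression of logged rewards under a product kernel k((x, a), (x', a')) = k_x(x, x')·k_a(a, a'). The sketch below is that generic construction with RBF kernels, not necessarily the paper's exact estimator; kernel widths and regularization are illustrative.

```python
import numpy as np

def rbf(A, B, gamma):
    """RBF kernel matrix between the rows of A and the rows of B."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * sq)

def fit_joint_kernel_model(X, A, r, gamma_x=1.0, gamma_a=1.0, reg=1e-3):
    """Kernel ridge regression of reward on a joint (context, action) kernel."""
    K = rbf(X, X, gamma_x) * rbf(A, A, gamma_a)   # product kernel couples context and action
    return np.linalg.solve(K + reg * np.eye(len(X)), r)

def predict_reward(alpha, X_train, A_train, x, a, gamma_x=1.0, gamma_a=1.0):
    k = rbf(X_train, x[None, :], gamma_x) * rbf(A_train, a[None, :], gamma_a)
    return float(alpha @ k[:, 0])
```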
arXiv Detail & Related papers (2020-04-22T07:42:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this content (including all information) and is not responsible for any consequences arising from its use.