Using General Value Functions to Learn Domain-Backed Inventory
Management Policies
- URL: http://arxiv.org/abs/2311.02125v1
- Date: Fri, 3 Nov 2023 08:35:54 GMT
- Title: Using General Value Functions to Learn Domain-Backed Inventory
Management Policies
- Authors: Durgesh Kalwar, Omkar Shelke, Harshad Khadilkar
- Abstract summary: In existing literature, General Value Functions (GVFs) have primarily been used for auxiliary task learning.
We use this capability to train GVFs on domain-critical characteristics such as prediction of stock-out probability and wastage quantity.
We show that the GVF predictions can be used to provide additional domain-backed insights into the decisions proposed by the RL agent.
- Score: 2.0257616108612373
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We consider the inventory management problem, where the goal is to balance
conflicting objectives such as availability and wastage of a large range of
products in a store. We propose a reinforcement learning (RL) approach that
utilises General Value Functions (GVFs) to derive domain-backed inventory
replenishment policies. The inventory replenishment decisions are modelled as a
sequential decision making problem, which is challenging due to uncertain
demand and the existence of aggregate (cross-product) constraints. In existing
literature, GVFs have primarily been used for auxiliary task learning. We use
this capability to train GVFs on domain-critical characteristics such as
prediction of stock-out probability and wastage quantity. Using this domain
expertise for more effective exploration, we train an RL agent to compute the
inventory replenishment quantities for a large range of products (up to 6000 in
the reported experiments), which share aggregate constraints such as the total
weight/volume per delivery. Additionally, we show that the GVF predictions can
be used to provide additional domain-backed insights into the decisions
proposed by the RL agent. Finally, since the environment dynamics are fully
transferred, the trained GVFs can be used for faster adaptation to vastly
different business objectives (for example, due to the start of a promotional
period or due to deployment in a new customer environment).
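To make the GVF idea concrete, below is a minimal sketch of training a linear GVF with TD(0) on a stock-out cumulant; the feature dimension, discount, step size, and simulated signals are illustrative assumptions rather than details from the paper.

```python
# Hedged sketch: a linear GVF that predicts the discounted sum of future
# stock-out events, learned with TD(0). All numbers are assumptions.
import numpy as np

rng = np.random.default_rng(0)
n_features, gamma, alpha = 8, 0.9, 0.05
w = np.zeros(n_features)  # linear GVF weights

def gvf_predict(phi):
    """Predicted discounted sum of future stock-out events from features phi."""
    return w @ phi

def td_update(phi, cumulant, phi_next):
    """One TD(0) step toward the GVF target: cumulant + gamma * v(phi_next)."""
    global w
    td_error = cumulant + gamma * gvf_predict(phi_next) - gvf_predict(phi)
    w = w + alpha * td_error * phi

# Simulated experience stream: random state features and stock-out signals.
phi = rng.random(n_features)
for _ in range(10_000):
    phi_next = rng.random(n_features)
    stock_out = float(rng.random() < 0.1)  # cumulant: 1.0 on a stock-out step
    td_update(phi, stock_out, phi_next)
    phi = phi_next

print("predicted discounted stock-out exposure:", gvf_predict(phi))
```

The same machinery applies to a wastage cumulant: only the per-step signal fed to `td_update` changes, which is what makes GVFs a convenient way to expose several domain-critical predictions from one experience stream.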
Related papers
- Zero-shot Generalization in Inventory Management: Train, then Estimate and Decide [0.0]
Deploying deep reinforcement learning (DRL) in real-world inventory management presents challenges.
These challenges highlight a research gap, suggesting a need for a unifying framework to model and solve sequential decision-making under parameter uncertainty.
We address this by exploring an underexplored area of DRL for inventory management: training generally capable agents (GCAs) under zero-shot generalization (ZSG).
arXiv Detail & Related papers (2024-11-01T11:20:05Z)
- Towards Cost Sensitive Decision Making [14.279123976398926]
In this work, we consider RL models that may actively acquire features from the environment to improve the decision quality and certainty.
We propose the Active-Acquisition POMDP and identify two types of acquisition processes for different application domains.
In order to assist the agent in the actively-acquired partially-observed environment and alleviate the exploration-exploitation dilemma, we develop a model-based approach.
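As a toy illustration of active feature acquisition, the sketch below lets an agent pay a fixed cost per revealed feature before committing to a final decision; the environment, costs, and heuristic acquisition policy are assumptions for illustration only.

```python
# Toy actively-acquired partially-observed setting: the agent may pay a
# cost to reveal hidden features before taking its task action.
import numpy as np

rng = np.random.default_rng(1)
n_features, acquire_cost = 4, 0.1

def episode(threshold):
    hidden = rng.random(n_features)          # true feature values
    observed = np.full(n_features, np.nan)   # start fully unobserved
    cost = 0.0
    for i in range(n_features):
        # Heuristic policy: keep acquiring while too much is still unknown.
        if np.isnan(observed).mean() > threshold:
            observed[i] = hidden[i]
            cost += acquire_cost
    # Final task action: predict whether the hidden mean exceeds 0.5.
    guess = np.nanmean(observed) > 0.5 if not np.all(np.isnan(observed)) else False
    correct = (hidden.mean() > 0.5) == guess
    return float(correct) - cost             # reward trades accuracy vs. cost

returns = [episode(threshold=0.5) for _ in range(1000)]
print("mean return:", np.mean(returns))
```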
arXiv Detail & Related papers (2024-10-04T19:48:23Z)
- Beyond Expected Return: Accounting for Policy Reproducibility when Evaluating Reinforcement Learning Algorithms [9.649114720478872]
Many applications in Reinforcement Learning (RL) have noise or stochasticity present in the environment.
These uncertainties lead the exact same policy to perform differently from one roll-out to another.
Common evaluation procedures in RL summarise the consequent return distributions using solely the expected return, which does not account for the spread of the distribution.
Our work defines this spread as policy reproducibility: the ability of a policy to obtain similar performance when rolled out many times, a crucial property in some real-world applications.
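A minimal sketch of this evaluation idea, assuming a toy noisy environment: report the spread of the return distribution alongside its mean rather than the expected return alone.

```python
# Evaluate a fixed policy by repeated roll-outs and summarise both the
# expected return and its spread. The environment is a toy stand-in.
import numpy as np

rng = np.random.default_rng(2)

def rollout():
    """One noisy episode return for a fixed policy (toy model)."""
    return 10.0 + rng.normal(scale=3.0)

returns = np.array([rollout() for _ in range(500)])
print("expected return:", returns.mean())
print("spread (std):   ", returns.std())
print("5% quantile:    ", np.quantile(returns, 0.05))  # pessimistic view
```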
arXiv Detail & Related papers (2023-12-12T11:22:31Z)
- Off-Policy Evaluation for Large Action Spaces via Policy Convolution [60.6953713877886]
The Policy Convolution (PC) family of estimators uses latent structure within actions to strategically convolve the logging and target policies.
Experiments on synthetic and benchmark datasets demonstrate remarkable mean squared error (MSE) improvements when using PC.
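The sketch below illustrates the general idea of convolving policies over a latent action space before forming importance weights; the Gaussian kernel, bandwidth, and random policies are assumptions for illustration, not the paper's exact estimator.

```python
# Kernel-smoothed importance weighting over latent action embeddings:
# both policies are convolved with a Gaussian kernel in embedding space
# before the per-action ratios are computed.
import numpy as np

rng = np.random.default_rng(3)
n_actions = 50
embed = rng.normal(size=(n_actions, 4))       # latent action embeddings

def convolve(policy, bandwidth):
    """Smooth a policy over actions that are close in the latent space."""
    d2 = ((embed[:, None, :] - embed[None, :, :]) ** 2).sum(-1)
    k = np.exp(-d2 / (2 * bandwidth ** 2))
    smoothed = k @ policy
    return smoothed / smoothed.sum()

logging = rng.dirichlet(np.ones(n_actions))
target = rng.dirichlet(np.ones(n_actions))
log_s, tgt_s = convolve(logging, 0.5), convolve(target, 0.5)

# Smoothed importance weights applied to logged (action, reward) pairs.
actions = rng.choice(n_actions, size=2000, p=logging)
rewards = rng.random(2000)
weights = tgt_s[actions] / log_s[actions]
print("smoothed IPS estimate:", np.mean(weights * rewards))
```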
arXiv Detail & Related papers (2023-10-24T01:00:01Z)
- Cooperative Multi-Agent Reinforcement Learning for Inventory Management [0.5276232626689566]
Reinforcement Learning (RL) for inventory management is a nascent field of research.
We present a system with a custom GPU-parallelized environment that consists of one warehouse and multiple stores.
We achieve a system that outperforms standard inventory control policies.
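A hedged sketch of the batched-environment idea, with NumPy vectorization standing in for the paper's custom GPU parallelism; the store/SKU counts and Poisson demand model are assumptions.

```python
# Step every store's inventory dynamics in one batched array operation.
import numpy as np

rng = np.random.default_rng(4)
n_stores, n_skus = 128, 64
inventory = rng.integers(0, 20, size=(n_stores, n_skus)).astype(float)

def step(inventory, orders):
    """One synchronous transition for all stores and SKUs at once."""
    demand = rng.poisson(3.0, size=inventory.shape)
    sales = np.minimum(inventory + orders, demand)
    stock_outs = (demand > inventory + orders).sum()
    return inventory + orders - sales, sales, stock_outs

orders = np.full((n_stores, n_skus), 3.0)
inventory, sales, stock_outs = step(inventory, orders)
print("total sales:", sales.sum(), "stock-outs:", stock_outs)
```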
arXiv Detail & Related papers (2023-04-18T06:55:59Z)
- Multi-Agent Reinforcement Learning with Shared Resources for Inventory Management [62.23979094308932]
In our setting, the constraint on the shared resources (such as the inventory capacity) couples the otherwise independent control for each SKU.
We formulate the problem with this structure as a Shared-Resource Stochastic Game (SRSG) and propose an efficient algorithm called Context-aware Decentralized PPO (CD-PPO).
Through extensive experiments, we demonstrate that CD-PPO can accelerate the learning procedure compared with standard MARL algorithms.
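To see how a shared resource couples otherwise independent per-SKU decisions, here is a toy feasibility rule that jointly rescales proposed order quantities to fit a shared delivery volume; all quantities are illustrative assumptions.

```python
# Per-SKU agents propose replenishment quantities; a shared capacity
# constraint is then enforced jointly across all of them.
import numpy as np

rng = np.random.default_rng(5)
proposed = rng.uniform(0, 10, size=100)      # per-SKU proposed quantities
unit_volume = rng.uniform(0.5, 2.0, size=100)
capacity = 300.0                             # shared delivery volume

total = proposed @ unit_volume
if total > capacity:
    proposed *= capacity / total             # simplest joint feasibility rule
print("used volume:", proposed @ unit_volume, "of", capacity)
```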
arXiv Detail & Related papers (2022-12-15T09:35:54Z)
- Explaining Cross-Domain Recognition with Interpretable Deep Classifier [100.63114424262234]
The Interpretable Deep Classifier (IDC) learns the nearest source samples of a target sample as evidence upon which the classifier makes the decision.
Our IDC leads to a more explainable model with almost no accuracy degradation and effectively calibrates classification for optimum reject options.
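A minimal sketch of the evidence-retrieval idea, assuming random features in place of learned embeddings: accompany a target prediction with its nearest source samples in feature space.

```python
# Retrieve the nearest source samples to a target sample as "evidence"
# supporting the classifier's decision. Features and labels are random
# stand-ins for real embeddings.
import numpy as np

rng = np.random.default_rng(6)
source_feats = rng.normal(size=(500, 16))
source_labels = rng.integers(0, 5, size=500)
target = rng.normal(size=16)

dists = np.linalg.norm(source_feats - target, axis=1)
evidence = np.argsort(dists)[:5]             # nearest source samples
print("evidence indices:", evidence)
print("their labels:", source_labels[evidence])  # supports/explains the call
```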
arXiv Detail & Related papers (2022-11-15T15:58:56Z)
- Intelligent Warehouse Allocator for Optimal Regional Utilization [0.0]
We use machine learning and optimization methods to build an efficient solution to this warehouse allocation problem.
We conduct back-testing with this solution and validate the model's efficiency by demonstrating a significant uptick in two key metrics: Regional Utilization (RU) and Percentage Two-Day-Delivery (2DD).
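As a hedged illustration of pairing prediction with optimization for warehouse allocation, the sketch below greedily assigns SKUs to the warehouse with the highest regional demand share under a capacity limit; the demand shares, capacities, and greedy rule are assumptions, not the paper's method.

```python
# Greedy capacitated allocation: each SKU goes to the warehouse whose
# region demands it most, while per-warehouse capacity remains.
import numpy as np

rng = np.random.default_rng(7)
n_skus, n_wh = 200, 4
demand_share = rng.dirichlet(np.ones(n_wh), size=n_skus)  # per-SKU regional demand
capacity = np.full(n_wh, 60)

assignment = np.full(n_skus, -1)
# Assign the most regionally concentrated SKUs first, so capacity goes
# to the strongest matches.
for sku in np.argsort(-demand_share.max(axis=1)):
    for wh in np.argsort(-demand_share[sku]):
        if capacity[wh] > 0:
            assignment[sku], capacity[wh] = wh, capacity[wh] - 1
            break
print("regional utilization proxy:",
      demand_share[np.arange(n_skus), assignment].mean())
```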
arXiv Detail & Related papers (2020-07-09T21:46:15Z)
- Feature Alignment and Restoration for Domain Generalization and Adaptation [93.39253443415392]
Cross domain feature alignment has been widely explored to pull the feature distributions of different domains in order to learn domain-invariant representations.
We propose a unified framework termed Feature Alignment and Restoration (FAR) to simultaneously ensure high generalization and discrimination power of the networks.
Experiments on multiple classification benchmarks demonstrate the high performance and strong generalization of our FAR framework for both domain generalization and unsupervised domain adaptation.
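As a toy illustration of pulling feature distributions together, the following computes a simple moment-matching penalty between source and target feature means; real alignment losses (adversarial, MMD-based, etc.) are more elaborate, and everything here is an assumption for illustration.

```python
# A crude alignment penalty: squared distance between the mean feature
# statistics of the source and target domains.
import numpy as np

rng = np.random.default_rng(8)
source = rng.normal(loc=0.0, size=(256, 32))
target = rng.normal(loc=0.5, size=(256, 32))

alignment_loss = np.sum((source.mean(axis=0) - target.mean(axis=0)) ** 2)
print("moment-matching alignment loss:", alignment_loss)
```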
arXiv Detail & Related papers (2020-06-22T05:08:13Z)
- Universal Source-Free Domain Adaptation [57.37520645827318]
We propose a novel two-stage learning process for domain adaptation.
In the Procurement stage, we aim to equip the model for future source-free deployment, assuming no prior knowledge of the upcoming category-gap and domain-shift.
In the Deployment stage, the goal is to design a unified adaptation algorithm capable of operating across a wide range of category-gaps.
arXiv Detail & Related papers (2020-04-09T07:26:20Z)
- Towards Inheritable Models for Open-Set Domain Adaptation [56.930641754944915]
We introduce a practical Domain Adaptation paradigm where a source-trained model is used to facilitate adaptation in the absence of the source dataset in future.
We present an objective way to quantify inheritability to enable the selection of the most suitable source model for a given target domain, even in the absence of the source data.
arXiv Detail & Related papers (2020-04-09T07:16:30Z)