Cooperative Multi-Agent Reinforcement Learning for Inventory Management
- URL: http://arxiv.org/abs/2304.08769v1
- Date: Tue, 18 Apr 2023 06:55:59 GMT
- Title: Cooperative Multi-Agent Reinforcement Learning for Inventory Management
- Authors: Madhav Khirwar, Karthik S. Gurumoorthy, Ankit Ajit Jain, Shantala
Manchenahally
- Abstract summary: Reinforcement Learning (RL) for inventory management is a nascent field of research.
We present a system with a custom GPU-parallelized environment that consists of one warehouse and multiple stores.
We achieve a system that outperforms standard inventory control policies.
- Score: 0.5276232626689566
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: With Reinforcement Learning (RL) for inventory management (IM) being a
nascent field of research, approaches tend to be limited to simple, linear
environments with implementations that are minor modifications of off-the-shelf
RL algorithms. Scaling these simplistic environments to a real-world supply
chain comes with a few challenges such as: minimizing the computational
requirements of the environment, specifying agent configurations that are
representative of dynamics at real world stores and warehouses, and specifying
a reward framework that encourages desirable behavior across the whole supply
chain. In this work, we present a system with a custom GPU-parallelized
environment that consists of one warehouse and multiple stores, a novel
architecture for agent-environment dynamics incorporating enhanced state and
action spaces, and a shared reward specification that seeks to optimize for a
large retailer's supply chain needs. Each vertex in the supply chain graph is
an independent agent that, based on its own inventory, is able to place
replenishment orders with the vertex upstream. The warehouse agent, aside from
placing orders from the supplier, has the special property of also being able
to constrain replenishment to stores downstream, which results in it learning
an additional allocation sub-policy. We achieve a system that outperforms
standard inventory control policies such as a base-stock policy and other
RL-based specifications for a single product, and lay out a future direction of work
for multiple products.
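The paper does not publish reference code, so the following is a minimal, hypothetical sketch of the structure the abstract describes: one warehouse vertex and several store vertices, each placing replenishment orders upstream, with the warehouse additionally constraining (allocating) what it ships downstream and a single shared reward computed across the chain. All names, capacities, and cost weights are illustrative assumptions, and this single-instance NumPy version stands in for the GPU-parallelized environment.
```python
import numpy as np

N_STORES = 3                 # assumed number of downstream stores
HOLDING_COST = 0.1           # assumed per-unit holding cost
LOST_SALE_PENALTY = 1.0      # assumed per-unit lost-sales penalty

# Inventory state: index 0 is the warehouse, indices 1..N_STORES are stores.
inventory = np.array([200.0] + [50.0] * N_STORES)

def step(store_orders, warehouse_order, allocation_caps, demand):
    """One period of the warehouse + multi-store chain (illustrative only).

    store_orders    : replenishment quantities requested by each store
    warehouse_order : quantity the warehouse orders from the external supplier
    allocation_caps : per-store shipment caps chosen by the warehouse's
                      allocation sub-policy
    demand          : realized customer demand at each store
    """
    global inventory

    # The warehouse constrains downstream replenishment via its allocation caps.
    shipped = np.minimum(store_orders, allocation_caps)

    # It also cannot ship more than it has on hand; ration proportionally if needed.
    total_shipped = shipped.sum()
    if total_shipped > inventory[0]:
        shipped *= inventory[0] / total_shipped
        total_shipped = shipped.sum()

    # Move stock: supplier -> warehouse, warehouse -> stores.
    inventory[0] += warehouse_order - total_shipped
    inventory[1:] += shipped

    # Stores serve customer demand; unmet demand is lost.
    sales = np.minimum(demand, inventory[1:])
    inventory[1:] -= sales
    lost = demand - sales

    # Shared reward across the whole chain (weights are assumptions).
    return sales.sum() - HOLDING_COST * inventory.sum() - LOST_SALE_PENALTY * lost.sum()

# Example period: each store requests 20 units, the warehouse orders 60 from
# the supplier and caps every store's shipment at 15 units.
r = step(store_orders=np.full(N_STORES, 20.0),
         warehouse_order=60.0,
         allocation_caps=np.full(N_STORES, 15.0),
         demand=np.array([18.0, 12.0, 25.0]))
```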
Related papers
- ComaDICE: Offline Cooperative Multi-Agent Reinforcement Learning with Stationary Distribution Shift Regularization [11.620274237352026]
Offline reinforcement learning (RL) has garnered significant attention for its ability to learn effective policies from pre-collected datasets.
Offline multi-agent RL (MARL) presents additional challenges due to the large joint state-action space and the complexity of multi-agent behaviors.
We introduce a regularizer in the space of stationary distributions to better handle distributional shift.
arXiv Detail & Related papers (2024-10-02T18:56:10Z)
- Enhancing Supply Chain Visibility with Knowledge Graphs and Large Language Models [49.898152180805454]
This paper presents a novel framework leveraging Knowledge Graphs (KGs) and Large Language Models (LLMs) to enhance supply chain visibility.
Our zero-shot, LLM-driven approach automates the extraction of supply chain information from diverse public sources.
With high accuracy in NER and RE tasks, it provides an effective tool for understanding complex, multi-tiered supply networks.
arXiv Detail & Related papers (2024-08-05T17:11:29Z)
- REBEL: Reinforcement Learning via Regressing Relative Rewards [59.68420022466047]
We propose REBEL, a minimalist RL algorithm for the era of generative models.
In theory, we prove that fundamental RL algorithms like Natural Policy Gradient can be seen as variants of REBEL.
We find that REBEL provides a unified approach to language modeling and image generation with stronger or similar performance as PPO and DPO.
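As a rough illustration of the "regressing relative rewards" idea named in the title (our reading of the abstract, not the authors' code), the per-pair objective can be sketched as a least-squares regression of the reward gap between two sampled responses onto the gap in their policy log-probability ratios; the tensor names and eta are assumptions.
```python
import torch

def rebel_pair_loss(logp_new_a, logp_old_a, logp_new_b, logp_old_b,
                    reward_a, reward_b, eta=1.0):
    # Predicted relative reward from the change in log-probability ratios
    # between two sampled responses a and b (eta is an assumed scale).
    pred = ((logp_new_a - logp_old_a) - (logp_new_b - logp_old_b)) / eta
    target = reward_a - reward_b
    return ((pred - target) ** 2).mean()
```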
arXiv Detail & Related papers (2024-04-25T17:20:45Z)
- MARLIM: Multi-Agent Reinforcement Learning for Inventory Management [1.1470070927586016]
This paper presents a novel reinforcement learning framework called MARLIM to address the inventory management problem.
Within this context, controllers are developed through single or multiple agents in a cooperative setting.
Numerical experiments on real data demonstrate the benefits of reinforcement learning methods over traditional baselines.
arXiv Detail & Related papers (2023-08-03T09:31:45Z)
- Neural Inventory Control in Networks via Hindsight Differentiable Policy Optimization [5.590976834881065]
We argue that inventory management presents unique opportunities for reliably applying and evaluating deep reinforcement learning (DRL) algorithms.
The first is Hindsight Differentiable Policy Optimization (HDPO), which performs gradient descent to optimize policy performance.
The second technique involves aligning policy (neural) network structures with the structure of the inventory network.
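A hypothetical sketch of the first technique as summarized above: the inventory cost over a sampled ("hindsight") demand trace is simulated with differentiable operations, so plain gradient descent flows through the simulator into the policy parameters. Network size, cost coefficients, and the demand model below are assumptions, and the second technique (aligning network structure with the inventory network) is not shown.
```python
import torch
import torch.nn as nn

# Tiny policy: maps the current inventory position to a nonnegative order quantity.
policy = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1), nn.Softplus())
opt = torch.optim.Adam(policy.parameters(), lr=1e-2)

T, HOLD, BACKLOG = 52, 1.0, 5.0          # assumed horizon and cost weights
demand = torch.rand(T) * 10              # assumed demand sample

for _ in range(200):                     # arbitrary number of gradient steps
    inv = torch.zeros(())
    cost = torch.zeros(())
    for t in range(T):
        order = policy(inv.reshape(1, 1)).squeeze()
        inv = inv + order - demand[t]    # backlogged unmet demand (negative inventory)
        cost = cost + HOLD * torch.relu(inv) + BACKLOG * torch.relu(-inv)
    opt.zero_grad()
    cost.backward()                      # backprop through the simulated horizon
    opt.step()
```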
arXiv Detail & Related papers (2023-06-20T02:58:25Z)
- No-Regret Learning in Two-Echelon Supply Chain with Unknown Demand Distribution [48.27759561064771]
We consider the two-echelon supply chain model introduced in [Cachon and Zipkin, 1999] under two different settings.
We design algorithms that achieve favorable guarantees for both regret and convergence to the optimal inventory decision in both settings.
Our algorithms are based on Online Gradient Descent and Online Newton Step, together with several new ingredients specifically designed for our problem.
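As a generic illustration of the Online Gradient Descent building block mentioned above (not the paper's actual two-echelon algorithm), a single base-stock level can be updated with the newsvendor subgradient after each observed demand; cost coefficients, demand distribution, and step size are assumptions.
```python
import numpy as np

rng = np.random.default_rng(0)
h, p = 1.0, 4.0          # assumed holding and shortage costs per unit
s, eta = 10.0, 0.5       # initial base-stock level and step size (assumptions)

for t in range(1, 201):
    demand = rng.gamma(shape=2.0, scale=5.0)        # demand distribution unknown to the learner
    # Subgradient of the newsvendor cost h*(s - D)^+ + p*(D - s)^+ at the observed demand.
    grad = h if s > demand else -p
    s = max(s - (eta / np.sqrt(t)) * grad, 0.0)     # projected OGD step onto s >= 0
```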
arXiv Detail & Related papers (2022-10-23T08:45:39Z)
- Concepts and Algorithms for Agent-based Decentralized and Integrated Scheduling of Production and Auxiliary Processes [78.120734120667]
This paper describes an agent-based decentralized and integrated scheduling approach.
Part of the requirements is to develop a linearly scaling communication architecture.
The approach is explained using an example based on industrial requirements.
arXiv Detail & Related papers (2022-05-06T18:44:29Z)
- Control of Dual-Sourcing Inventory Systems using Recurrent Neural Networks [0.0]
We show that the proposed neural network controllers (NNCs) are able to learn near-optimal policies for commonly used instances within a few minutes of CPU time.
Our research opens up new ways of efficiently managing complex, high-dimensional inventory dynamics.
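A minimal, hypothetical sketch of what such a neural controller could look like for dual sourcing: a recurrent cell tracks the inventory and pipeline history, and a head emits nonnegative order quantities for the fast (expensive) and slow (cheap) suppliers. The architecture, sizes, and names are assumptions, not the paper's implementation.
```python
import torch
import torch.nn as nn

class DualSourcingController(nn.Module):
    """Hypothetical recurrent controller: state history -> (fast order, slow order)."""
    def __init__(self, state_dim=4, hidden_dim=32):
        super().__init__()
        self.cell = nn.GRUCell(state_dim, hidden_dim)
        self.head = nn.Sequential(nn.Linear(hidden_dim, 2), nn.Softplus())

    def forward(self, state, hidden):
        hidden = self.cell(state, hidden)
        orders = self.head(hidden)        # column 0: fast supplier, column 1: slow supplier
        return orders, hidden

controller = DualSourcingController()
hidden = torch.zeros(1, 32)
orders, hidden = controller(torch.zeros(1, 4), hidden)   # one decision step
```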
arXiv Detail & Related papers (2022-01-16T19:44:06Z)
- Creating Training Sets via Weak Indirect Supervision [66.77795318313372]
Weak Supervision (WS) frameworks synthesize training labels from multiple potentially noisy supervision sources.
We formulate Weak Indirect Supervision (WIS), a new research problem for automatically synthesizing training labels.
We develop a probabilistic modeling approach, PLRM, which uses user-provided label relations to model and leverage indirect supervision sources.
arXiv Detail & Related papers (2021-10-07T14:09:35Z)
- Will bots take over the supply chain? Revisiting Agent-based supply chain automation [71.77396882936951]
Agent-based supply chains have been proposed since the early 2000s, but industrial uptake has been lagging.
We find that agent-based technology has matured, and other supporting technologies that are penetrating supply chains are filling in gaps.
For example, the ubiquity of IoT technology helps agents "sense" the state of affairs in a supply chain and opens up new possibilities for automation.
arXiv Detail & Related papers (2021-09-03T18:44:26Z)
- Reinforcement Learning for Multi-Product Multi-Node Inventory Management in Supply Chains [17.260459603456745]
This paper describes the application of reinforcement learning (RL) to multi-product inventory management in supply chains.
Experiments show that the proposed approach is able to handle a multi-objective reward comprised of maximising product sales and minimising wastage of perishable products.
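For concreteness, a multi-objective reward of the kind described (maximize product sales, minimize wastage of perishables) might be combined as a weighted sum; the weights and names below are purely illustrative assumptions.
```python
import numpy as np

def multi_objective_reward(units_sold, units_expired, w_sales=1.0, w_waste=2.0):
    # Weighted trade-off between revenue-driving sales and perishable wastage
    # summed over all products and nodes (weights are assumptions).
    return w_sales * np.sum(units_sold) - w_waste * np.sum(units_expired)
```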
arXiv Detail & Related papers (2020-06-07T04:02:59Z)