Adaptive Inventory Strategies using Deep Reinforcement Learning for Dynamic Agri-Food Supply Chains
- URL: http://arxiv.org/abs/2507.16670v1
- Date: Tue, 22 Jul 2025 15:02:54 GMT
- Title: Adaptive Inventory Strategies using Deep Reinforcement Learning for Dynamic Agri-Food Supply Chains
- Authors: Amandeep Kaur, Gyan Prakash
- Abstract summary: This study focuses on inventory management of agri-food products under demand and lead time uncertainties. It proposes a novel Deep Reinforcement Learning (DRL) algorithm that combines the benefits of both value- and policy-based DRL approaches for inventory optimization under uncertainties.
- Score: 1.7930468380414317
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Agricultural products are often subject to seasonal fluctuations in production and demand. Predicting and managing inventory levels in response to these variations can be challenging, leading to either excess inventory or stockouts. Additionally, coordination among stakeholders at various levels of the food supply chain is not considered in the existing body of literature. To bridge these research gaps, this study focuses on inventory management of agri-food products under demand and lead time uncertainties. Implementing an effective inventory replenishment policy maximizes the overall profit throughout the supply chain. However, these uncertainties and the shelf life of the product increase the complexity of the problem, making it challenging to apply traditional approaches to generate an optimal set of solutions. Thus, the current study proposes a novel Deep Reinforcement Learning (DRL) algorithm that combines the benefits of both value- and policy-based DRL approaches for inventory optimization under uncertainties. The proposed algorithm can incentivize collaboration among stakeholders by aligning their interests and objectives through the shared optimization goal of maximizing profitability along the agri-food supply chain while considering perishability and uncertainty simultaneously. By selecting optimal order quantities over a continuous action space, the proposed algorithm effectively addresses the inventory optimization challenges. To rigorously evaluate the algorithm, empirical data from a fresh agricultural products supply chain inventory is considered. Experimental results corroborate the improved performance of the proposed inventory replenishment policy under stochastic demand patterns and lead time scenarios. The research findings hold managerial implications for policymakers seeking to manage the inventory of agricultural products more effectively under uncertainty.
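The abstract frames replenishment as a sequential decision problem with continuous order quantities, perishable stock, and uncertain demand and lead times. The sketch below is a minimal illustration of that problem setup only, not the authors' implementation: the class name, cost parameters, and demand and lead-time distributions are illustrative assumptions. It exposes the state, continuous action, and profit-based reward that a continuous-action actor-critic agent could be trained against.

```python
import numpy as np

# Hypothetical single-echelon perishable-inventory MDP (illustrative sketch;
# all parameter values and distributions are assumptions, not the paper's).
class PerishableInventoryEnv:
    def __init__(self, shelf_life=5, max_lead_time=3, max_order=100.0,
                 price=5.0, cost=2.0, holding_cost=0.1, waste_cost=1.0,
                 stockout_penalty=3.0, seed=0):
        self.shelf_life = shelf_life
        self.max_lead_time = max_lead_time
        self.max_order = max_order
        self.price, self.cost = price, cost
        self.holding_cost, self.waste_cost = holding_cost, waste_cost
        self.stockout_penalty = stockout_penalty
        self.rng = np.random.default_rng(seed)
        self.reset()

    def reset(self):
        # On-hand stock bucketed by remaining shelf life (index 0 = oldest).
        self.on_hand = np.zeros(self.shelf_life)
        # Orders in transit, indexed by periods until arrival.
        self.pipeline = np.zeros(self.max_lead_time + 1)
        return self._state()

    def _state(self):
        return np.concatenate([self.on_hand, self.pipeline])

    def step(self, order_qty):
        # Continuous action: order quantity placed this period.
        order_qty = float(np.clip(order_qty, 0.0, self.max_order))
        lead = self.rng.integers(1, self.max_lead_time + 1)  # uncertain lead time
        self.pipeline[lead] += order_qty

        # Deliveries due this period arrive as the freshest stock bucket.
        arrived = self.pipeline[0]
        self.pipeline = np.roll(self.pipeline, -1)
        self.pipeline[-1] = 0.0
        self.on_hand[-1] += arrived

        # Stochastic demand, served oldest-first (FIFO issuing).
        demand = max(self.rng.normal(20.0, 5.0), 0.0)
        remaining = demand
        for i in range(self.shelf_life):
            used = min(self.on_hand[i], remaining)
            self.on_hand[i] -= used
            remaining -= used
        sold = demand - remaining

        # Age the stock; the oldest bucket expires and becomes waste.
        waste = self.on_hand[0]
        self.on_hand = np.roll(self.on_hand, -1)
        self.on_hand[-1] = 0.0

        # Profit-based reward: revenue minus purchasing, holding, waste,
        # and stockout costs.
        reward = (self.price * sold - self.cost * order_qty
                  - self.holding_cost * self.on_hand.sum()
                  - self.waste_cost * waste
                  - self.stockout_penalty * remaining)
        return self._state(), reward, False, {"waste": waste, "lost": remaining}
```

Any off-the-shelf continuous-action actor-critic (for instance DDPG or SAC) could in principle be wrapped around this state/action/reward interface; the paper's contribution lies in how the value- and policy-based components are combined, which this sketch does not attempt to reproduce.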
Related papers
- Preference Optimization for Combinatorial Optimization Problems [54.87466279363487]
Reinforcement Learning (RL) has emerged as a powerful tool for neural optimization, enabling models to learn to solve complex problems without requiring expert knowledge. Despite significant progress, existing RL approaches face challenges such as diminishing reward signals and inefficient exploration in vast action spaces. We propose Preference Optimization, a novel method that transforms quantitative reward signals into qualitative preference signals via statistical comparison modeling.
arXiv Detail & Related papers (2025-05-13T16:47:00Z) - Efficient Safety Alignment of Large Language Models via Preference Re-ranking and Representation-based Reward Modeling [84.00480999255628]
Reinforcement Learning algorithms for safety alignment of Large Language Models (LLMs) encounter the challenge of distribution shift. Current approaches typically address this issue through online sampling from the target policy. We propose a new framework that leverages the model's intrinsic safety judgment capability to extract reward signals.
arXiv Detail & Related papers (2025-03-13T06:40:34Z) - Classical and Deep Reinforcement Learning Inventory Control Policies for Pharmaceutical Supply Chains with Perishability and Non-Stationarity [1.0124625066746595]
We study inventory control policies for pharmaceutical supply chains, addressing challenges such as perishability, yield uncertainty, and non-stationary demand. We benchmark three policies: order-up-to (OUT), projected inventory level (PIL), and deep reinforcement learning (DRL).
arXiv Detail & Related papers (2025-01-18T22:40:33Z) - Enhancing Supply Chain Visibility with Knowledge Graphs and Large Language Models [49.898152180805454]
This paper presents a novel framework leveraging Knowledge Graphs (KGs) and Large Language Models (LLMs) to enhance supply chain visibility.
Our zero-shot, LLM-driven approach automates the extraction of supply chain information from diverse public sources.
With high accuracy in NER and RE tasks, it provides an effective tool for understanding complex, multi-tiered supply networks.
arXiv Detail & Related papers (2024-08-05T17:11:29Z) - Multiple Independent DE Optimizations to Tackle Uncertainty and Variability in Demand in Inventory Management [0.0]
This study aims to discern the most effective strategy for minimizing inventory costs within the context of uncertain demand patterns.
To find the optimal solution, the study focuses on meta-heuristic approaches and compares multiple algorithms.
arXiv Detail & Related papers (2023-09-22T13:15:02Z) - Distributional constrained reinforcement learning for supply chain optimization [0.0]
We introduce Distributional Constrained Policy Optimization (DCPO), a novel approach for reliable constraint satisfaction in reinforcement learning.
We show that DCPO improves the rate at which the RL policy converges and ensures reliable constraint satisfaction by the end of training.
arXiv Detail & Related papers (2023-02-03T13:43:02Z) - Multi-Agent Reinforcement Learning with Shared Resources for Inventory Management [62.23979094308932]
In our setting, the constraint on the shared resources (such as the inventory capacity) couples the otherwise independent control for each SKU.
We formulate the problem with this structure as a Shared-Resource Stochastic Game (SRSG) and propose an efficient algorithm called Context-aware Decentralized PPO (CD-PPO).
Through extensive experiments, we demonstrate that CD-PPO can accelerate the learning procedure compared with standard MARL algorithms.
arXiv Detail & Related papers (2022-12-15T09:35:54Z) - A Simulation Environment and Reinforcement Learning Method for Waste Reduction [50.545552995521774]
We study the problem of restocking a grocery store's inventory with perishable items over time, from a distributional point of view.
The objective is to maximize sales while minimizing waste, with uncertainty about the actual consumption by customers.
We frame inventory restocking as a new reinforcement learning task that exhibits behavior conditioned on the agent's actions.
arXiv Detail & Related papers (2022-05-30T22:48:57Z) - Comparing Deep Reinforcement Learning Algorithms in Two-Echelon Supply Chains [1.4685355149711299]
We analyze and compare the performance of state-of-the-art deep reinforcement learning algorithms for solving the supply chain inventory management problem.
This study provides detailed insight into the design and development of an open-source software library that provides a customizable environment for solving the supply chain inventory management problem.
arXiv Detail & Related papers (2022-04-20T16:33:01Z) - Deep Policy Iteration with Integer Programming for Inventory Management [8.27175065641495]
We present a framework for optimizing long-term discounted reward problems with large accessible action space and state dependent constraints. Our proposed Programmable Actor Reinforcement Learning (PARL) uses a deep-policy method that leverages neural networks (NNs) to approximate the value function. We benchmark the proposed algorithm against state-of-the-art RL algorithms and commonly used replenishment heuristics and find it considerably outperforms existing methods by as much as 14.7% on average.
arXiv Detail & Related papers (2021-12-04T01:40:34Z) - False Correlation Reduction for Offline Reinforcement Learning [115.11954432080749]
We propose falSe COrrelation REduction (SCORE) for offline RL, a practically effective and theoretically provable algorithm.
We empirically show that SCORE achieves SoTA performance with 3.1x acceleration on various tasks in a standard benchmark (D4RL).
arXiv Detail & Related papers (2021-10-24T15:34:03Z) - Hierarchical Adaptive Contextual Bandits for Resource Constraint based Recommendation [49.69139684065241]
Contextual multi-armed bandit (MAB) achieves cutting-edge performance on a variety of problems.
In this paper, we propose a hierarchical adaptive contextual bandit method (HATCH) to conduct the policy learning of contextual bandits with a budget constraint.
arXiv Detail & Related papers (2020-04-02T17:04:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.