Related papers: Iterative Multi-Agent Reinforcement Learning: A Novel Approach Toward Real-World Multi-Echelon Inventory Optimization

URL: http://arxiv.org/abs/2503.18201v1
Date: Sun, 23 Mar 2025 20:52:21 GMT
Title: Iterative Multi-Agent Reinforcement Learning: A Novel Approach Toward Real-World Multi-Echelon Inventory Optimization
Authors: Georg Ziegner, Michael Choi, Hung Mac Chan Le, Sahil Sakhuja, Arash Sarmadi
Abstract summary: Multi-echelon inventory optimization (MEIO) is critical for effective supply chain management, but its inherent complexity can pose significant challenges. Recent research has found deep reinforcement learning (DRL) to be a promising alternative to traditional heuristics. This thesis investigates DRL's applicability to MEIO problems of increasing complexity.
Score: 0.6990493129893112
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Multi-echelon inventory optimization (MEIO) is critical for effective supply chain management, but its inherent complexity can pose significant challenges. Heuristics are commonly used to address this complexity, yet they often face limitations in scope and scalability. Recent research has found deep reinforcement learning (DRL) to be a promising alternative to traditional heuristics, offering greater versatility by utilizing dynamic decision-making capabilities. However, since DRL is known to struggle with the curse of dimensionality, its relevance to complex real-life supply chain scenarios is still to be determined. This thesis investigates DRL's applicability to MEIO problems of increasing complexity. A state-of-the-art DRL model was replicated, enhanced, and tested across 13 supply chain scenarios, combining diverse network structures and parameters. To address DRL's challenges with dimensionality, additional models leveraging graph neural networks (GNNs) and multi-agent reinforcement learning (MARL) were developed, culminating in the novel iterative multi-agent reinforcement learning (IMARL) approach. IMARL demonstrated superior scalability, effectiveness, and reliability in optimizing inventory policies, consistently outperforming benchmarks. These findings confirm the potential of DRL, particularly IMARL, to address real-world supply chain challenges and call for additional research to further expand its applicability.

Related papers

VL-Cogito: Progressive Curriculum Reinforcement Learning for Advanced Multimodal Reasoning [69.44871115752055]
We propose an advanced multimodal reasoning model trained via a novel Progressive Curriculum Reinforcement Learning (PCuRL) framework. PCuRL systematically guides the model through tasks of gradually increasing difficulty, substantially improving its reasoning abilities across diverse multimodal contexts. The framework introduces two key innovations: (1) an online difficulty soft weighting mechanism, dynamically adjusting training difficulty across successive RL training stages; and (2) a dynamic length reward mechanism, which encourages the model to adaptively regulate its reasoning path length according to task complexity.
arXiv Detail & Related papers (2025-07-30T12:23:21Z)
DynaSearcher: Dynamic Knowledge Graph Augmented Search Agent via Multi-Reward Reinforcement Learning [4.817888539036794]
DynaSearcher is an innovative search agent enhanced by dynamic knowledge graphs and multi-reward reinforcement learning (RL). We employ a multi-reward RL framework for fine-grained control over training objectives such as retrieval accuracy, efficiency, and response quality. Experimental results demonstrate that our approach achieves state-of-the-art answer accuracy on six multi-hop question answering datasets.
arXiv Detail & Related papers (2025-07-23T09:58:31Z)
Deep RL Dual Sourcing Inventory Management with Supply and Capacity Risk Awareness [4.583289433858458]
We study how to efficiently apply reinforcement learning (RL) for solving large-scale optimization problems by leveraging intervention models. We demonstrate our approach on a challenging real-world application, the multi-sourcing multi-period inventory management problem in supply chain optimization.
arXiv Detail & Related papers (2025-07-19T02:44:45Z)
The Synergy Dilemma of Long-CoT SFT and RL: Investigating Post-Training Techniques for Reasoning VLMs [66.17068546293487]
Large vision-language models (VLMs) increasingly adopt post-training techniques such as long chain-of-thought (CoT) supervised fine-tuning (SFT) and reinforcement learning (RL) to elicit sophisticated reasoning. We present a systematic investigation into the distinct roles and interplay of long-CoT SFT and RL across multiple multimodal reasoning benchmarks. We find that SFT improves performance on difficult questions by in-depth, structured reasoning, but introduces verbosity and degrades performance on simpler ones.
arXiv Detail & Related papers (2025-07-10T09:05:49Z)
Reinforced MLLM: A Survey on RL-Based Reasoning in Multimodal Large Language Models [22.796496516709514]
This survey systematically reviews recent advances in RL-based reasoning for Multimodal Large Language Models. We highlight two main RL paradigms--value-free and value-based methods--and analyze how RL enhances reasoning abilities. We provide an extensive overview of benchmark datasets, evaluation protocols, and existing limitations.
arXiv Detail & Related papers (2025-04-30T03:14:28Z)
Crossing the Reward Bridge: Expanding RL with Verifiable Rewards Across Diverse Domains [92.36624674516553]
Reinforcement learning with verifiable rewards (RLVR) has demonstrated significant success in enhancing mathematical reasoning and coding performance of large language models (LLMs). We investigate the effectiveness and scalability of RLVR across diverse real-world domains including medicine, chemistry, psychology, economics, and education. We utilize a generative scoring technique that yields soft, model-based reward signals to overcome limitations posed by binary verifications.
arXiv Detail & Related papers (2025-03-31T08:22:49Z)
Residual Learning Inspired Crossover Operator and Strategy Enhancements for Evolutionary Multitasking [0.3749861135832073]
In evolutionary multitasking, strategies such as crossover operators and skill factor assignment are critical for effective knowledge transfer. This paper proposes the Multifactorial Evolutionary Algorithm-Residual Learning (MFEA-RL) method based on residual learning. A ResNet-based mechanism dynamically assigns skill factors to improve task adaptability, while a random mapping mechanism efficiently performs crossover operations.
arXiv Detail & Related papers (2025-03-27T10:27:17Z)
OThink-MR1: Stimulating multimodal generalized reasoning capabilities through dynamic reinforcement learning [29.053899071144976]
We introduce OThink-MR1, a framework that extends reinforcement learning to Multimodal Language Models. We design a dynamic Kullback-Leibler strategy that significantly enhances RL performance, surpassing SFT in same-task evaluations. Also, we are the first to reveal that RL exhibits remarkable cross-task generalization capabilities, which shows that models post-trained with RL on one multimodal task can be effectively transferred to other tasks.
arXiv Detail & Related papers (2025-03-20T12:22:18Z)
R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning [87.30285670315334]
R1-Searcher is a novel two-stage outcome-based RL approach designed to enhance the search capabilities of Large Language Models. Our framework relies exclusively on RL, without requiring process rewards or distillation for a cold start. Our experiments demonstrate that our method significantly outperforms previous strong RAG methods, even when compared to the closed-source GPT-4o-mini.
arXiv Detail & Related papers (2025-03-07T17:14:44Z)
Progressive Multimodal Reasoning via Active Retrieval [64.74746997923967]
Multi-step multimodal reasoning tasks pose significant challenges for multimodal large language models (MLLMs). We propose AR-MCTS, a universal framework designed to progressively improve the reasoning capabilities of MLLMs. We show that AR-MCTS can optimize sampling diversity and accuracy, yielding reliable multimodal reasoning.
arXiv Detail & Related papers (2024-12-19T13:25:39Z)
Towards Sample-Efficiency and Generalization of Transfer and Inverse Reinforcement Learning: A Comprehensive Literature Review [50.67937325077047]
This paper is devoted to a comprehensive review of realizing the sample efficiency and generalization of RL algorithms through transfer and inverse reinforcement learning (T-IRL). Our findings indicate that a majority of recent research works have dealt with the aforementioned challenges by utilizing human-in-the-loop and sim-to-real strategies. Under the IRL structure, training schemes that require a low number of experience transitions and extension of such frameworks to multi-agent and multi-intention problems have been the priority of researchers in recent years.
arXiv Detail & Related papers (2024-11-15T15:18:57Z)
Towards Practical Operation of Deep Reinforcement Learning Agents in Real-World Network Management at Open RAN Edges [5.345501810244355]
Deep Reinforcement Learning (DRL) has emerged as a powerful solution for meeting the growing demands for connectivity, reliability, low latency and operational efficiency in advanced networks. We first present an orchestration framework that integrates ETSI Multi-access Edge Computing (MEC) with Open RAN, enabling seamless adoption of DRL-based strategies across different time scales. We then identify three critical challenges hindering DRL's real-world deployment, including (1) asynchronous requests from unpredictable or bursty traffic, (2) adaptability and generalization across heterogeneous topologies and evolving service demands, and (3) prolonged convergence and service interruptions due to exploration in
arXiv Detail & Related papers (2024-10-30T15:02:54Z)
MAIDCRL: Semi-centralized Multi-Agent Influence Dense-CNN Reinforcement Learning [0.7366405857677227]
We present a semi-centralized Dense Reinforcement Learning algorithm enhanced by agent influence maps (AIMs) for learning effective multi-agent control on StarCraft Multi-Agent Challenge (SMAC) scenarios. The results show that the CNN-enabled MAIDCRL significantly improved the learning performance and achieved a faster learning rate compared to the existing MAIDRL.
arXiv Detail & Related papers (2024-02-12T18:53:20Z)
M2CURL: Sample-Efficient Multimodal Reinforcement Learning via Self-Supervised Representation Learning for Robotic Manipulation [0.7564784873669823]
We propose Multimodal Contrastive Unsupervised Reinforcement Learning (M2CURL). Our approach employs a novel multimodal self-supervised learning technique that learns efficient representations and contributes to faster convergence of RL algorithms. We evaluate M2CURL on the Tactile Gym 2 simulator and we show that it significantly enhances the learning efficiency in different manipulation tasks.
arXiv Detail & Related papers (2024-01-30T14:09:35Z)
MARLIN: Soft Actor-Critic based Reinforcement Learning for Congestion Control in Real Networks [63.24965775030673]
We propose a novel Reinforcement Learning (RL) approach to design generic Congestion Control (CC) algorithms. Our solution, MARLIN, uses the Soft Actor-Critic algorithm to maximize both entropy and return. We trained MARLIN on a real network with varying background traffic patterns to overcome the sim-to-real mismatch.
arXiv Detail & Related papers (2023-02-02T18:27:20Z)
Reinforcement Learning as One Big Sequence Modeling Problem [84.84564880157149]
Reinforcement learning (RL) is typically concerned with estimating single-step policies or single-step models. We view RL as a sequence modeling problem, with the goal being to predict a sequence of actions that leads to a sequence of high rewards.
arXiv Detail & Related papers (2021-06-03T17:58:51Z)
Combining Pessimism with Optimism for Robust and Efficient Model-Based Deep Reinforcement Learning [56.17667147101263]
In real-world tasks, reinforcement learning agents encounter situations that are not present during training time. To ensure reliable performance, the RL agents need to exhibit robustness against worst-case situations. We propose the Robust Hallucinated Upper-Confidence RL (RH-UCRL) algorithm to provably solve this problem.
arXiv Detail & Related papers (2021-03-18T16:50:17Z)
Detecting and adapting to crisis pattern with context based Deep Reinforcement Learning [6.224519494738852]
We present an innovative DRL framework consisting of two sub-networks fed respectively with portfolio strategies' past performances and standard deviations as well as additional contextual features. Results on the test set show this approach substantially outperforms traditional portfolio optimization methods like Markowitz and is able to detect and anticipate crises like the current Covid one.
arXiv Detail & Related papers (2020-09-07T12:11:08Z)

This list is automatically generated from the titles and abstracts of the papers in this site.