MORSE: Multi-Objective Reinforcement Learning via Strategy Evolution for Supply Chain Optimization
- URL: http://arxiv.org/abs/2509.06490v1
- Date: Mon, 08 Sep 2025 09:51:24 GMT
- Title: MORSE: Multi-Objective Reinforcement Learning via Strategy Evolution for Supply Chain Optimization
- Authors: Niki Kotecha, Ehecatl Antonio del Rio Chanona
- Abstract summary: In supply chain management, decision-making often involves balancing multiple objectives, such as cost reduction, service level improvement, and environmental sustainability. Traditional multi-objective optimization methods, such as linear programming and evolutionary algorithms, struggle to adapt in real-time to the dynamic nature of supply chains. We propose an approach that combines Reinforcement Learning (RL) and Multi-Objective Evolutionary Algorithms (MOEAs) to address these challenges for dynamic multi-objective optimization under uncertainty.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In supply chain management, decision-making often involves balancing multiple conflicting objectives, such as cost reduction, service level improvement, and environmental sustainability. Traditional multi-objective optimization methods, such as linear programming and evolutionary algorithms, struggle to adapt in real-time to the dynamic nature of supply chains. In this paper, we propose an approach that combines Reinforcement Learning (RL) and Multi-Objective Evolutionary Algorithms (MOEAs) to address these challenges for dynamic multi-objective optimization under uncertainty. Our method leverages MOEAs to search the parameter space of policy neural networks, generating a Pareto front of policies. This provides decision-makers with a diverse population of policies that can be dynamically switched based on the current system objectives, ensuring flexibility and adaptability in real-time decision-making. We also introduce Conditional Value-at-Risk (CVaR) to incorporate risk-sensitive decision-making, enhancing resilience in uncertain environments. We demonstrate the effectiveness of our approach through case studies, showcasing its ability to respond to supply chain dynamics and outperforming state-of-the-art methods in an inventory management case study. The proposed strategy not only improves decision-making efficiency but also offers a more robust framework for managing uncertainty and optimizing performance in supply chains.
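The abstract names two building blocks: a Pareto front over candidate policies and a CVaR-based risk measure. The following is a minimal sketch of both, not the authors' implementation. It assumes every objective is expressed so that higher is better, that CVaR is taken as the empirical mean of the worst `alpha` fraction of sampled returns, and that policy switching is done by a simple weighted sum over the front. The function names `cvar`, `pareto_front`, and `select_policy` are hypothetical and do not come from the paper.

```python
import numpy as np


def cvar(samples, alpha=0.1):
    """Empirical lower-tail CVaR: mean of the worst alpha-fraction of returns.

    Assumes higher returns are better, so the 'worst' samples are the smallest.
    """
    ordered = np.sort(np.asarray(samples, dtype=float))
    k = max(1, int(np.ceil(alpha * ordered.size)))  # number of tail samples
    return float(ordered[:k].mean())


def pareto_front(scores):
    """Return indices of non-dominated rows, with all objectives maximized.

    Row j dominates row i if it is >= in every objective and > in at least one.
    """
    scores = np.asarray(scores, dtype=float)
    front = []
    for i, s in enumerate(scores):
        no_worse = np.all(scores >= s, axis=1)
        strictly_better = np.any(scores > s, axis=1)
        if not np.any(no_worse & strictly_better):
            front.append(i)
    return front


def select_policy(scores, front, weights):
    """Pick the front member with the best weighted sum (an assumed switching rule)."""
    scores = np.asarray(scores, dtype=float)
    utilities = scores[front] @ np.asarray(weights, dtype=float)
    return front[int(np.argmax(utilities))]


# Illustrative data only: 4 hypothetical policies scored on 2 objectives,
# e.g. risk-adjusted (CVaR) profit and service level.
policy_scores = [[1.0, 5.0], [3.0, 3.0], [2.0, 2.0], [5.0, 1.0]]
front = pareto_front(policy_scores)
profit_first = select_policy(policy_scores, front, weights=[1.0, 0.0])
service_first = select_policy(policy_scores, front, weights=[0.0, 1.0])
```

In an evolutionary loop, each row of `policy_scores` would be obtained by rolling out one candidate policy network over many simulated episodes and applying `cvar` to the sampled returns of each objective. Changing `weights` at run time then switches between policies on the front without retraining.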
Related papers
- AdaEvolve: Adaptive LLM Driven Zeroth-Order Optimization [61.535567824938205]
We introduce AdaEvolve, a framework that reformulates LLM-driven evolution as a hierarchical adaptive optimization problem. AdaEvolve consistently outperforms the open-ended baselines across 185 different open-ended optimization problems.
arXiv Detail & Related papers (2026-02-23T18:45:31Z)
- Joint Optimization of Cooperation Efficiency and Communication Covertness for Target Detection with AUVs [105.81167650318054]
This paper investigates underwater cooperative target detection using autonomous underwater vehicles (AUVs). We first formulate a joint trajectory and power control optimization problem, and then present an innovative hierarchical action management framework to solve it. Under the centralized training and decentralized execution paradigm, our target detection framework enables adaptive covert cooperation while satisfying both energy and mobility constraints.
arXiv Detail & Related papers (2025-10-21T02:14:11Z)
- Optimizing Multi-Tier Supply Chain Ordering with LNN+XGBoost: Mitigating the Bullwhip Effect [0.0]
This study introduces a hybrid LNN and XGBoost model to optimize ordering strategies in multi-tier supply chains. By leveraging LNN's dynamic feature extraction and XGBoost's global optimization capabilities, the model aims to mitigate the bullwhip effect and enhance cumulative profitability.
arXiv Detail & Related papers (2025-07-28T23:24:54Z)
- Preference Optimization for Combinatorial Optimization Problems [54.87466279363487]
Reinforcement Learning (RL) has emerged as a powerful tool for neural optimization, enabling models to learn to solve complex problems without requiring expert knowledge. Despite significant progress, existing RL approaches face challenges such as diminishing reward signals and inefficient exploration in vast action spaces. We propose Preference Optimization, a novel method that transforms quantitative reward signals into qualitative preference signals via statistical comparison modeling.
arXiv Detail & Related papers (2025-05-13T16:47:00Z)
- EPO: Explicit Policy Optimization for Strategic Reasoning in LLMs via Reinforcement Learning [69.55982246413046]
We propose explicit policy optimization (EPO) for strategic reasoning. We train the strategic reasoning model via multi-turn reinforcement learning (RL), utilizing process rewards and iterative self-play. Our findings reveal various collaborative reasoning mechanisms emergent in EPO and its effectiveness in generating novel strategies.
arXiv Detail & Related papers (2025-02-18T03:15:55Z)
- A Re-solving Heuristic for Dynamic Assortment Optimization with Knapsack Constraints [14.990988698038686]
We consider a multi-stage dynamic assortment optimization problem with multinomial logit (MNL) choice modeling under resource knapsack constraints.
With the exact optimal dynamic assortment solution being computationally intractable, a practical strategy is to adopt the re-solving technique that periodically re-optimizes deterministic linear programs.
We propose a new epoch-based re-solving algorithm that effectively transforms the denominator of the objective into the constraint.
arXiv Detail & Related papers (2024-07-08T02:40:20Z)
- Resilient Constrained Reinforcement Learning [87.4374430686956]
We study a class of constrained reinforcement learning (RL) problems in which multiple constraint specifications are not identified before study.
It is challenging to identify appropriate constraint specifications due to the undefined trade-off between the reward training objective and the constraint satisfaction.
We propose a new constrained RL approach that searches for policy and constraint specifications together.
arXiv Detail & Related papers (2023-12-28T18:28:23Z)
- A Robust Policy Bootstrapping Algorithm for Multi-objective Reinforcement Learning in Non-stationary Environments [15.794728813746397]
Multi-objective reinforcement learning methods fuse the reinforcement learning paradigm with multi-objective optimization techniques.
One major drawback of these methods is the lack of adaptability to non-stationary dynamics in the environment.
We propose a novel multi-objective reinforcement learning algorithm that can robustly evolve a convex coverage set of policies in an online manner in non-stationary environments.
arXiv Detail & Related papers (2023-08-18T02:15:12Z)
- Online Nonstochastic Model-Free Reinforcement Learning [35.377261344335736]
We investigate robustness guarantees for environments that may be dynamic or adversarial.
We provide efficient algorithms for optimizing these policies.
These are the best-known developments with no dependence on the state-space dimension.
arXiv Detail & Related papers (2023-05-27T19:02:55Z)
- Interpretable Reinforcement Learning via Neural Additive Models for Inventory Management [3.714118205123092]
We focus on developing dynamic inventory ordering policies for a multi-echelon, i.e. multi-stage, supply chain.
Traditional inventory optimization methods aim to determine a static reordering policy.
We propose an interpretable reinforcement learning approach that aims to be as interpretable as the traditional static policies.
arXiv Detail & Related papers (2023-03-18T10:13:32Z)
- Non-stationary Online Learning with Memory and Non-stochastic Control [71.14503310914799]
We study the problem of Online Convex Optimization (OCO) with memory, which allows loss functions to depend on past decisions.
In this paper, we introduce dynamic policy regret as the performance measure to design algorithms robust to non-stationary environments.
We propose a novel algorithm for OCO with memory that provably enjoys an optimal dynamic policy regret in terms of time horizon, non-stationarity measure, and memory length.
arXiv Detail & Related papers (2021-02-07T09:45:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.