Agent based modelling for continuously varying supply chains
- URL: http://arxiv.org/abs/2312.15502v1
- Date: Sun, 24 Dec 2023 15:04:46 GMT
- Title: Agent based modelling for continuously varying supply chains
- Authors: Wan Wang, Haiyan Wang, Adam J. Sobey
- Abstract summary: This paper seeks to address whether agents can control varying supply chain problems.
Two state-of-the-art Reinforcement Learning (RL) algorithms are compared.
Results show that the leaner strategies adopted in Batch environments differ from those adopted in environments with varying products.
- Score: 4.163948606359882
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Problem definition: Supply chains are constantly evolving networks.
Reinforcement learning is increasingly proposed as a solution to provide
optimal control of these networks. Academic/practical: However, learning in
continuously varying environments remains a challenge in the reinforcement
learning literature. Methodology: This paper therefore seeks to address whether
agents can control varying supply chain problems, transferring learning between
environments that require different strategies and avoiding catastrophic
forgetting of tasks that have not been seen in a while. To evaluate this
approach, two state-of-the-art Reinforcement Learning (RL) algorithms are
compared: an actor-critic learner, Proximal Policy Optimisation (PPO), and a
Recurrent Proximal Policy Optimisation (RPPO), i.e. PPO with a Long Short-Term
Memory (LSTM) layer, which is gaining popularity in online learning
environments. Results: First, these methods are compared on six sets of
environments with varying degrees of stochasticity. The results show that the
leaner strategies adopted in Batch environments differ from those adopted in
Stochastic environments with varying products. The methods are also compared
on various continuous supply chain scenarios, where the PPO agents are shown
to adapt through continuous learning when the tasks are similar, but show more
volatile performance when switching between the extreme tasks.
However, the RPPO, with its ability to remember histories, overcomes this to
some extent and adopts a more realistic strategy. Managerial implications: Our
results provide a new perspective on the continuously varying supply chain: the
cooperation and coordination of agents are crucial for improving overall
performance in uncertain and semi-continuous non-stationary supply chain
environments, without the need to retrain as the demand changes.
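The abstract contrasts PPO with Recurrent PPO (RPPO), i.e. PPO with an LSTM layer that lets the policy condition on the history of observations rather than only the current state. The following is a minimal sketch of that architectural difference in PyTorch; the layer sizes, class names, and single-LSTM placement are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class PPOPolicy(nn.Module):
    """Feed-forward actor-critic: the policy sees only the current observation."""
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh())
        self.actor = nn.Linear(hidden, act_dim)   # action logits / means
        self.critic = nn.Linear(hidden, 1)        # state-value estimate

    def forward(self, obs):
        h = self.encoder(obs)
        return self.actor(h), self.critic(h)

class RecurrentPPOPolicy(nn.Module):
    """Recurrent actor-critic: an LSTM carries a memory of past observations."""
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh())
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.actor = nn.Linear(hidden, act_dim)
        self.critic = nn.Linear(hidden, 1)

    def forward(self, obs_seq, state=None):
        # obs_seq: (batch, time, obs_dim); state: optional (h0, c0) LSTM state
        h = self.encoder(obs_seq)
        h, state = self.lstm(h, state)
        return self.actor(h), self.critic(h), state
```

Apart from the recurrent state that must be carried through rollouts and training batches, the PPO clipped-surrogate update is unchanged in this sketch.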
Related papers
- Decentralized Learning Strategies for Estimation Error Minimization with Graph Neural Networks [94.2860766709971]
We address the challenge of sampling and remote estimation for autoregressive Markovian processes in a wireless network with statistically-identical agents.
Our goal is to minimize time-average estimation error and/or age of information with decentralized scalable sampling and transmission policies.
arXiv Detail & Related papers (2024-04-04T06:24:11Z) - Projected Off-Policy Q-Learning (POP-QL) for Stabilizing Offline
Reinforcement Learning [57.83919813698673]
Projected Off-Policy Q-Learning (POP-QL) is a novel actor-critic algorithm that simultaneously reweights off-policy samples and constrains the policy to prevent divergence and reduce value-approximation error.
In our experiments, POP-QL not only shows competitive performance on standard benchmarks, but also outperforms competing methods in tasks where the data-collection policy is significantly sub-optimal.
arXiv Detail & Related papers (2023-11-25T00:30:58Z) - Statistically Efficient Variance Reduction with Double Policy Estimation
for Off-Policy Evaluation in Sequence-Modeled Reinforcement Learning [53.97273491846883]
We propose DPE: an RL algorithm that blends offline sequence modeling and offline reinforcement learning with Double Policy Estimation.
We validate our method on multiple OpenAI Gym tasks with D4RL benchmarks.
arXiv Detail & Related papers (2023-08-28T20:46:07Z) - Selective Uncertainty Propagation in Offline RL [28.324479520451195]
We consider the finite-horizon offline reinforcement learning (RL) setting, and are motivated by the challenge of learning the policy at any step h in dynamic programming (DP) algorithms.
We develop a flexible and general method called selective uncertainty propagation for confidence interval construction that adapts to the hardness of the associated distribution shift challenges.
arXiv Detail & Related papers (2023-02-01T07:31:25Z) - Robust Policy Optimization in Deep Reinforcement Learning [16.999444076456268]
In continuous action domains, a parameterized action distribution allows easy control of exploration.
In particular, we propose an algorithm called Robust Policy Optimization (RPO), which leverages a perturbed distribution (see the hedged sketch after this entry).
We evaluated our methods on various continuous control tasks from DeepMind Control, OpenAI Gym, Pybullet, and IsaacGym.
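One common reading of the "perturbed distribution" mentioned in this entry is to shift the mean of a Gaussian policy by uniform noise before sampling, so the policy keeps exploring. The sketch below illustrates that idea only; the function name and the value of alpha are illustrative assumptions, not taken from the paper.

```python
import torch

def perturbed_gaussian_action(mean, std, alpha=0.5):
    """Sample an action from N(mean + z, std) with z ~ Uniform(-alpha, alpha)."""
    z = (torch.rand_like(mean) * 2.0 - 1.0) * alpha  # uniform perturbation of the mean
    dist = torch.distributions.Normal(mean + z, std)
    action = dist.sample()
    return action, dist.log_prob(action).sum(-1)      # action and its log-probability
```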
arXiv Detail & Related papers (2022-12-14T22:43:56Z) - Dynamics-Adaptive Continual Reinforcement Learning via Progressive
Contextualization [29.61829620717385]
A key challenge of continual reinforcement learning (CRL) in dynamic environments is to promptly adapt the RL agent's behavior as the environment changes over its lifetime.
DaCoRL learns a context-conditioned policy using progressive contextualization.
DaCoRL features consistent superiority over existing methods in terms of stability, overall performance and generalization ability.
arXiv Detail & Related papers (2022-09-01T10:26:58Z) - TASAC: a twin-actor reinforcement learning framework with stochastic
policy for batch process control [1.101002667958165]
Reinforcement Learning (RL), wherein an agent learns the policy by directly interacting with the environment, offers a potential alternative in this context.
RL frameworks with actor-critic architecture have recently become popular for controlling systems where state and action spaces are continuous.
It has been shown that an ensemble of actor and critic networks further helps the agent learn better policies, due to the enhanced exploration that comes from simultaneous policy learning.
arXiv Detail & Related papers (2022-04-22T13:00:51Z) - Latent-Variable Advantage-Weighted Policy Optimization for Offline RL [70.01851346635637]
Offline reinforcement learning methods hold the promise of learning policies from pre-collected datasets without the need to query the environment for new transitions.
In practice, offline datasets are often heterogeneous, i.e., collected in a variety of scenarios.
We propose to leverage latent-variable policies that can represent a broader class of policy distributions.
Our method improves the average performance of the next best-performing offline reinforcement learning methods by 49% on heterogeneous datasets.
arXiv Detail & Related papers (2022-03-16T21:17:03Z) - Distributed Adaptive Learning Under Communication Constraints [54.22472738551687]
This work examines adaptive distributed learning strategies designed to operate under communication constraints.
We consider a network of agents that must solve an online optimization problem from continual observation of streaming data.
arXiv Detail & Related papers (2021-12-03T19:23:48Z) - Consolidation via Policy Information Regularization in Deep RL for
Multi-Agent Games [21.46148507577606]
This paper introduces an information-theoretic constraint on learned policy complexity in the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) reinforcement learning algorithm.
Results from experimentation in multi-agent cooperative and competitive tasks demonstrate that the capacity-limited approach is a good candidate for improving learning performance in these environments.
arXiv Detail & Related papers (2020-11-23T16:28:27Z) - Deep Reinforcement Learning amidst Lifelong Non-Stationarity [67.24635298387624]
We show that an off-policy RL algorithm can reason about and tackle lifelong non-stationarity.
Our method leverages latent variable models to learn a representation of the environment from current and past experiences.
We also introduce several simulation environments that exhibit lifelong non-stationarity, and empirically find that our approach substantially outperforms approaches that do not reason about environment shift.
arXiv Detail & Related papers (2020-06-18T17:34:50Z)