Robustness to Multi-Modal Environment Uncertainty in MARL using
Curriculum Learning
- URL: http://arxiv.org/abs/2310.08746v1
- Date: Thu, 12 Oct 2023 22:19:36 GMT
- Title: Robustness to Multi-Modal Environment Uncertainty in MARL using
Curriculum Learning
- Authors: Aakriti Agrawal, Rohith Aralikatti, Yanchao Sun, Furong Huang
- Abstract summary: This work is the first to formulate the generalised problem of robustness to multi-modal environment uncertainty in MARL.
We handle two distinct environmental uncertainties simultaneously and present extensive results across both cooperative and competitive MARL environments.
- Score: 35.671725515559054
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multi-agent reinforcement learning (MARL) plays a pivotal role in tackling
real-world challenges. However, seamlessly transferring trained policies from
simulation to the real world requires them to be robust to various environmental
uncertainties. Existing works focus on finding a Nash equilibrium or the optimal
policy under uncertainty in a single environment variable (i.e., action, state, or
reward), because a multi-agent system is itself highly complex and non-stationary.
In real-world situations, however, uncertainty can occur in multiple environment
variables simultaneously. This work is the first to formulate the generalised
problem of robustness to multi-modal environment uncertainty in MARL. To this end,
we propose a general robust training approach for multi-modal uncertainty based on
curriculum learning techniques. We handle two distinct environmental uncertainties
simultaneously and present extensive results across both cooperative and
competitive MARL environments, demonstrating that our approach achieves
state-of-the-art levels of robustness.
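To make the training idea concrete, here is a minimal sketch of a curriculum over two uncertainty modes (observation noise and reward noise) whose magnitudes grow during training. The wrapper class, dict-based multi-agent API, Gaussian noise model, and linear schedule are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def curriculum_level(step, total_steps, max_level):
    """Hypothetical linear curriculum: ramp an uncertainty level from 0 to max_level."""
    return max_level * min(1.0, step / total_steps)

class MultiModalUncertaintyWrapper:
    """Injects two kinds of environment uncertainty (state noise and reward noise)
    into a gym-like multi-agent environment. The per-agent dict API is an
    illustrative assumption."""

    def __init__(self, env, seed=0):
        self.env = env
        self.rng = np.random.default_rng(seed)
        self.obs_noise = 0.0   # current curriculum level for state uncertainty
        self.rew_noise = 0.0   # current curriculum level for reward uncertainty

    def set_levels(self, obs_noise, rew_noise):
        self.obs_noise, self.rew_noise = obs_noise, rew_noise

    def step(self, actions):
        obs, rewards, dones, infos = self.env.step(actions)
        # Perturb both environment variables at their current curriculum levels.
        obs = {agent: o + self.rng.normal(0.0, self.obs_noise, size=np.shape(o))
               for agent, o in obs.items()}
        rewards = {agent: r + self.rng.normal(0.0, self.rew_noise)
                   for agent, r in rewards.items()}
        return obs, rewards, dones, infos

# During training, both uncertainty modes would be ramped up together, e.g.:
#   wrapper.set_levels(curriculum_level(t, T, 0.5), curriculum_level(t, T, 0.2))
```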
Related papers
- Certifiably Robust Policies for Uncertain Parametric Environments [57.2416302384766]
We propose a framework based on parametric Markov decision processes (MDPs) with unknown distributions over parameters.
We learn and analyse IMDPs for a set of unknown sample environments induced by parameters.
We show that our approach produces tight bounds on a policy's performance with high confidence.
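As a rough illustration of how interval MDPs (IMDPs) yield performance bounds, the sketch below runs pessimistic value iteration under interval-valued transition probabilities for a fixed policy. The encoding and the greedy mass assignment are standard IMDP machinery assumed here for illustration; this is not the paper's algorithm:

```python
import numpy as np

def pessimistic_values(lower, upper, reward, gamma=0.95, iters=500):
    """Robust value iteration on an interval MDP for a fixed policy.

    lower[s, s'] and upper[s, s'] bound the transition probabilities of the
    policy-induced chain; reward[s] is the per-state reward. The adversary
    pushes as much probability mass as the intervals allow onto the
    lowest-value successors, giving a lower bound on performance.
    """
    n = len(reward)
    values = np.zeros(n)
    for _ in range(iters):
        new_values = np.zeros(n)
        for s in range(n):
            probs = lower[s].copy()
            remaining = 1.0 - probs.sum()
            for sp in np.argsort(values):      # worst successors first
                add = min(upper[s, sp] - lower[s, sp], remaining)
                probs[sp] += add
                remaining -= add
            new_values[s] = reward[s] + gamma * probs @ values
        values = new_values
    return values
```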
arXiv Detail & Related papers (2024-08-06T10:48:15Z) - Sample-Efficient Robust Multi-Agent Reinforcement Learning in the Face of Environmental Uncertainty [40.55653383218379]
This work focuses on learning in distributionally robust Markov games (RMGs).
We propose a sample-efficient model-based algorithm (DRNVI) with finite-sample complexity guarantees for learning robust variants of various notions of game-theoretic equilibria.
arXiv Detail & Related papers (2024-04-29T17:51:47Z) - Towards Global Optimality for Practical Average Reward Reinforcement Learning without Mixing Time Oracles [83.85151306138007]
Multi-level Actor-Critic (MAC) framework incorporates a Multi-level Monte-Carlo (MLMC) estimator.
We demonstrate that MAC outperforms the existing state-of-the-art policy gradient-based method for average reward settings.
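The multi-level Monte-Carlo idea can be sketched generically: estimate the finest-level quantity through a telescoping sum of level differences, drawing most samples at cheap, coarse levels. The `simulate(level, seed)` interface below is a hypothetical stand-in for truncated rollouts; this is not the MAC algorithm itself:

```python
import numpy as np

def mlmc_estimate(simulate, max_level, samples_per_level, seed=0):
    """Generic Multi-level Monte-Carlo estimator.

    simulate(level, seed) is assumed to return an estimate whose cost and
    accuracy grow with `level` (e.g. longer rollouts). Coarse and fine
    estimates within each difference share a seed so they stay coupled,
    which is what keeps the level differences low-variance.
    Telescoping sum: E[P_L] = E[P_0] + sum_l E[P_l - P_{l-1}].
    """
    rng = np.random.default_rng(seed)
    total = np.mean([simulate(0, int(rng.integers(1 << 31)))
                     for _ in range(samples_per_level[0])])
    for level in range(1, max_level + 1):
        seeds = rng.integers(1 << 31, size=samples_per_level[level])
        diffs = [simulate(level, int(s)) - simulate(level - 1, int(s))
                 for s in seeds]
        total += np.mean(diffs)
    return total
```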
arXiv Detail & Related papers (2024-03-18T16:23:47Z) - Act as You Learn: Adaptive Decision-Making in Non-Stationary Markov
Decision Processes [5.276882857467777]
We present a search algorithm called Adaptive Monte Carlo Tree Search (ADA-MCTS).
We show that the agent can learn the updated dynamics of the environment over time and then act as it learns, i.e., if the agent is in a region of the state space about which it has updated knowledge, it can avoid being pessimistic.
arXiv Detail & Related papers (2024-01-03T17:19:54Z) - Robust Multi-Agent Reinforcement Learning via Adversarial
Regularization: Theoretical Foundation and Stable Algorithms [79.61176746380718]
Multi-Agent Reinforcement Learning (MARL) has shown promising results across several domains.
MARL policies often lack robustness and are sensitive to small changes in their environment.
We show that we can gain robustness by controlling a policy's Lipschitz constant.
We propose a new robust MARL framework, ERNIE, that promotes the Lipschitz continuity of the policies.
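One generic way to encourage Lipschitz continuity is to penalise how much the policy's output can shift under a small adversarial perturbation of the observation. The PyTorch sketch below uses a common PGD-style inner loop, assumed here for illustration; it is not ERNIE's exact objective:

```python
import torch

def lipschitz_adv_penalty(policy, obs, epsilon=0.1, steps=3, step_size=0.03):
    """Adversarial regulariser approximating a local Lipschitz penalty.

    `policy(obs)` is assumed to return action logits. A PGD-style inner loop
    searches for a perturbation delta with ||delta||_inf <= epsilon that
    maximises the change in the policy output; that change is returned as a
    penalty to add to the usual MARL loss.
    """
    with torch.no_grad():
        ref = policy(obs)                              # unperturbed output
    delta = torch.zeros_like(obs, requires_grad=True)
    for _ in range(steps):
        shift = torch.norm(policy(obs + delta) - ref, dim=-1).mean()
        grad, = torch.autograd.grad(shift, delta)
        delta = (delta + step_size * grad.sign()).clamp(-epsilon, epsilon)
        delta = delta.detach().requires_grad_(True)
    return torch.norm(policy(obs + delta) - ref, dim=-1).mean()

# total_loss = policy_loss + reg_coef * lipschitz_adv_penalty(policy, obs_batch)
```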
arXiv Detail & Related papers (2023-10-16T20:14:06Z) - Intrinsically Motivated Hierarchical Policy Learning in Multi-objective
Markov Decision Processes [15.50007257943931]
We propose a novel dual-phase intrinsically motivated reinforcement learning method to address this limitation.
We show experimentally that the proposed method significantly outperforms state-of-the-art multi-objective reinforcement learning methods in a dynamic robotics environment.
arXiv Detail & Related papers (2023-08-18T02:10:45Z) - Policy Learning for Robust Markov Decision Process with a Mismatched
Generative Model [42.28001762749647]
In high-stakes scenarios like medical treatment and auto-piloting, it is risky or even infeasible to collect online experimental data to train the agent.
We consider policy learning for Robust Markov Decision Processes (RMDP), where the agent tries to seek a robust policy with respect to unexpected perturbations on the environments.
Our goal is to identify a near-optimal robust policy for the perturbed testing environment, which introduces additional technical difficulties.
arXiv Detail & Related papers (2022-03-13T06:37:25Z) - Multi-Agent Constrained Policy Optimisation [17.772811770726296]
We formulate the safe MARL problem as a constrained Markov game and solve it with policy optimisation methods.
Our solutions -- Multi-Agent Constrained Policy optimisation (MACPO) and MAPPO-Lagrangian -- leverage the theories from both constrained policy optimisation and multi-agent trust region learning.
We develop the Safe Multi-Agent MuJoCo benchmark suite, which includes a variety of MARL baselines.
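The Lagrangian variant can be summarised in a few lines: the safety constraint enters the policy loss through a multiplier that is adapted by dual ascent. Below is a minimal sketch of that mechanism; the exact per-agent losses in MACPO and MAPPO-Lagrangian differ:

```python
def lagrange_multiplier_update(lmbda, avg_episode_cost, cost_limit, lr=0.01):
    """Dual-ascent step for a safety constraint E[cost] <= cost_limit.

    The multiplier grows when the measured cost exceeds its limit and decays
    toward zero otherwise; it then weights the cost term in the policy loss.
    """
    return max(0.0, lmbda + lr * (avg_episode_cost - cost_limit))

# Schematic per-iteration policy loss for each agent:
#   loss = -ppo_reward_objective + lmbda * ppo_cost_objective
```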
arXiv Detail & Related papers (2021-10-06T14:17:09Z) - One Solution is Not All You Need: Few-Shot Extrapolation via Structured
MaxEnt RL [142.36621929739707]
We show that learning diverse behaviors for accomplishing a task can lead to behavior that generalizes to varying environments.
By identifying multiple solutions for the task in a single environment during training, our approach can generalize to new situations.
arXiv Detail & Related papers (2020-10-27T17:41:57Z) - Deep Reinforcement Learning amidst Lifelong Non-Stationarity [67.24635298387624]
We show that an off-policy RL algorithm can reason about and tackle lifelong non-stationarity.
Our method leverages latent variable models to learn a representation of the environment from current and past experiences.
We also introduce several simulation environments that exhibit lifelong non-stationarity, and empirically find that our approach substantially outperforms approaches that do not reason about environment shift.
arXiv Detail & Related papers (2020-06-18T17:34:50Z)