Non-Stationary Policy Learning for Multi-Timescale Multi-Agent
Reinforcement Learning
- URL: http://arxiv.org/abs/2307.08794v1
- Date: Mon, 17 Jul 2023 19:25:46 GMT
- Title: Non-Stationary Policy Learning for Multi-Timescale Multi-Agent
Reinforcement Learning
- Authors: Patrick Emami, Xiangyu Zhang, David Biagioni, Ahmed S. Zamzam
- Abstract summary: In multi-timescale multi-agent reinforcement learning, agents interact across different timescales.
We introduce a simple framework for learning non-stationary policies for multi-timescale MARL.
The framework's ability to effectively learn multi-timescale policies is validated on a gridworld and building energy management environment.
- Score: 9.808555135836022
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In multi-timescale multi-agent reinforcement learning (MARL), agents interact
across different timescales. In general, policies for time-dependent behaviors,
such as those induced by multiple timescales, are non-stationary. Learning
non-stationary policies is challenging and typically requires sophisticated or
inefficient algorithms. Motivated by the prevalence of this control problem in
real-world complex systems, we introduce a simple framework for learning
non-stationary policies for multi-timescale MARL. Our approach uses available
information about agent timescales to define a periodic time encoding. In
detail, we theoretically demonstrate that the effects of non-stationarity
introduced by multiple timescales can be learned by a periodic multi-agent
policy. To learn such policies, we propose a policy gradient algorithm that
parameterizes the actor and critic with phase-functioned neural networks, which
provide an inductive bias for periodicity. The framework's ability to
effectively learn multi-timescale policies is validated on a gridworld and
building energy management environment.
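The two ingredients the abstract names, a periodic time encoding derived from known agent timescales and phase-functioned actor/critic networks, can be sketched compactly. Below is a minimal illustration assuming a sin/cos encoding and Catmull-Rom weight blending (the standard phase-functioned construction); the paper's exact encoding and architecture may differ.

```python
import numpy as np

def periodic_time_encoding(t, periods):
    """Encode global step t relative to each agent's known action period.
    sin/cos pairs make the encoding smooth and exactly periodic."""
    feats = []
    for p in periods:
        phase = 2.0 * np.pi * (t % p) / p
        feats += [np.sin(phase), np.cos(phase)]
    return np.array(feats)

class PhaseFunctionedLayer:
    """Linear layer whose weights are a function of a phase in [0, 1),
    via Catmull-Rom blending of four control weight sets."""

    def __init__(self, in_dim, out_dim, rng):
        self.W = rng.standard_normal((4, out_dim, in_dim)) * 0.1
        self.b = np.zeros((4, out_dim))

    def __call__(self, x, phase):
        p = 4.0 * (phase % 1.0)          # spline position over 4 segments
        k, w = int(p), p - int(p)
        idx = [(k - 1) % 4, k % 4, (k + 1) % 4, (k + 2) % 4]
        coef = [                         # Catmull-Rom basis at w
            -0.5 * w**3 + w**2 - 0.5 * w,
             1.5 * w**3 - 2.5 * w**2 + 1.0,
            -1.5 * w**3 + 2.0 * w**2 + 0.5 * w,
             0.5 * w**3 - 0.5 * w**2,
        ]
        W = sum(c * self.W[i] for c, i in zip(coef, idx))
        b = sum(c * self.b[i] for c, i in zip(coef, idx))
        return np.tanh(W @ x + b)

rng = np.random.default_rng(0)
periods = [3, 5]                         # agents acting every 3 and 5 steps
layer = PhaseFunctionedLayer(in_dim=2 * len(periods), out_dim=8, rng=rng)
cycle = 15                               # joint behavior repeats every lcm(3, 5)
for t in (0, 7, 15):                     # t = 15 reproduces t = 0 exactly
    h = layer(periodic_time_encoding(t, periods), phase=(t % cycle) / cycle)
    print(t, np.round(h[:3], 3))
```

Since agents with periods 3 and 5 jointly repeat every lcm(3, 5) = 15 steps, a single phase variable over that cycle indexes all the time-dependence a periodic policy needs.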
Related papers
- Temporal Abstraction in Reinforcement Learning with Offline Data [8.370420807869321]
We propose a framework by which an online hierarchical reinforcement learning algorithm can be trained on an offline dataset of transitions collected by an unknown behavior policy.
We validate our method on Gym MuJoCo environments and robotic gripper block-stacking tasks in the standard as well as transfer and goal-conditioned settings.
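As a hedged sketch of how a hierarchy might be bootstrapped from offline transitions, the toy below clones a goal-conditioned low-level policy from logged data via hindsight relabeling and leaves subgoal selection to an online high-level learner. The chain environment, counting-based cloning, and hand-written subgoal rule are illustrative assumptions, not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(9)

# Offline transitions (s, a, s') on a 10-state chain, collected by an
# unknown behavior policy; actions step -1 or +1 (clipped at the ends).
s = rng.integers(0, 10, size=5000)
a = rng.choice([-1, 1], size=5000)
s_next = np.clip(s + a, 0, 9)

# Hindsight view: each s' is a goal the logged action actually reached.
# Fit a goal-conditioned low-level policy pi(a | s, g) by counting.
counts = np.zeros((10, 10, 2))                     # state x goal x action
np.add.at(counts, (s, s_next, (a > 0).astype(int)), 1)

def low_level(state, goal):
    """Greedy action from the offline-cloned, goal-conditioned policy."""
    c = counts[state, goal]
    return [-1, 1][int(np.argmax(c))] if c.sum() else int(rng.choice([-1, 1]))

# An online high-level learner would emit subgoals for low_level to
# chase; a hand-written stand-in here just walks toward state 9.
pos = 0
for _ in range(20):
    subgoal = min(pos + 1, 9)
    pos = int(np.clip(pos + low_level(pos, subgoal), 0, 9))
print("reached state", pos)
```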
arXiv Detail & Related papers (2024-07-21T18:10:31Z)
- OMPO: A Unified Framework for RL under Policy and Dynamics Shifts [42.57662196581823]
Training reinforcement learning policies using environment interaction data collected from varying policies or dynamics presents a fundamental challenge.
Existing works often overlook the distribution discrepancies induced by policy or dynamics shifts, or rely on specialized algorithms with task priors.
In this paper, we identify a unified strategy for online RL policy learning under diverse settings of policy and dynamics shifts: transition occupancy matching.
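One common way to realize transition occupancy matching is to train a discriminator between on-policy and replay transitions and use its log-odds as a reward correction. The sketch below illustrates that generic recipe on synthetic features; the Gaussian data, logistic discriminator, and shaping form are assumptions, not OMPO's exact objective.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy transition features (s, a, s') from two sources: the current
# policy/dynamics vs. mismatched replay data with shifted dynamics.
on_policy = rng.normal(0.0, 1.0, size=(512, 3))
replay    = rng.normal(0.5, 1.2, size=(512, 3))

# Logistic discriminator D(s, a, s') trained to tell the two apart.
X = np.vstack([on_policy, replay])
y = np.concatenate([np.ones(512), np.zeros(512)])
w, b = np.zeros(3), 0.0
for _ in range(500):                     # plain gradient ascent on log-likelihood
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    g = y - p
    w += 0.05 * X.T @ g / len(y)
    b += 0.05 * g.mean()

def shaped_reward(r, sas):
    """Augment the task reward with the log occupancy ratio
    log D / (1 - D), penalizing transitions unlike those the current
    policy would generate under the current dynamics."""
    d = 1.0 / (1.0 + np.exp(-(sas @ w + b)))
    return r + np.log(d / (1.0 - d + 1e-8) + 1e-8)

print(shaped_reward(1.0, replay[:3]))    # replay transitions get discounted
```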
arXiv Detail & Related papers (2024-05-29T13:36:36Z)
- Decentralized Learning Strategies for Estimation Error Minimization with Graph Neural Networks [94.2860766709971]
We address the challenge of sampling and remote estimation for autoregressive Markovian processes in a wireless network with statistically identical agents.
Our goal is to minimize time-average estimation error and/or age of information with decentralized scalable sampling and transmission policies.
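To make the objective concrete, the sketch below simulates a single AR(1) source with a remote estimator under a threshold (error-triggered) sampling policy and reports time-average squared error and age of information. The scalar setup and hand-set threshold are assumptions standing in for the paper's learned, decentralized GNN policies.

```python
import numpy as np

rng = np.random.default_rng(2)
rho, T, threshold = 0.9, 10_000, 1.0

x = 0.0          # AR(1) source: x' = rho * x + noise
x_hat = 0.0      # remote estimate: last delivered sample, propagated by rho
age = 0          # age of information: steps since last delivery
err_sum, age_sum, sends = 0.0, 0, 0

for t in range(T):
    x = rho * x + rng.normal()
    x_hat = rho * x_hat              # estimator propagates the known model
    age += 1
    if abs(x - x_hat) > threshold:   # threshold (error-triggered) sampling
        x_hat, age = x, 0            # deliver a fresh sample
        sends += 1
    err_sum += (x - x_hat) ** 2
    age_sum += age

print(f"mean sq. error {err_sum/T:.3f}, mean AoI {age_sum/T:.2f}, "
      f"send rate {sends/T:.2%}")
```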
arXiv Detail & Related papers (2024-04-04T06:24:11Z)
- Towards Global Optimality for Practical Average Reward Reinforcement Learning without Mixing Time Oracles [83.85151306138007]
The Multi-level Actor-Critic (MAC) framework incorporates a Multi-level Monte-Carlo (MLMC) gradient estimator to remove the need for mixing-time oracles.
We demonstrate that MAC outperforms the existing state-of-the-art policy gradient-based method for average reward settings.
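The multilevel Monte Carlo trick can be shown in isolation: draw a random level J with P(J = j) = 2^-j, roll out 2^J steps, and reweight the coupled difference of averages so the estimate matches a long-horizon average at only O(level) expected cost. The toy chain and average-reward target below are assumptions; this is the estimator pattern, not the full MAC algorithm.

```python
import numpy as np

rng = np.random.default_rng(3)

def rollout_rewards(T):
    """Toy slowly mixing chain (rho = 0.99); returns T reward samples."""
    x, out = 0.0, np.empty(T)
    for t in range(T):
        x = 0.99 * x + rng.normal()
        out[t] = np.tanh(x)
    return out

def mlmc_estimate(j_max=10):
    """One MLMC sample of the long-run average reward: its expectation
    matches a 2**j_max-step average, while the expected rollout length
    is only O(j_max)."""
    j = min(int(rng.geometric(0.5)), j_max)                # P(J = j) = 2**-j
    p_j = 2.0 ** -j if j < j_max else 2.0 ** -(j_max - 1)  # truncated tail
    g0 = rollout_rewards(1).mean()                         # independent base level
    r = rollout_rewards(2 ** j)
    delta = r.mean() - r[: 2 ** (j - 1)].mean()            # coupled level difference
    return g0 + delta / p_j                                # telescoping correction

print(np.mean([mlmc_estimate() for _ in range(2000)]))
```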
arXiv Detail & Related papers (2024-03-18T16:23:47Z)
- Discovering How Agents Learn Using Few Data [32.38609641970052]
We propose a theoretical and algorithmic framework for real-time identification of agent behavior using a short burst of a single system trajectory.
Our approach accurately recovers the true dynamics across various benchmarks, including equilibrium selection and prediction of chaotic systems up to 10 Lyapunov times.
These findings suggest that our approach has significant potential to support effective policy and decision-making in strategic multi-agent systems.
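As a hedged illustration of identifying learning dynamics from a short burst of data, the sketch below assumes the observed agent runs multiplicative-weights updates with an unknown step size and recovers that step size by least squares on logit differences. The dynamics class and noise model are illustrative assumptions, not the paper's framework.

```python
import numpy as np

rng = np.random.default_rng(4)

u = np.array([1.0, 0.2, -0.5])        # fixed payoff vector (known to us)
eta_true = 0.3                        # unknown learning rate to identify

def mwu_step(x, eta):
    """Multiplicative-weights update with small observation noise."""
    y = x * np.exp(eta * u)
    y /= y.sum()
    y = np.clip(y + rng.normal(0, 1e-3, size=3), 1e-6, None)
    return y / y.sum()

# Observe a short burst of a single trajectory (5 steps).
xs = [np.ones(3) / 3]
for _ in range(5):
    xs.append(mwu_step(xs[-1], eta_true))

# log(x_i / x_k) advances by eta * (u_i - u_k) each step, so eta is the
# slope of a simple least-squares regression over the observed steps.
num, den = 0.0, 0.0
for x0, x1 in zip(xs[:-1], xs[1:]):
    d_logit = np.log(x1 / x1[-1]) - np.log(x0 / x0[-1])
    d_u = u - u[-1]
    num += d_logit @ d_u
    den += d_u @ d_u
print(f"recovered eta ~ {num/den:.3f} (true {eta_true})")
```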
arXiv Detail & Related papers (2023-07-13T09:14:48Z)
- Safe Multi-agent Learning via Trapping Regions [89.24858306636816]
We apply the concept of trapping regions, known from qualitative theory of dynamical systems, to create safety sets in the joint strategy space for decentralized learning.
We propose a binary partitioning algorithm for verification that candidate sets form trapping regions in systems with known learning dynamics, and a sampling algorithm for scenarios where learning dynamics are not known.
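A minimal version of the sampling-based check, assuming the learning dynamics are known: sample points on the boundary of a candidate box and verify the dynamics point strictly inward at every sample. The toy two-player gradient flow and the box-shaped candidate set below are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)

def learning_dynamics(z):
    """Known joint learning dynamics: a toy two-player gradient flow
    spiraling into the origin."""
    x, y = z
    return np.array([-x + 0.5 * y, -y - 0.5 * x])

def is_trapping_box(lo, hi, n_samples=2000):
    """Sample the boundary of the candidate box [lo, hi]^2 and test that
    the dynamics point strictly inward at every sample. A certified check
    would partition the boundary and bound the field on each piece; this
    sampling version is the cheap screen."""
    for _ in range(n_samples):
        z = rng.uniform(lo, hi, size=2)
        face, side = rng.integers(2), rng.integers(2)
        z[face] = hi if side else lo                 # project onto a face
        normal = np.zeros(2)
        normal[face] = 1.0 if side else -1.0         # outward normal
        if learning_dynamics(z) @ normal >= 0.0:     # flow leaves the box
            return False
    return True

print(is_trapping_box(-1.0, 1.0))   # True: sampled boundary flow is inward
```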
arXiv Detail & Related papers (2023-02-27T14:47:52Z)
- Learning From Good Trajectories in Offline Multi-Agent Reinforcement
Learning [98.07495732562654]
Offline multi-agent reinforcement learning (MARL) aims to learn effective multi-agent policies from pre-collected datasets.
Because the behavior policies behind a dataset can vary widely in quality, an agent learned by offline MARL often inherits a poorly performing (e.g., random) behavior policy, jeopardizing the performance of the entire team.
We propose a novel framework called Shared Individual Trajectories (SIT) to address this problem.
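The motivating idea, that agents should be able to imitate good individual trajectories wherever they appear in the dataset, can be sketched as pooling per-agent trajectories and keeping the top fraction by estimated individual return. The per-trajectory scores below are synthetic, and SIT's actual credit-assignment and sharing mechanism (not detailed in this summary) will differ.

```python
import numpy as np

rng = np.random.default_rng(6)

# Synthetic per-agent, per-trajectory individual returns; in practice
# estimating these scores from a joint dataset is the hard part.
n_agents, n_traj = 3, 100
returns = rng.normal(size=(n_agents, n_traj))
returns[1] -= 2.0                     # agent 1's data came from a weak policy

def shared_training_set(returns, top_frac=0.2):
    """Pool every agent's individual trajectories and keep the best
    fraction, so an agent whose own data is poor still sees competent
    examples to imitate."""
    pooled = returns.ravel()
    cutoff = np.quantile(pooled, 1.0 - top_frac)
    return np.flatnonzero(pooled >= cutoff)

kept = shared_training_set(returns)
print("fraction of kept trajectories from agent 1:",
      np.mean(kept // n_traj == 1))   # near zero: its weak data is filtered
```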
arXiv Detail & Related papers (2022-11-28T18:11:26Z)
- Learning Robust Policy against Disturbance in Transition Dynamics via
State-Conservative Policy Optimization [63.75188254377202]
Deep reinforcement learning algorithms can perform poorly in real-world tasks due to discrepancy between source and target environments.
We propose State-Conservative Policy Optimization (SCPO), a novel model-free actor-critic algorithm that learns robust policies without modeling the disturbance in advance.
Experiments in several robot control tasks demonstrate that SCPO learns robust policies against the disturbance in transition dynamics.
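The state-conservative idea can be sketched as evaluating updates at the worst state in a small neighborhood of the observed one. The toy critic and the random-search inner maximizer below are assumptions (a gradient-based perturbation is the more usual choice); this is not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(7)

def value(s, w):
    return np.tanh(s @ w)                 # toy critic

def worst_case_state(s, w, eps=0.1, k=16):
    """Approximate the worst state in an eps-ball around s by random
    search over k candidate perturbations."""
    cands = s + rng.uniform(-eps, eps, size=(k, s.size))
    return cands[np.argmin(value(cands, w))]

w, s = rng.normal(size=4), rng.normal(size=4)
s_adv = worst_case_state(s, w)
# A state-conservative update evaluates and improves the policy at s_adv
# rather than s, so small dynamics disturbances at deployment cannot push
# the agent into states the policy never trained on.
print(value(s, w), value(s_adv, w))
```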
arXiv Detail & Related papers (2021-12-20T13:13:05Z)
- Consolidation via Policy Information Regularization in Deep RL for
Multi-Agent Games [21.46148507577606]
This paper introduces an information-theoretic constraint on learned policy complexity in the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) reinforcement learning algorithm.
Results from experimentation in multi-agent cooperative and competitive tasks demonstrate that the capacity-limited approach is a good candidate for improving learning performance in these environments.
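A standalone, discrete-action rendering of such a capacity constraint: maximize expected Q minus a penalty on the KL divergence from the policy to a uniform prior. The toy Q-values and uniform prior below are assumptions; the paper applies its constraint inside MADDPG's training rather than to an isolated decision.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def regularized_objective(logits, q_values, beta=0.1):
    """Expected Q minus beta * KL(pi || uniform): penalizing policy
    complexity discourages overfitting to the current opponents, one
    route to consolidation in multi-agent games."""
    pi = softmax(logits)
    n = len(pi)
    kl_to_uniform = np.sum(pi * np.log(pi * n + 1e-12))
    return pi @ q_values - beta * kl_to_uniform

q = np.array([1.0, 0.9, -1.0])
sharp = np.array([5.0, 0.0, -5.0])     # near-deterministic policy
soft  = np.array([0.5, 0.4, -1.0])     # higher-entropy policy
print(regularized_objective(sharp, q), regularized_objective(soft, q))
```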
arXiv Detail & Related papers (2020-11-23T16:28:27Z)
- A Decentralized Policy Gradient Approach to Multi-task Reinforcement
Learning [13.733491423871383]
We develop a framework for solving multi-task reinforcement learning problems.
The goal is to learn a common policy that operates effectively in different environments.
We highlight two fundamental challenges in MTRL that are not present in its single task counterpart.
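The decentralized structure can be sketched with a gossip (consensus) step: each agent mixes its policy parameters with its neighbors' and then ascends its own task's gradient, with no central server. The quadratic per-task surrogates below stand in for real policy gradients.

```python
import numpy as np

rng = np.random.default_rng(8)

# Each of 3 agents optimizes a common policy parameter theta against its
# own task; they only exchange parameters with neighbors.
targets = np.array([1.0, 2.0, 3.0])           # per-task optima (toy surrogate)
mix = np.array([[0.5, 0.25, 0.25],            # doubly-stochastic gossip matrix
                [0.25, 0.5, 0.25],
                [0.25, 0.25, 0.5]])
theta = rng.normal(size=3)                     # one parameter copy per agent

for _ in range(200):
    local_grad = -(theta - targets)            # gradient of -(theta - t_i)^2 / 2
    theta = mix @ theta + 0.1 * local_grad     # consensus step + local ascent

print(theta)    # all copies settle near the average optimum, 2.0
```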
arXiv Detail & Related papers (2020-06-08T03:28:19Z)
- Learning Adaptive Exploration Strategies in Dynamic Environments Through
Informed Policy Regularization [100.72335252255989]
We study the problem of learning exploration-exploitation strategies that effectively adapt to dynamic environments.
We propose a novel algorithm that regularizes the training of an RNN-based policy using informed policies trained to maximize the reward in each task.
arXiv Detail & Related papers (2020-05-06T16:14:48Z)
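One plausible single-step rendering of the regularizer: a policy-gradient surrogate plus a KL term pulling the exploratory policy toward an informed policy trained with privileged task knowledge. The logits, advantages, and weighting below are assumptions; the paper applies this regularization across an RNN policy's training, not to one decision.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def regularized_loss(agent_logits, informed_probs, advantages, lam=0.5):
    """Surrogate policy loss plus KL(informed || agent): the exploratory
    policy is pulled toward a task-informed policy while still trading
    that off against its own estimated advantages."""
    pi = softmax(agent_logits)
    kl = np.sum(informed_probs * np.log((informed_probs + 1e-12) / (pi + 1e-12)))
    pg = -np.sum(pi * advantages)      # maximize expected advantage
    return pg + lam * kl

informed = np.array([0.7, 0.2, 0.1])   # task expert suggests action 0
print(regularized_loss(np.zeros(3), informed, np.array([0.1, 0.0, -0.1])))
```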
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.