Non-Stationary Policy Learning for Multi-Timescale Multi-Agent
Reinforcement Learning
- URL: http://arxiv.org/abs/2307.08794v1
- Date: Mon, 17 Jul 2023 19:25:46 GMT
- Title: Non-Stationary Policy Learning for Multi-Timescale Multi-Agent
Reinforcement Learning
- Authors: Patrick Emami, Xiangyu Zhang, David Biagioni, Ahmed S. Zamzam
- Abstract summary: In multi-timescale multi-agent reinforcement learning, agents interact across different timescales.
We introduce a simple framework for learning non-stationary policies for multi-timescale MARL.
The framework's ability to effectively learn multi-timescale policies is validated on a gridworld and building energy management environment.
- Score: 9.808555135836022
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In multi-timescale multi-agent reinforcement learning (MARL), agents interact
across different timescales. In general, policies for time-dependent behaviors,
such as those induced by multiple timescales, are non-stationary. Learning
non-stationary policies is challenging and typically requires sophisticated or
inefficient algorithms. Motivated by the prevalence of this control problem in
real-world complex systems, we introduce a simple framework for learning
non-stationary policies for multi-timescale MARL. Our approach uses available
information about agent timescales to define a periodic time encoding. In
detail, we theoretically demonstrate that the effects of non-stationarity
introduced by multiple timescales can be learned by a periodic multi-agent
policy. To learn such policies, we propose a policy gradient algorithm that
parameterizes the actor and critic with phase-functioned neural networks, which
provide an inductive bias for periodicity. The framework's ability to
effectively learn multi-timescale policies is validated on a gridworld and
building energy management environment.
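The two ingredients the abstract names, a periodic time encoding derived from known agent timescales and phase-functioned actor/critic networks, can be sketched compactly. Below is a minimal illustration assuming a sin/cos encoding and Catmull-Rom weight blending (the standard phase-functioned construction); the paper's exact encoding and architecture may differ.

```python
import numpy as np

def periodic_time_encoding(t, periods):
    """Encode global step t relative to each agent's known action period.
    sin/cos pairs make the encoding smooth and exactly periodic."""
    feats = []
    for p in periods:
        phase = 2.0 * np.pi * (t % p) / p
        feats += [np.sin(phase), np.cos(phase)]
    return np.array(feats)

class PhaseFunctionedLayer:
    """Linear layer whose weights are a function of a phase in [0, 1),
    via Catmull-Rom blending of four control weight sets."""

    def __init__(self, in_dim, out_dim, rng):
        self.W = rng.standard_normal((4, out_dim, in_dim)) * 0.1
        self.b = np.zeros((4, out_dim))

    def __call__(self, x, phase):
        p = 4.0 * (phase % 1.0)          # spline position over 4 segments
        k, w = int(p), p - int(p)
        idx = [(k - 1) % 4, k % 4, (k + 1) % 4, (k + 2) % 4]
        coef = [                         # Catmull-Rom basis at w
            -0.5 * w**3 + w**2 - 0.5 * w,
             1.5 * w**3 - 2.5 * w**2 + 1.0,
            -1.5 * w**3 + 2.0 * w**2 + 0.5 * w,
             0.5 * w**3 - 0.5 * w**2,
        ]
        W = sum(c * self.W[i] for c, i in zip(coef, idx))
        b = sum(c * self.b[i] for c, i in zip(coef, idx))
        return np.tanh(W @ x + b)

rng = np.random.default_rng(0)
periods = [3, 5]                         # agents acting every 3 and 5 steps
layer = PhaseFunctionedLayer(in_dim=2 * len(periods), out_dim=8, rng=rng)
cycle = 15                               # joint behavior repeats every lcm(3, 5)
for t in (0, 7, 15):                     # t = 15 reproduces t = 0 exactly
    h = layer(periodic_time_encoding(t, periods), phase=(t % cycle) / cycle)
    print(t, np.round(h[:3], 3))
```

Since agents with periods 3 and 5 jointly repeat every lcm(3, 5) = 15 steps, a single phase variable over that cycle indexes all the time-dependence a periodic policy needs.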
Related papers
- Temporal Abstraction in Reinforcement Learning with Offline Data [8.370420807869321]
We propose a framework by which an online hierarchical reinforcement learning algorithm can be trained on an offline dataset of transitions collected by an unknown behavior policy.
We validate our method on Gym MuJoCo environments and robotic gripper block-stacking tasks in the standard as well as transfer and goal-conditioned settings.
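As a hedged sketch of how a hierarchy might be bootstrapped from offline transitions, the toy below clones a goal-conditioned low-level policy from logged data via hindsight relabeling and leaves subgoal selection to an online high-level learner. The chain environment, counting-based cloning, and hand-written subgoal rule are illustrative assumptions, not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(9)

# Offline transitions (s, a, s') on a 10-state chain, collected by an
# unknown behavior policy; actions step -1 or +1 (clipped at the ends).
s = rng.integers(0, 10, size=5000)
a = rng.choice([-1, 1], size=5000)
s_next = np.clip(s + a, 0, 9)

# Hindsight view: each s' is a goal the logged action actually reached.
# Fit a goal-conditioned low-level policy pi(a | s, g) by counting.
counts = np.zeros((10, 10, 2))                     # state x goal x action
np.add.at(counts, (s, s_next, (a > 0).astype(int)), 1)

def low_level(state, goal):
    """Greedy action from the offline-cloned, goal-conditioned policy."""
    c = counts[state, goal]
    return [-1, 1][int(np.argmax(c))] if c.sum() else int(rng.choice([-1, 1]))

# An online high-level learner would emit subgoals for low_level to
# chase; a hand-written stand-in here just walks toward state 9.
pos = 0
for _ in range(20):
    subgoal = min(pos + 1, 9)
    pos = int(np.clip(pos + low_level(pos, subgoal), 0, 9))
print("reached state", pos)
```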
arXiv Detail & Related papers (2024-07-21T18:10:31Z)
- OMPO: A Unified Framework for RL under Policy and Dynamics Shifts [42.57662196581823]
Training reinforcement learning policies using environment interaction data collected from varying policies or dynamics presents a fundamental challenge.
Existing works often overlook the distribution discrepancies induced by policy or dynamics shifts, or rely on specialized algorithms with task priors.
In this paper, we identify a unified strategy for online RL policy learning under diverse settings of policy and dynamics shifts: transition occupancy matching.
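One common way to realize transition occupancy matching is to train a discriminator between on-policy and replay transitions and use its log-odds as a reward correction. The sketch below illustrates that generic recipe on synthetic features; the Gaussian data, logistic discriminator, and shaping form are assumptions, not OMPO's exact objective.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy transition features (s, a, s') from two sources: the current
# policy/dynamics vs. mismatched replay data with shifted dynamics.
on_policy = rng.normal(0.0, 1.0, size=(512, 3))
replay    = rng.normal(0.5, 1.2, size=(512, 3))

# Logistic discriminator D(s, a, s') trained to tell the two apart.
X = np.vstack([on_policy, replay])
y = np.concatenate([np.ones(512), np.zeros(512)])
w, b = np.zeros(3), 0.0
for _ in range(500):                     # plain gradient ascent on log-likelihood
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    g = y - p
    w += 0.05 * X.T @ g / len(y)
    b += 0.05 * g.mean()

def shaped_reward(r, sas):
    """Augment the task reward with the log occupancy ratio
    log D / (1 - D), penalizing transitions unlike those the current
    policy would generate under the current dynamics."""
    d = 1.0 / (1.0 + np.exp(-(sas @ w + b)))
    return r + np.log(d / (1.0 - d + 1e-8) + 1e-8)

print(shaped_reward(1.0, replay[:3]))    # replay transitions get discounted
```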
arXiv Detail & Related papers (2024-05-29T13:36:36Z)
- Decentralized Learning Strategies for Estimation Error Minimization with Graph Neural Networks [94.2860766709971]
We address the challenge of sampling and remote estimation for autoregressive Markovian processes in a wireless network with statistically identical agents.
Our goal is to minimize time-average estimation error and/or age of information with decentralized scalable sampling and transmission policies.
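To make the objective concrete, the sketch below simulates a single AR(1) source with a remote estimator under a threshold (error-triggered) sampling policy and reports time-average squared error and age of information. The scalar setup and hand-set threshold are assumptions standing in for the paper's learned, decentralized GNN policies.

```python
import numpy as np

rng = np.random.default_rng(2)
rho, T, threshold = 0.9, 10_000, 1.0

x = 0.0          # AR(1) source: x' = rho * x + noise
x_hat = 0.0      # remote estimate: last delivered sample, propagated by rho
age = 0          # age of information: steps since last delivery
err_sum, age_sum, sends = 0.0, 0, 0

for t in range(T):
    x = rho * x + rng.normal()
    x_hat = rho * x_hat              # estimator propagates the known model
    age += 1
    if abs(x - x_hat) > threshold:   # threshold (error-triggered) sampling
        x_hat, age = x, 0            # deliver a fresh sample
        sends += 1
    err_sum += (x - x_hat) ** 2
    age_sum += age

print(f"mean sq. error {err_sum/T:.3f}, mean AoI {age_sum/T:.2f}, "
      f"send rate {sends/T:.2%}")
```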
arXiv Detail & Related papers (2024-04-04T06:24:11Z)
- Towards Global Optimality for Practical Average Reward Reinforcement Learning without Mixing Time Oracles [83.85151306138007]
The Multi-level Actor-Critic (MAC) framework incorporates a Multi-level Monte-Carlo (MLMC) gradient estimator to remove the need for mixing-time oracles.
We demonstrate that MAC outperforms the existing state-of-the-art policy gradient-based method for average reward settings.
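The multilevel Monte Carlo trick can be shown in isolation: draw a random level J with P(J = j) = 2^-j, roll out 2^J steps, and reweight the coupled difference of averages so the estimate matches a long-horizon average at only O(level) expected cost. The toy chain and average-reward target below are assumptions; this is the estimator pattern, not the full MAC algorithm.

```python
import numpy as np

rng = np.random.default_rng(3)

def rollout_rewards(T):
    """Toy slowly mixing chain (rho = 0.99); returns T reward samples."""
    x, out = 0.0, np.empty(T)
    for t in range(T):
        x = 0.99 * x + rng.normal()
        out[t] = np.tanh(x)
    return out

def mlmc_estimate(j_max=10):
    """One MLMC sample of the long-run average reward: its expectation
    matches a 2**j_max-step average, while the expected rollout length
    is only O(j_max)."""
    j = min(int(rng.geometric(0.5)), j_max)                # P(J = j) = 2**-j
    p_j = 2.0 ** -j if j < j_max else 2.0 ** -(j_max - 1)  # truncated tail
    g0 = rollout_rewards(1).mean()                         # independent base level
    r = rollout_rewards(2 ** j)
    delta = r.mean() - r[: 2 ** (j - 1)].mean()            # coupled level difference
    return g0 + delta / p_j                                # telescoping correction

print(np.mean([mlmc_estimate() for _ in range(2000)]))
```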
arXiv Detail & Related papers (2024-03-18T16:23:47Z)
- Discovering How Agents Learn Using Few Data [32.38609641970052]
We propose a theoretical and algorithmic framework for real-time identification of agent behavior using a short burst of a single system trajectory.
Our approach accurately recovers the true dynamics across various benchmarks, including equilibrium selection and prediction of chaotic systems up to 10 Lyapunov times.
These findings suggest that our approach has significant potential to support effective policy and decision-making in strategic multi-agent systems.
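As a hedged illustration of identifying learning dynamics from a short burst of data, the sketch below assumes the observed agent runs multiplicative-weights updates with an unknown step size and recovers that step size by least squares on logit differences. The dynamics class and noise model are illustrative assumptions, not the paper's framework.

```python
import numpy as np

rng = np.random.default_rng(4)

u = np.array([1.0, 0.2, -0.5])        # fixed payoff vector (known to us)
eta_true = 0.3                        # unknown learning rate to identify

def mwu_step(x, eta):
    """Multiplicative-weights update with small observation noise."""
    y = x * np.exp(eta * u)
    y /= y.sum()
    y = np.clip(y + rng.normal(0, 1e-3, size=3), 1e-6, None)
    return y / y.sum()

# Observe a short burst of a single trajectory (5 steps).
xs = [np.ones(3) / 3]
for _ in range(5):
    xs.append(mwu_step(xs[-1], eta_true))

# log(x_i / x_k) advances by eta * (u_i - u_k) each step, so eta is the
# slope of a simple least-squares regression over the observed steps.
num, den = 0.0, 0.0
for x0, x1 in zip(xs[:-1], xs[1:]):
    d_logit = np.log(x1 / x1[-1]) - np.log(x0 / x0[-1])
    d_u = u - u[-1]
    num += d_logit @ d_u
    den += d_u @ d_u
print(f"recovered eta ~ {num/den:.3f} (true {eta_true})")
```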
arXiv Detail & Related papers (2023-07-13T09:14:48Z)
- Safe Multi-agent Learning via Trapping Regions [89.24858306636816]
We apply the concept of trapping regions, known from qualitative theory of dynamical systems, to create safety sets in the joint strategy space for decentralized learning.
We propose a binary partitioning algorithm for verification that candidate sets form trapping regions in systems with known learning dynamics, and a sampling algorithm for scenarios where learning dynamics are not known.
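A minimal version of the sampling-based check, assuming the learning dynamics are known: sample points on the boundary of a candidate box and verify the dynamics point strictly inward at every sample. The toy two-player gradient flow and the box-shaped candidate set below are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)

def learning_dynamics(z):
    """Known joint learning dynamics: a toy two-player gradient flow
    spiraling into the origin."""
    x, y = z
    return np.array([-x + 0.5 * y, -y - 0.5 * x])

def is_trapping_box(lo, hi, n_samples=2000):
    """Sample the boundary of the candidate box [lo, hi]^2 and test that
    the dynamics point strictly inward at every sample. A certified check
    would partition the boundary and bound the field on each piece; this
    sampling version is the cheap screen."""
    for _ in range(n_samples):
        z = rng.uniform(lo, hi, size=2)
        face, side = rng.integers(2), rng.integers(2)
        z[face] = hi if side else lo                 # project onto a face
        normal = np.zeros(2)
        normal[face] = 1.0 if side else -1.0         # outward normal
        if learning_dynamics(z) @ normal >= 0.0:     # flow leaves the box
            return False
    return True

print(is_trapping_box(-1.0, 1.0))   # True: sampled boundary flow is inward
```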
arXiv Detail & Related papers (2023-02-27T14:47:52Z)
- Learning From Good Trajectories in Offline Multi-Agent Reinforcement
Learning [98.07495732562654]
Offline multi-agent reinforcement learning (MARL) aims to learn effective multi-agent policies from pre-collected datasets.
Because the behavior policies behind a dataset can vary widely in quality, an agent learned by offline MARL often inherits a poorly performing (e.g., random) behavior policy, jeopardizing the performance of the entire team.
We propose a novel framework called Shared Individual Trajectories (SIT) to address this problem.
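The motivating idea, that agents should be able to imitate good individual trajectories wherever they appear in the dataset, can be sketched as pooling per-agent trajectories and keeping the top fraction by estimated individual return. The per-trajectory scores below are synthetic, and SIT's actual credit-assignment and sharing mechanism (not detailed in this summary) will differ.

```python
import numpy as np

rng = np.random.default_rng(6)

# Synthetic per-agent, per-trajectory individual returns; in practice
# estimating these scores from a joint dataset is the hard part.
n_agents, n_traj = 3, 100
returns = rng.normal(size=(n_agents, n_traj))
returns[1] -= 2.0                     # agent 1's data came from a weak policy

def shared_training_set(returns, top_frac=0.2):
    """Pool every agent's individual trajectories and keep the best
    fraction, so an agent whose own data is poor still sees competent
    examples to imitate."""
    pooled = returns.ravel()
    cutoff = np.quantile(pooled, 1.0 - top_frac)
    return np.flatnonzero(pooled >= cutoff)

kept = shared_training_set(returns)
print("fraction of kept trajectories from agent 1:",
      np.mean(kept // n_traj == 1))   # near zero: its weak data is filtered
```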
arXiv Detail & Related papers (2022-11-28T18:11:26Z)
- Learning Robust Policy against Disturbance in Transition Dynamics via
State-Conservative Policy Optimization [63.75188254377202]
Deep reinforcement learning algorithms can perform poorly in real-world tasks due to discrepancy between source and target environments.
We propose State-Conservative Policy Optimization (SCPO), a novel model-free actor-critic algorithm that learns robust policies without modeling the disturbance in advance.
Experiments in several robot control tasks demonstrate that SCPO learns robust policies against the disturbance in transition dynamics.
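The state-conservative idea can be sketched as evaluating updates at the worst state in a small neighborhood of the observed one. The toy critic and the random-search inner maximizer below are assumptions (a gradient-based perturbation is the more usual choice); this is not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(7)

def value(s, w):
    return np.tanh(s @ w)                 # toy critic

def worst_case_state(s, w, eps=0.1, k=16):
    """Approximate the worst state in an eps-ball around s by random
    search over k candidate perturbations."""
    cands = s + rng.uniform(-eps, eps, size=(k, s.size))
    return cands[np.argmin(value(cands, w))]

w, s = rng.normal(size=4), rng.normal(size=4)
s_adv = worst_case_state(s, w)
# A state-conservative update evaluates and improves the policy at s_adv
# rather than s, so small dynamics disturbances at deployment cannot push
# the agent into states the policy never trained on.
print(value(s, w), value(s_adv, w))
```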
arXiv Detail & Related papers (2021-12-20T13:13:05Z)
- Consolidation via Policy Information Regularization in Deep RL for
Multi-Agent Games [21.46148507577606]
This paper introduces an information-theoretic constraint on learned policy complexity in the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) reinforcement learning algorithm.
Results from experimentation in multi-agent cooperative and competitive tasks demonstrate that the capacity-limited approach is a good candidate for improving learning performance in these environments.
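A standalone, discrete-action rendering of such a capacity constraint: maximize expected Q minus a penalty on the KL divergence from the policy to a uniform prior. The toy Q-values and uniform prior below are assumptions; the paper applies its constraint inside MADDPG's training rather than to an isolated decision.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def regularized_objective(logits, q_values, beta=0.1):
    """Expected Q minus beta * KL(pi || uniform): penalizing policy
    complexity discourages overfitting to the current opponents, one
    route to consolidation in multi-agent games."""
    pi = softmax(logits)
    n = len(pi)
    kl_to_uniform = np.sum(pi * np.log(pi * n + 1e-12))
    return pi @ q_values - beta * kl_to_uniform

q = np.array([1.0, 0.9, -1.0])
sharp = np.array([5.0, 0.0, -5.0])     # near-deterministic policy
soft  = np.array([0.5, 0.4, -1.0])     # higher-entropy policy
print(regularized_objective(sharp, q), regularized_objective(soft, q))
```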
arXiv Detail & Related papers (2020-11-23T16:28:27Z)
- A Decentralized Policy Gradient Approach to Multi-task Reinforcement
Learning [13.733491423871383]
We develop a framework for solving multi-task reinforcement learning problems.
The goal is to learn a common policy that operates effectively in different environments.
We highlight two fundamental challenges in MTRL that are not present in its single task counterpart.
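The decentralized structure can be sketched with a gossip (consensus) step: each agent mixes its policy parameters with its neighbors' and then ascends its own task's gradient, with no central server. The quadratic per-task surrogates below stand in for real policy gradients.

```python
import numpy as np

rng = np.random.default_rng(8)

# Each of 3 agents optimizes a common policy parameter theta against its
# own task; they only exchange parameters with neighbors.
targets = np.array([1.0, 2.0, 3.0])           # per-task optima (toy surrogate)
mix = np.array([[0.5, 0.25, 0.25],            # doubly-stochastic gossip matrix
                [0.25, 0.5, 0.25],
                [0.25, 0.25, 0.5]])
theta = rng.normal(size=3)                     # one parameter copy per agent

for _ in range(200):
    local_grad = -(theta - targets)            # gradient of -(theta - t_i)^2 / 2
    theta = mix @ theta + 0.1 * local_grad     # consensus step + local ascent

print(theta)    # all copies settle near the average optimum, 2.0
```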
arXiv Detail & Related papers (2020-06-08T03:28:19Z)
- Learning Adaptive Exploration Strategies in Dynamic Environments Through
Informed Policy Regularization [100.72335252255989]
We study the problem of learning exploration-exploitation strategies that effectively adapt to dynamic environments.
We propose a novel algorithm that regularizes the training of an RNN-based policy using informed policies trained to maximize the reward in each task.
arXiv Detail & Related papers (2020-05-06T16:14:48Z)
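One plausible single-step rendering of the regularizer: a policy-gradient surrogate plus a KL term pulling the exploratory policy toward an informed policy trained with privileged task knowledge. The logits, advantages, and weighting below are assumptions; the paper applies this regularization across an RNN policy's training, not to one decision.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def regularized_loss(agent_logits, informed_probs, advantages, lam=0.5):
    """Surrogate policy loss plus KL(informed || agent): the exploratory
    policy is pulled toward a task-informed policy while still trading
    that off against its own estimated advantages."""
    pi = softmax(agent_logits)
    kl = np.sum(informed_probs * np.log((informed_probs + 1e-12) / (pi + 1e-12)))
    pg = -np.sum(pi * advantages)      # maximize expected advantage
    return pg + lam * kl

informed = np.array([0.7, 0.2, 0.1])   # task expert suggests action 0
print(regularized_loss(np.zeros(3), informed, np.array([0.1, 0.0, -0.1])))
```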
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.