Related papers: Common Information based Approximate State Representations in Multi-Agent Reinforcement Learning

Common Information based Approximate State Representations in Multi-Agent Reinforcement Learning

URL: http://arxiv.org/abs/2110.12603v1
Date: Mon, 25 Oct 2021 02:32:06 GMT
Title: Common Information based Approximate State Representations in Multi-Agent Reinforcement Learning
Authors: Hsu Kao, Vijay Subramanian
Abstract summary: We develop a general compression framework with approximate common and private state representations, based on which decentralized policies can be constructed. The results shed light on designing practically useful deep-MARL network structures under the "centralized learning distributed execution" scheme.
Score: 3.086462790971422
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Due to information asymmetry, finding optimal policies for Decentralized Partially Observable Markov Decision Processes (Dec-POMDPs) is hard with the complexity growing doubly exponentially in the horizon length. The challenge increases greatly in the multi-agent reinforcement learning (MARL) setting where the transition probabilities, observation kernel, and reward function are unknown. Here, we develop a general compression framework with approximate common and private state representations, based on which decentralized policies can be constructed. We derive the optimality gap of executing dynamic programming (DP) with the approximate states in terms of the approximation error parameters and the remaining time steps. When the compression is exact (no error), the resulting DP is equivalent to the one in existing work. Our general framework generalizes a number of methods proposed in the literature. The results shed light on designing practically useful deep-MARL network structures under the "centralized learning distributed execution" scheme.

Related papers

Learning Branching Policies for MILPs with Proximal Policy Optimization [0.0]
Branch-and-Bound (B&B) is the dominant exact solution method for Mixed Linear Programs (MILP)<n>Current approaches rely on Imitation Learning (IL), which tends to overfit to expert demonstrations and struggles to generalize to structurally diverse or unseen instances.<n>In this work, we propose Tree-Gate Proximal Policy Optimization, a novel framework that employs Proximal Policy Optimization (PPO), a Reinforcement Learning (RL) algorithm, to train a branching policy.
arXiv Detail & Related papers (2025-11-17T05:16:14Z)
Structured Cooperative Multi-Agent Reinforcement Learning: a Bayesian Network Perspective [1.2515675707300356]
We propose a systematic approach to leverage structures in the inter-agent couplings for efficient model-free reinforcement learning.<n>We derive a multi-agent policy gradient theorem based on the P-DTDE scheme and develop a scalable actor-critic algorithm.
arXiv Detail & Related papers (2025-10-11T00:29:55Z)
Efficient Solving of Large Single Input Superstate Decomposable Markovian Decision Process [1.17431678544333]
A key step in Bellman dynamic programming algorithms is the policy evaluation.<n>We develop an exact and efficient policy evaluation method based on this structure.<n>This yields a scalable solution applicable to both average and discounted reward MDPs.
arXiv Detail & Related papers (2025-08-01T17:49:27Z)
Non-stationary Reinforcement Learning under General Function Approximation [60.430936031067006]
We first propose a new complexity metric called dynamic Bellman Eluder (DBE) dimension for non-stationary MDPs. Based on the proposed complexity metric, we propose a novel confidence-set based model-free algorithm called SW-OPEA. We show that SW-OPEA is provably efficient as long as the variation budget is not significantly large.
arXiv Detail & Related papers (2023-06-01T16:19:37Z)
Twice Regularized Markov Decision Processes: The Equivalence between Robustness and Regularization [64.60253456266872]
Markov decision processes (MDPs) aim to handle changing or partially known system dynamics. Regularized MDPs show more stability in policy learning without impairing time complexity. Bellman operators enable us to derive planning and learning schemes with convergence and generalization guarantees.
arXiv Detail & Related papers (2023-03-12T13:03:28Z)
First-order Policy Optimization for Robust Markov Decision Process [40.2022466644885]
We consider the problem of solving robust Markov decision process (MDP) MDP involves a set of discounted, finite state, finite action space MDPs with uncertain transition kernels. For $(mathbfs,mathbfa)$-rectangular uncertainty sets, we establish several structural observations on the robust objective.
arXiv Detail & Related papers (2022-09-21T18:10:28Z)
Towards Global Optimality in Cooperative MARL with the Transformation And Distillation Framework [26.612749327414335]
Decentralized execution is one core demand in cooperative multi-agent reinforcement learning (MARL) In this paper, we theoretically analyze two common classes of algorithms with decentralized policies -- multi-agent policy gradient methods and value-decomposition methods. We show that TAD-PPO can theoretically perform optimal policy learning in the finite multi-agent MDPs and shows significant outperformance on a large set of cooperative multi-agent tasks.
arXiv Detail & Related papers (2022-07-12T06:59:13Z)
Robustness and risk management via distributional dynamic programming [13.173307471333619]
We introduce a new class of distributional operators, together with a practical DP algorithm for policy evaluation. Our approach reformulates through an augmented state space where each state is split into a worst-case substate and a best-case substate. We derive distributional operators and DP algorithms solving a new control task.
arXiv Detail & Related papers (2021-12-28T12:12:57Z)
Scaling Structured Inference with Randomization [64.18063627155128]
We propose a family of dynamic programming (RDP) randomized for scaling structured models to tens of thousands of latent states. Our method is widely applicable to classical DP-based inference. It is also compatible with automatic differentiation so can be integrated with neural networks seamlessly.
arXiv Detail & Related papers (2021-12-07T11:26:41Z)
Implicit Distributional Reinforcement Learning [61.166030238490634]
implicit distributional actor-critic (IDAC) built on two deep generator networks (DGNs) Semi-implicit actor (SIA) powered by a flexible policy distribution. We observe IDAC outperforms state-of-the-art algorithms on representative OpenAI Gym environments.
arXiv Detail & Related papers (2020-07-13T02:52:18Z)
Multi-Agent Reinforcement Learning in Stochastic Networked Systems [30.78949372661673]
We study multi-agent reinforcement learning (MARL) in a network of agents. The objective is to find localized policies that maximize the (discounted) global reward.
arXiv Detail & Related papers (2020-06-11T16:08:16Z)
F2A2: Flexible Fully-decentralized Approximate Actor-critic for Cooperative Multi-agent Reinforcement Learning [110.35516334788687]
Decentralized multi-agent reinforcement learning algorithms are sometimes unpractical in complicated applications. We propose a flexible fully decentralized actor-critic MARL framework, which can handle large-scale general cooperative multi-agent setting. Our framework can achieve scalability and stability for large-scale environment and reduce information transmission.
arXiv Detail & Related papers (2020-04-17T14:56:29Z)
Decentralized MCTS via Learned Teammate Models [89.24858306636816]
We present a trainable online decentralized planning algorithm based on decentralized Monte Carlo Tree Search. We show that deep learning and convolutional neural networks can be employed to produce accurate policy approximators.
arXiv Detail & Related papers (2020-03-19T13:10:20Z)
Invariant Causal Prediction for Block MDPs [106.63346115341862]
Generalization across environments is critical to the successful application of reinforcement learning algorithms to real-world challenges. We propose a method of invariant prediction to learn model-irrelevance state abstractions (MISA) that generalize to novel observations in the multi-environment setting.
arXiv Detail & Related papers (2020-03-12T21:03:01Z)

This list is automatically generated from the titles and abstracts of the papers in this site.