Related papers: Represented Value Function Approach for Large Scale Multi Agent Reinforcement Learning

URL: http://arxiv.org/abs/2001.01096v2
Date: Fri, 10 Jan 2020 01:57:34 GMT
Title: Represented Value Function Approach for Large Scale Multi Agent Reinforcement Learning
Authors: Weiya Ren
Abstract summary: We study the representation problem of the pairwise value function to reduce the complexity of the interactions among agents. We adopt an l2-norm trick to ensure that the trivial term of the approximated value function is bounded.
Score: 0.30458514384586394
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In this paper, we consider the problem of large-scale multi-agent reinforcement learning. First, we study the representation problem of the pairwise value function to reduce the complexity of the interactions among agents. Second, we adopt an l2-norm trick to ensure that the trivial term of the approximated value function is bounded. Third, experimental results on a battle game demonstrate the effectiveness of the proposed approach.

Related papers

Low-rank Prompt Interaction for Continual Vision-Language Retrieval [47.323830129786145]
We propose Low-rank Prompt Interaction to address the problem of multi-modal understanding. Considering that the number of training parameters scales with the number of layers and tasks, we propose low-rank interaction-augmented decomposition. We also adopt hierarchical low-rank contrastive learning to ensure robust training.
arXiv Detail & Related papers (2025-01-24T10:00:47Z)
UCB-driven Utility Function Search for Multi-objective Reinforcement Learning [75.11267478778295]
In Multi-objective Reinforcement Learning (MORL), agents are tasked with optimising decision-making behaviours. We focus on the case of linear utility functions parameterised by weight vectors w. We introduce a method based on Upper Confidence Bound to efficiently search for the most promising weight vectors during different stages of the learning process.
arXiv Detail & Related papers (2024-05-01T09:34:42Z)
Neural Amortized Inference for Nested Multi-agent Reasoning [54.39127942041582]
We propose a novel approach to bridge the gap between human-like inference capabilities and computational limitations. We evaluate our method in two challenging multi-agent interaction domains.
arXiv Detail & Related papers (2023-08-21T22:40:36Z)
Adaptive Value Decomposition with Greedy Marginal Contribution Computation for Cooperative Multi-Agent Reinforcement Learning [48.41925886860991]
Real-world cooperation often requires intensive coordination among agents simultaneously. Traditional methods that learn the value function as a monotonic mixing of per-agent utilities cannot solve the tasks with non-monotonic returns. We propose a novel explicit credit assignment method to address the non-monotonic problem.
arXiv Detail & Related papers (2023-02-14T07:23:59Z)
SA-MATD3: Self-attention-based multi-agent continuous control method in cooperative environments [12.959163198988536]
Existing algorithms suffer from an uneven degree of learning as the number of agents increases. A new structure for a multi-agent actor-critic is proposed, and the self-attention mechanism is applied in the critic network. The proposed algorithm makes full use of the samples in the replay memory buffer to learn the behavior of a class of agents.
arXiv Detail & Related papers (2021-07-01T08:15:05Z)
Tesseract: Tensorised Actors for Multi-Agent Reinforcement Learning [92.05556163518999]
MARL exacerbates matters by imposing various constraints on communication and observability. For value-based methods, it poses challenges in accurately representing the optimal value function. For policy gradient methods, it makes training the critic difficult and exacerbates the problem of the lagging critic. We show that from a learning theory perspective, both problems can be addressed by accurately representing the associated action-value function.
arXiv Detail & Related papers (2021-05-31T23:08:05Z)
Softmax with Regularization: Better Value Estimation in Multi-Agent Reinforcement Learning [72.28520951105207]
Overestimation in $Q$-learning is an important problem that has been extensively studied in single-agent reinforcement learning. We propose a novel regularization-based update scheme that penalizes large joint action-values deviating from a baseline. We show that our method provides a consistent performance improvement on a set of challenging StarCraft II micromanagement tasks.
arXiv Detail & Related papers (2021-03-22T14:18:39Z)
Learning to Represent Action Values as a Hypergraph on the Action Vertices [17.811355496708728]
Action-value estimation is a critical component of reinforcement learning (RL) methods. We conjecture that leveraging the structure of multi-dimensional action spaces is a key ingredient for learning good representations of action. We show the effectiveness of our approach on a myriad of domains: illustrative prediction problems under minimal confounding effects, Atari 2600 games, and discretised physical control benchmarks.
arXiv Detail & Related papers (2020-10-28T00:19:13Z)
Byzantine Resilient Distributed Multi-Task Learning [6.850757447639822]
We show that distributed algorithms for learning relatedness among tasks are not resilient in the presence of Byzantine agents. We propose an approach for Byzantine resilient distributed multi-task learning.
arXiv Detail & Related papers (2020-10-25T04:32:52Z)
Multi-task Supervised Learning via Cross-learning [102.64082402388192]
We consider a problem known as multi-task learning, which consists of fitting a set of regression functions intended for solving different tasks. In our novel formulation, we couple the parameters of these functions so that they learn in their task-specific domains while staying close to each other. This facilitates cross-fertilization, in which data collected across different domains help improve the learning performance on each of the other tasks.
arXiv Detail & Related papers (2020-10-24T21:35:57Z)

This list is automatically generated from the titles and abstracts of the papers on this site.