Multi-Agent Reinforcement Learning via Adaptive Kalman Temporal
Difference and Successor Representation
- URL: http://arxiv.org/abs/2112.15156v1
- Date: Thu, 30 Dec 2021 18:21:53 GMT
- Title: Multi-Agent Reinforcement Learning via Adaptive Kalman Temporal
Difference and Successor Representation
- Authors: Mohammad Salimibeni, Arash Mohammadi, Parvin Malekzadeh, and
Konstantinos N. Plataniotis
- Abstract summary: The paper proposes the Multi-Agent Adaptive Kalman Temporal Difference (MAK-TD) framework and its Successor Representation-based variant, referred to as the MAK-SR.
The proposed MAK-TD/SR frameworks consider the continuous nature of the action-space that is associated with high dimensional multi-agent environments.
- Score: 32.80370188601152
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Distributed Multi-Agent Reinforcement Learning (MARL) algorithms have
attracted a surge of interest lately, mainly due to recent advancements in
Deep Neural Networks (DNNs). Conventional Model-Based (MB) or Model-Free (MF)
RL algorithms are not directly applicable to MARL problems because they rely on
a fixed reward model for learning the underlying value function.
While DNN-based solutions perform well when a single agent is involved,
such methods fail to fully generalize to the complexities of MARL problems. In
other words, although recently developed approaches based on DNNs for
multi-agent environments have achieved superior performance, they are still
prone to overfitting, high sensitivity to parameter selection, and sample
inefficiency. The paper proposes the Multi-Agent Adaptive Kalman Temporal
Difference (MAK-TD) framework and its Successor Representation-based variant,
referred to as the MAK-SR. Intuitively speaking, the main objective is to
capitalize on unique characteristics of Kalman Filtering (KF) such as
uncertainty modeling and online second order learning. The proposed MAK-TD/SR
frameworks consider the continuous nature of the action-space that is
associated with high dimensional multi-agent environments and exploit Kalman
Temporal Difference (KTD) to address the parameter uncertainty. By leveraging
the KTD framework, the SR learning procedure is modeled as a filtering problem,
where Radial Basis Function (RBF) estimators are used to encode the continuous
space into feature vectors. On the other hand, for learning localized reward
functions, we resort to Multiple Model Adaptive Estimation (MMAE) to deal with
the lack of prior knowledge on the observation noise covariance and observation
mapping function. The proposed MAK-TD/SR frameworks are evaluated via several
experiments, which are implemented through the OpenAI Gym MARL benchmarks.
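
To make the core mechanism concrete, the following is a minimal sketch of a Kalman Temporal Difference style value update with Gaussian RBF feature encoding, in the spirit of the abstract above; the class and function names, noise covariances, and RBF parameters are illustrative assumptions, not the authors' exact MAK-TD/SR formulation.

```python
import numpy as np


def rbf_features(state, centers, width=0.5):
    """Encode a continuous state as a vector of Gaussian RBF activations."""
    d2 = np.sum((centers - state) ** 2, axis=1)
    return np.exp(-d2 / (2.0 * width ** 2))


class KTDValueLearner:
    """Treats the value-function weights as the hidden state of a Kalman filter."""

    def __init__(self, n_features, gamma=0.99, process_var=1e-3, obs_var=1.0):
        self.w = np.zeros(n_features)               # posterior mean of the weights
        self.P = np.eye(n_features)                 # posterior covariance (uncertainty)
        self.gamma = gamma
        self.Q = process_var * np.eye(n_features)   # process-noise covariance
        self.R = obs_var                            # observation (TD) noise variance

    def update(self, phi, phi_next, reward, done):
        # TD "observation" model: reward ~ (phi - gamma * phi_next)^T w + noise
        h = phi - (0.0 if done else self.gamma) * phi_next
        P_pred = self.P + self.Q                  # prediction step (random-walk weights)
        innovation = reward - h @ self.w          # TD error plays the role of the innovation
        s = h @ P_pred @ h + self.R               # innovation variance
        k = P_pred @ h / s                        # Kalman gain
        self.w = self.w + k * innovation          # correction step
        self.P = P_pred - np.outer(k, h @ P_pred)
        return innovation


# Usage: RBF centers spread over a 1-D state space and one fictitious transition.
centers = np.linspace(0.0, 1.0, 10).reshape(-1, 1)
learner = KTDValueLearner(n_features=10)
phi = rbf_features(np.array([0.2]), centers)
phi_next = rbf_features(np.array([0.3]), centers)
td_error = learner.update(phi, phi_next, reward=1.0, done=False)
```

Note that the weight covariance P carried by the filter is what supplies the uncertainty estimates that the abstract highlights as a key advantage over point-estimate TD learning.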
Related papers
- Enhancing Multi-Step Reasoning Abilities of Language Models through Direct Q-Function Optimization [50.485788083202124]
Reinforcement Learning (RL) plays a crucial role in aligning large language models with human preferences and improving their ability to perform complex tasks.
We introduce Direct Q-function Optimization (DQO), which formulates the response generation process as a Markov Decision Process (MDP) and utilizes the soft actor-critic (SAC) framework to optimize a Q-function directly parameterized by the language model.
Experimental results on two math problem-solving datasets, GSM8K and MATH, demonstrate that DQO outperforms previous methods, establishing it as a promising offline reinforcement learning approach for aligning language models.
arXiv Detail & Related papers (2024-10-11T23:29:20Z) - Breaking the Curse of Multiagency in Robust Multi-Agent Reinforcement Learning [37.80275600302316]
Distributionally robust Markov games (RMGs) have been proposed to enhance robustness in MARL.
A notorious yet open challenge is whether RMGs can escape the curse of multiagency.
The paper presents the first algorithm to break the curse of multiagency for RMGs.
arXiv Detail & Related papers (2024-09-30T08:09:41Z) - SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models [85.67096251281191]
We present an innovative approach to model fusion called zero-shot Sparse MIxture of Low-rank Experts (SMILE) construction.
SMILE allows for the upscaling of source models into an MoE model without extra data or further training.
We conduct extensive experiments across diverse scenarios, such as image classification and text generation tasks, using full fine-tuning and LoRA fine-tuning.
arXiv Detail & Related papers (2024-08-19T17:32:15Z) - Modality Prompts for Arbitrary Modality Salient Object Detection [57.610000247519196]
This paper delves into the task of arbitrary modality salient object detection (AM SOD)
It aims to detect salient objects from arbitrary modalities, e.g., RGB images, RGB-D images, and RGB-D-T images.
A novel modality-adaptive Transformer (MAT) will be proposed to investigate two fundamental challenges of AM SOD.
arXiv Detail & Related papers (2024-05-06T11:02:02Z) - Fast Value Tracking for Deep Reinforcement Learning [7.648784748888187]
Reinforcement learning (RL) tackles sequential decision-making problems by creating agents that interact with their environment.
Existing algorithms often view these problems as static, focusing on point estimates for model parameters to maximize expected rewards.
Our research leverages the Kalman paradigm to introduce a novel quantification and sampling algorithm called Langevinized Kalman Temporal Difference (LKTD).
arXiv Detail & Related papers (2024-03-19T22:18:19Z) - Self-Supervised Neuron Segmentation with Multi-Agent Reinforcement
Learning [53.00683059396803]
Masked image modeling (MIM) has been widely used due to its simplicity and effectiveness in recovering original information from masked images.
We propose a decision-based MIM that utilizes reinforcement learning (RL) to automatically search for optimal image masking ratio and masking strategy.
Our approach has a significant advantage over alternative self-supervised methods on the task of neuron segmentation.
arXiv Detail & Related papers (2023-10-06T10:40:46Z) - Risk-Aware Distributed Multi-Agent Reinforcement Learning [8.287693091673658]
We develop a distributed MARL approach to solve decision-making problems in unknown environments by learning risk-aware actions.
We then propose a distributed MARL algorithm called the CVaR QD-Learning algorithm, and establish that the value functions of individual agents reach consensus.
arXiv Detail & Related papers (2023-04-04T17:56:44Z) - Multi-Agent Reinforcement Learning for Adaptive Mesh Refinement [17.72127385405445]
We present a novel formulation of adaptive mesh refinement (AMR) as a fully-cooperative Markov game.
We design a novel deep multi-agent reinforcement learning algorithm called Value Decomposition Graph Network (VDGN)
We show that VDGN policies significantly outperform error threshold-based policies in global error and cost metrics.
arXiv Detail & Related papers (2022-11-02T00:41:32Z) - Relational Reasoning via Set Transformers: Provable Efficiency and
Applications to MARL [154.13105285663656]
The cooperative Multi-Agent Reinforcement Learning (MARL) framework with permutation-invariant agents has achieved tremendous empirical successes in real-world applications.
Unfortunately, the theoretical understanding of this MARL problem is lacking due to the curse of many agents and the limited exploration of the relational reasoning in existing works.
We prove that the suboptimality gaps of the model-free and model-based algorithms are independent of and logarithmic in the number of agents respectively, which mitigates the curse of many agents.
arXiv Detail & Related papers (2022-09-20T16:42:59Z) - Efficient Model-Based Multi-Agent Mean-Field Reinforcement Learning [89.31889875864599]
We propose an efficient model-based reinforcement learning algorithm for learning in multi-agent systems.
Our main theoretical contributions are the first general regret bounds for model-based reinforcement learning for MFC.
We provide a practical parametrization of the core optimization problem.
arXiv Detail & Related papers (2021-07-08T18:01:02Z) - MM-KTD: Multiple Model Kalman Temporal Differences for Reinforcement
Learning [36.14516028564416]
This paper proposes an innovative Multiple Model Kalman Temporal Difference (MM-KTD) framework to learn optimal control policies.
An active learning method is proposed to enhance the sampling efficiency of the system.
Experimental results show superiority of the MM-KTD framework in comparison to its state-of-the-art counterparts.
arXiv Detail & Related papers (2020-05-30T06:39:55Z)