Multi-agent Reinforcement Learning with Deep Networks for Diverse Q-Vectors
- URL: http://arxiv.org/abs/2406.07848v1
- Date: Wed, 12 Jun 2024 03:30:10 GMT
- Title: Multi-agent Reinforcement Learning with Deep Networks for Diverse Q-Vectors
- Authors: Zhenglong Luo, Zhiyong Chen, James Welsh
- Abstract summary: This paper proposes a deep Q-network (DQN) algorithm capable of learning various Q-vectors using Max, Nash, and Maximin strategies.
The effectiveness of this approach is demonstrated in an environment where dual robotic arms collaborate to lift a pot.
- Score: 3.9801926395657325
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multi-agent reinforcement learning (MARL) has become a significant research topic due to its ability to facilitate learning in complex environments. In multi-agent tasks, the state-action value, commonly referred to as the Q-value, can vary among agents because of their individual rewards, resulting in a Q-vector. Determining an optimal policy is challenging, as it involves more than just maximizing a single Q-value. Various optimal policies, such as a Nash equilibrium, have been studied in this context. Algorithms like Nash Q-learning and Nash Actor-Critic have shown effectiveness in these scenarios. This paper extends this research by proposing a deep Q-network (DQN) algorithm capable of learning various Q-vectors using Max, Nash, and Maximin strategies. The effectiveness of this approach is demonstrated in an environment where dual robotic arms collaborate to lift a pot.
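The listing carries no code, but as a rough illustration of how the three selection rules could act on a two-agent Q-vector, here is a minimal numpy sketch. The toy Q-tables and the pure-strategy Nash enumeration are illustrative assumptions, not the authors' implementation: the paper learns Q-vectors with deep networks, and Nash strategies may in general be mixed.

```python
import numpy as np

# Toy per-agent Q-tables for a single state: entry (i, j) is the value of
# agent 1 playing action i while agent 2 plays action j.
Q1 = np.array([[3.0, 1.0], [0.0, 2.0]])  # agent 1's Q-values
Q2 = np.array([[2.0, 0.0], [1.0, 3.0]])  # agent 2's Q-values

def max_strategy(Q1, Q2):
    """Jointly maximize the summed Q-values (fully cooperative view)."""
    return np.unravel_index(np.argmax(Q1 + Q2), Q1.shape)

def maximin_strategy(Q1, Q2):
    """Each agent maximizes its worst-case value over the other's actions."""
    a1 = int(np.argmax(Q1.min(axis=1)))  # agent 1 guards against any column
    a2 = int(np.argmax(Q2.min(axis=0)))  # agent 2 guards against any row
    return a1, a2

def pure_nash_strategies(Q1, Q2):
    """Joint actions from which neither agent can improve unilaterally."""
    return [
        (i, j)
        for i in range(Q1.shape[0])
        for j in range(Q1.shape[1])
        if Q1[i, j] >= Q1[:, j].max() and Q2[i, j] >= Q2[i, :].max()
    ]

print("Max:    ", max_strategy(Q1, Q2))          # (0, 0)
print("Maximin:", maximin_strategy(Q1, Q2))      # (0, 0)
print("Nash:   ", pure_nash_strategies(Q1, Q2))  # [(0, 0), (1, 1)]
```

On these toy tables the Max and Maximin rules agree on (0, 0), while two pure Nash equilibria exist; that kind of ambiguity is precisely what makes comparing the strategies interesting.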
Related papers
- Pointer Networks with Q-Learning for Combinatorial Optimization [55.2480439325792]
We introduce the Pointer Q-Network (PQN), a hybrid neural architecture that integrates model-free Q-value policy approximation with Pointer Networks (Ptr-Nets).
Our empirical results demonstrate the efficacy of this approach, and we also test the model in unstable environments.
arXiv Detail & Related papers (2023-11-05T12:03:58Z)
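As a purely speculative sketch of how attention scores over candidate items can double as per-item Q-values, consider the toy PyTorch module below; the class name, layer sizes, and architecture are assumptions, not the PQN design from the paper.

```python
import torch
import torch.nn as nn

class TinyPointerQ(nn.Module):
    """Toy pointer-style scorer: encode candidate items, score each with
    additive attention, and read the scores as per-item Q-values."""
    def __init__(self, item_dim: int, hidden: int = 64):
        super().__init__()
        self.encoder = nn.LSTM(item_dim, hidden, batch_first=True)
        self.query = nn.Parameter(torch.randn(hidden))
        self.w_ref = nn.Linear(hidden, hidden, bias=False)
        self.v = nn.Linear(hidden, 1, bias=False)

    def forward(self, items):
        # items: (batch, n_items, item_dim) -> Q-values: (batch, n_items)
        enc, _ = self.encoder(items)
        return self.v(torch.tanh(self.w_ref(enc) + self.query)).squeeze(-1)

q_net = TinyPointerQ(item_dim=2)
coords = torch.rand(1, 5, 2)      # e.g., 5 candidate cities in a TSP tour
q_values = q_net(coords)
action = q_values.argmax(dim=-1)  # greedily "point" to the next item
print(q_values.shape, action)
```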
- Inverse Factorized Q-Learning for Cooperative Multi-agent Imitation Learning [13.060023718506917]
Imitation learning (IL) is the problem of learning to mimic expert behaviors from demonstrations in cooperative multi-agent systems.
We introduce a novel multi-agent IL algorithm designed to address these challenges.
Our approach enables centralized learning by leveraging mixing networks to aggregate decentralized Q-functions.
arXiv Detail & Related papers (2023-10-10T17:11:20Z)
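A common way to aggregate decentralized Q-functions is a QMIX-style monotonic mixing network. The sketch below shows that general idea in PyTorch; the layer sizes and hypernetwork layout are assumptions, not the cited paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MonotonicMixer(nn.Module):
    """QMIX-style mixing sketch: combine per-agent Q-values into a joint Q
    using state-conditioned non-negative weights, so the joint Q is
    monotone in each agent's Q-value."""
    def __init__(self, n_agents: int, state_dim: int, embed: int = 32):
        super().__init__()
        self.n_agents, self.embed = n_agents, embed
        self.hyper_w1 = nn.Linear(state_dim, n_agents * embed)
        self.hyper_b1 = nn.Linear(state_dim, embed)
        self.hyper_w2 = nn.Linear(state_dim, embed)
        self.hyper_b2 = nn.Linear(state_dim, 1)

    def forward(self, agent_qs, state):
        # agent_qs: (batch, n_agents), state: (batch, state_dim)
        b = agent_qs.size(0)
        w1 = torch.abs(self.hyper_w1(state)).view(b, self.n_agents, self.embed)
        b1 = self.hyper_b1(state).view(b, 1, self.embed)
        hidden = F.elu(agent_qs.view(b, 1, self.n_agents) @ w1 + b1)
        w2 = torch.abs(self.hyper_w2(state)).view(b, self.embed, 1)
        b2 = self.hyper_b2(state).view(b, 1, 1)
        return (hidden @ w2 + b2).view(b)  # one joint Q per batch element

mixer = MonotonicMixer(n_agents=2, state_dim=8)
print(mixer(torch.rand(4, 2), torch.rand(4, 8)).shape)  # torch.Size([4])
```

Taking the absolute value of the hypernetwork outputs is what enforces monotonicity: raising any agent's Q-value can never lower the joint Q.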
- MA2QL: A Minimalist Approach to Fully Decentralized Multi-Agent Reinforcement Learning [63.46052494151171]
We propose multi-agent alternate Q-learning (MA2QL), where agents take turns updating their Q-functions by Q-learning.
We prove that when each agent guarantees $\varepsilon$-convergence at each turn, their joint policy converges to a Nash equilibrium.
Results show MA2QL consistently outperforms independent Q-learning (IQL), which verifies the effectiveness of MA2QL despite such minimal changes.
arXiv Detail & Related papers (2022-09-17T04:54:32Z)
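A minimal tabular sketch of the turn-taking idea follows, assuming a stub environment and toy hyperparameters; in MA2QL proper, each turn would run the learner's Q-learning to (near-)convergence rather than a fixed 50 steps.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, n_agents = 4, 2, 2
Q = np.zeros((n_agents, n_states, n_actions))  # one Q-table per agent

def step(state, actions):
    """Stub environment; replace with the real transition/reward model."""
    next_state = int(rng.integers(n_states))
    reward = float(actions[0] == actions[1])  # toy cooperative reward
    return next_state, reward

alpha, gamma, eps = 0.1, 0.95, 0.1
state = 0
for turn in range(20):            # agents take turns being the learner
    learner = turn % n_agents     # only this agent updates its Q-table
    for _ in range(50):           # others keep their greedy policies fixed
        actions = [
            int(rng.integers(n_actions))
            if i == learner and rng.random() < eps
            else int(Q[i, state].argmax())
            for i in range(n_agents)
        ]
        next_state, r = step(state, actions)
        a = actions[learner]
        td = r + gamma * Q[learner, next_state].max() - Q[learner, state, a]
        Q[learner, state, a] += alpha * td
        state = next_state
print(Q[0])  # agent 0's learned table
```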
- Goal-Conditioned Q-Learning as Knowledge Distillation [136.79415677706612]
We explore a connection between off-policy reinforcement learning in goal-conditioned settings and knowledge distillation.
We empirically show that this can improve the performance of goal-conditioned off-policy reinforcement learning when the space of goals is high-dimensional.
We also show that this technique can be adapted to allow for efficient learning in the case of multiple simultaneous sparse goals.
arXiv Detail & Related papers (2022-08-28T22:01:10Z)
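For concreteness, a goal-conditioned Q-network in its simplest form just conditions on the goal by concatenation, as in the minimal sketch below; this illustrates the setting only, not the paper's distillation technique.

```python
import torch
import torch.nn as nn

class GoalConditionedQ(nn.Module):
    """Minimal goal-conditioned Q-network: the goal enters by concatenation."""
    def __init__(self, state_dim, goal_dim, n_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + goal_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, state, goal):
        return self.net(torch.cat([state, goal], dim=-1))

q = GoalConditionedQ(state_dim=4, goal_dim=4, n_actions=3)
s, g = torch.rand(8, 4), torch.rand(8, 4)
print(q(s, g).shape)  # (8, 3): one Q-value per action, under this goal
```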
- Sampling Efficient Deep Reinforcement Learning through Preference-Guided Stochastic Exploration [8.612437964299414]
We propose a preference-guided $\epsilon$-greedy exploration algorithm for the deep Q-network (DQN).
We show that preference-guided exploration motivates the DQN agent to take diverse actions: actions with larger Q-values are sampled more frequently, while actions with smaller Q-values still have a chance to be explored, thus encouraging exploration.
arXiv Detail & Related papers (2022-06-20T08:23:49Z)
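A minimal sketch of what preference-guided exploration can look like: with probability $\epsilon$, sample from a softmax over Q-values instead of uniformly. The mixing scheme and temperature here are assumptions, not the paper's exact preference model.

```python
import numpy as np

rng = np.random.default_rng(1)

def preference_guided_action(q_values, eps=0.1, temperature=1.0):
    """With probability 1-eps act greedily; otherwise sample from a softmax
    over Q-values, so higher-valued actions are explored more often while
    low-valued ones keep a nonzero chance."""
    if rng.random() >= eps:
        return int(np.argmax(q_values))
    z = (q_values - q_values.max()) / temperature  # numerically stable softmax
    probs = np.exp(z) / np.exp(z).sum()
    return int(rng.choice(len(q_values), p=probs))

q = np.array([1.0, 0.5, -0.2])
picks = [preference_guided_action(q, eps=1.0) for _ in range(1000)]
print(np.bincount(picks) / 1000)  # roughly softmax(q), biased toward action 0
```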
- Collective eXplainable AI: Explaining Cooperative Strategies and Agent Contribution in Multiagent Reinforcement Learning with Shapley Values [68.8204255655161]
This study proposes a novel approach to explain cooperative strategies in multiagent RL using Shapley values.
Results could have implications for non-discriminatory decision making, ethical and responsible AI-derived decisions, or policy making under fairness constraints.
arXiv Detail & Related papers (2021-10-04T10:28:57Z)
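Shapley values attribute the team's return to individual agents by averaging each agent's marginal contribution over orderings. The sketch below computes them exactly for a stub two-agent value function; `team_reward` is an assumption, standing in for rollouts with subsets of agents ablated.

```python
import itertools
import numpy as np

def team_reward(coalition):
    """Stub: the team's return with only these agents active (others, say,
    replaced by a no-op policy). Purely illustrative values."""
    base = {(): 0.0, (0,): 2.0, (1,): 1.0, (0, 1): 5.0}
    return base[tuple(sorted(coalition))]

def shapley_values(n_agents, value_fn):
    """Exact Shapley values: average each agent's marginal contribution over
    all orderings (fine for small teams; sample orderings to scale up)."""
    phi = np.zeros(n_agents)
    perms = list(itertools.permutations(range(n_agents)))
    for perm in perms:
        coalition = []
        for agent in perm:
            before = value_fn(coalition)
            coalition.append(agent)
            phi[agent] += value_fn(coalition) - before
    return phi / len(perms)

print(shapley_values(2, team_reward))  # [3. 2.] for the stub above
```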
- Towards Multi-Agent Reinforcement Learning using Quantum Boltzmann Machines [2.015864965523243]
We propose an extension to the original concept in order to solve more challenging problems.
We add an experience replay buffer and use different networks for approximating the target and policy values.
Quantum sampling proves to be a promising method for reinforcement learning tasks, but is currently limited by the QPU size.
arXiv Detail & Related papers (2021-09-22T17:59:24Z)
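The two ingredients named here, a replay buffer and a separate target network, are standard DQN machinery; a generic classical sketch follows (not the quantum variant), with a small toy network assumed for illustration.

```python
import random
from collections import deque

import torch
import torch.nn as nn

# Generic DQN machinery: a replay buffer plus a target network that is
# only synced with the policy network periodically.
buffer = deque(maxlen=10_000)
policy_net = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 2))
target_net = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 2))
target_net.load_state_dict(policy_net.state_dict())
opt = torch.optim.Adam(policy_net.parameters(), lr=1e-3)
gamma = 0.99

def train_step(batch_size=32):
    if len(buffer) < batch_size:
        return
    s, a, r, s2, done = map(torch.stack, zip(*random.sample(buffer, batch_size)))
    q = policy_net(s).gather(1, a.view(-1, 1)).squeeze(1)
    with torch.no_grad():  # targets come from the frozen network
        target = r + gamma * target_net(s2).max(dim=1).values * (1 - done)
    loss = nn.functional.mse_loss(q, target)
    opt.zero_grad()
    loss.backward()
    opt.step()

# Dummy transitions: (state, action, reward, next_state, done)
for _ in range(64):
    buffer.append((torch.rand(4), torch.tensor(0), torch.tensor(1.0),
                   torch.rand(4), torch.tensor(0.0)))
train_step()
target_net.load_state_dict(policy_net.state_dict())  # periodic sync
```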
- Quantum agents in the Gym: a variational quantum algorithm for deep Q-learning [0.0]
We introduce a training method for parametrized quantum circuits (PQCs) that can be used to solve RL tasks for discrete and continuous state spaces.
We investigate which architectural choices for quantum Q-learning agents are most important for successfully solving certain types of environments.
arXiv Detail & Related papers (2021-03-28T08:57:22Z)
- MAGNet: Multi-agent Graph Network for Deep Multi-agent Reinforcement Learning [70.540936204654]
We propose a novel approach, called MAGNet, to multi-agent reinforcement learning.
We show that it significantly outperforms state-of-the-art MARL solutions.
arXiv Detail & Related papers (2020-12-17T17:19:36Z)
- Deep Q-Network Based Multi-agent Reinforcement Learning with Binary Action Agents [1.8782750537161614]
Deep Q-Network (DQN) based multi-agent systems (MAS) for reinforcement learning (RL) use various schemes wherein the agents have to learn and communicate.
We propose a simple but efficient DQN-based MAS for RL which uses shared state and rewards, but agent-specific actions.
The benefits of the approach are overall simplicity, faster convergence, and better performance compared to conventional DQN-based approaches.
arXiv Detail & Related papers (2020-08-06T15:16:05Z)
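A minimal sketch of the shared-state, agent-specific-action scheme: one shared trunk reads the common state and each binary-action agent gets its own two-way Q head. The layer sizes and naming are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class SharedStateDQN(nn.Module):
    """Shared-state sketch: a common trunk plus one binary Q head per agent."""
    def __init__(self, state_dim, n_agents, hidden=64):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.heads = nn.ModuleList(
            nn.Linear(hidden, 2) for _ in range(n_agents)  # binary actions
        )

    def forward(self, state):
        h = self.trunk(state)
        return torch.stack([head(h) for head in self.heads], dim=1)

net = SharedStateDQN(state_dim=6, n_agents=3)
q = net(torch.rand(5, 6))        # (batch, n_agents, 2)
actions = q.argmax(dim=-1)       # each agent picks its own binary action
print(q.shape, actions.shape)
```

With a shared reward, all heads can be trained from the same TD target, which is what keeps the scheme simple.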
- SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep Reinforcement Learning [102.78958681141577]
We present SUNRISE, a simple unified ensemble method, which is compatible with various off-policy deep reinforcement learning algorithms.
SUNRISE integrates two key ingredients: (a) ensemble-based weighted Bellman backups, which re-weight target Q-values based on uncertainty estimates from a Q-ensemble, and (b) an inference method that selects actions using the highest upper-confidence bounds for efficient exploration.
arXiv Detail & Related papers (2020-07-09T17:08:44Z)
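A toy sketch of SUNRISE's two ingredients on random ensemble Q-estimates: the sigmoid confidence weight follows the form described in the paper, but the tensors and constants here are illustrative only.

```python
import torch

ensemble_q = torch.rand(5, 32, 4)  # (ensemble, batch, actions), toy values

# (a) Weighted Bellman backup: the TD loss toward `target_q` would be
# scaled by a confidence weight that shrinks as ensemble disagreement grows.
target_q = ensemble_q.mean(dim=0).max(dim=1).values   # (batch,)
std = ensemble_q.max(dim=2).values.std(dim=0)         # (batch,) disagreement
temperature = 10.0
weight = torch.sigmoid(-std * temperature) + 0.5      # in (0.5, 1.0)

# (b) UCB exploration: act on ensemble mean plus a std bonus, per action.
lam = 1.0
mean_q = ensemble_q.mean(dim=0)                       # (batch, actions)
std_q = ensemble_q.std(dim=0)                         # (batch, actions)
action = (mean_q + lam * std_q).argmax(dim=1)         # optimistic choice
print(target_q.shape, weight.shape, action.shape)
```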