Deep Q-Network Based Multi-agent Reinforcement Learning with Binary Action Agents
- URL: http://arxiv.org/abs/2008.04109v1
- Date: Thu, 6 Aug 2020 15:16:05 GMT
- Title: Deep Q-Network Based Multi-agent Reinforcement Learning with Binary Action Agents
- Authors: Abdul Mueed Hafiz and Ghulam Mohiuddin Bhat
- Abstract summary: Deep Q-Network (DQN) based multi-agent systems (MAS) for reinforcement learning (RL) use various schemes wherein the agents have to learn and communicate.
We propose a simple but efficient DQN based MAS for RL which uses shared state and rewards, but agent-specific actions.
The benefits of the approach are overall simplicity, faster convergence and better performance compared to conventional DQN based approaches.
- Score: 1.8782750537161614
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep Q-Network (DQN) based multi-agent systems (MAS) for reinforcement learning (RL) use various schemes wherein the agents have to learn and communicate. The learning is, however, specific to each agent, and the communication must be satisfactorily designed for the agents. As more complex Deep Q-Networks come to the fore, the overall complexity of the multi-agent system increases, leading to issues like difficulty in training, need for higher resources and more training time, difficulty in fine-tuning, etc. To address these issues we propose a simple but efficient DQN based MAS for RL which uses shared state and rewards, but agent-specific actions, for updating the experience replay pool of the DQNs, where each agent is a DQN. The benefits of the approach are overall simplicity, faster convergence and better performance compared to conventional DQN based approaches. Notably, the method can be extended to any DQN variant. We use the simple DQN and the DDQN (Double Q-learning) respectively on three separate tasks, i.e. CartPole-v1 (OpenAI Gym environment), LunarLander-v2 (OpenAI Gym environment) and Maze Traversal (a customized environment). The proposed approach outperforms the baselines on these tasks by decent margins.
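As an illustration of the scheme described in the abstract, below is a minimal sketch (not the authors' code) of two binary-action DQN agents that observe a shared state, receive a shared reward, and push agent-specific actions into their own experience replay pools. The ToyEnv class, network sizes, and hyperparameters are placeholder assumptions; in the paper's setting the joint binary actions would index the discrete actions of a task such as LunarLander-v2.

```python
import random
from collections import deque

import numpy as np
import torch
import torch.nn as nn

class ToyEnv:
    """Placeholder for a Gym-style task whose 4 discrete actions
    factor into two binary sub-actions (one per agent)."""
    def reset(self):
        return np.random.randn(4).astype(np.float32)

    def step(self, joint_action):
        next_state = np.random.randn(4).astype(np.float32)
        return next_state, float(np.random.rand()), bool(np.random.rand() < 0.05)

class BinaryAgent:
    """One DQN per agent; each agent's action space is just {0, 1}."""
    def __init__(self, obs_dim):
        self.q = nn.Sequential(nn.Linear(obs_dim, 32), nn.ReLU(), nn.Linear(32, 2))
        self.opt = torch.optim.Adam(self.q.parameters(), lr=1e-3)
        self.replay = deque(maxlen=10_000)   # this agent's experience replay pool

    def act(self, state, eps=0.1):
        if random.random() < eps:
            return random.randrange(2)
        with torch.no_grad():
            return int(self.q(torch.from_numpy(state)).argmax())

    def learn(self, gamma=0.99, batch=32):
        if len(self.replay) < batch:
            return
        s, a, r, s2, d = map(np.array, zip(*random.sample(self.replay, batch)))
        q = self.q(torch.from_numpy(s)).gather(1, torch.from_numpy(a).view(-1, 1)).squeeze(1)
        with torch.no_grad():
            tgt = torch.from_numpy(r).float() + gamma * (1 - torch.from_numpy(d).float()) \
                  * self.q(torch.from_numpy(s2)).max(1).values
        loss = nn.functional.mse_loss(q, tgt)
        self.opt.zero_grad(); loss.backward(); self.opt.step()

env, agents = ToyEnv(), [BinaryAgent(4), BinaryAgent(4)]
state = env.reset()
for _ in range(500):
    bits = [agent.act(state) for agent in agents]   # agent-specific actions
    next_state, reward, done = env.step(bits[0] * 2 + bits[1])
    for agent, b in zip(agents, bits):              # shared state/reward, own action bit
        agent.replay.append((state, b, reward, next_state, float(done)))
        agent.learn()
    state = env.reset() if done else next_state
```

The point of the sketch is the replay update: every agent stores the same (state, reward, next state) tuple but only its own action bit, so each DQN stays small and trains independently without any learned communication.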
Related papers
- Multi-agent Reinforcement Learning with Deep Networks for Diverse Q-Vectors [3.9801926395657325]
This paper proposes a deep Q-networks (DQN) algorithm capable of learning various Q-vectors using Max, Nash, and Maximin strategies.
The effectiveness of this approach is demonstrated in an environment where dual robotic arms collaborate to lift a pot.
arXiv Detail & Related papers (2024-06-12T03:30:10Z)
- Weakly Coupled Deep Q-Networks [5.76924666595801]
We propose a novel deep reinforcement learning algorithm, WCDQN, that enhances performance in weakly coupled Markov decision processes (WCMDPs).
WCDQN employs a single network to train multiple DQN "subagents", one for each subproblem, and then combines their solutions to establish an upper bound on the optimal action value.
arXiv Detail & Related papers (2023-10-28T20:07:57Z)
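A rough sketch of the subagent-combination idea summarized above, under assumed shapes and interfaces (the Lagrangian-relaxation details of the actual WCDQN are not shown): each subproblem gets its own Q-head, and summing the heads' values over a decomposed state-action pair yields an upper bound that can be used to penalize main Q-estimates that exceed it.

```python
import torch
import torch.nn as nn

class SubagentHeads(nn.Module):
    """One Q-head per subproblem; their sum bounds the full problem's Q."""
    def __init__(self, sub_dim, n_sub, n_actions):
        super().__init__()
        self.heads = nn.ModuleList(
            nn.Sequential(nn.Linear(sub_dim, 32), nn.ReLU(), nn.Linear(32, n_actions))
            for _ in range(n_sub))

    def upper_bound(self, sub_states, sub_actions):
        # sub_states: (batch, n_sub, sub_dim); sub_actions: (batch, n_sub), int64
        total = torch.zeros(sub_states.shape[0])
        for i, head in enumerate(self.heads):
            q_i = head(sub_states[:, i, :])                       # (batch, n_actions)
            total = total + q_i.gather(1, sub_actions[:, i:i+1]).squeeze(1)
        return total                                              # (batch,)

heads = SubagentHeads(sub_dim=3, n_sub=4, n_actions=2)
states = torch.randn(8, 4, 3)
actions = torch.randint(0, 2, (8, 4))
q_main = torch.randn(8, requires_grad=True)          # stand-in for the main Q-estimate
bound = heads.upper_bound(states, actions).detach()
penalty = torch.relu(q_main - bound).pow(2).mean()   # discourage estimates above the bound
```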
- Multi-Agent Reinforcement Learning with Action Masking for UAV-enabled Mobile Communications [1.3053649021965603]
Unmanned Aerial Vehicles (UAVs) are increasingly used as aerial base stations to provide ad hoc communications infrastructure.
This paper focuses on the use of multiple UAVs for providing wireless communication to mobile users in the absence of terrestrial communications infrastructure.
We jointly optimize the UAVs' 3D trajectories and NOMA (non-orthogonal multiple access) power allocation to maximize system throughput.
arXiv Detail & Related papers (2023-03-29T14:41:03Z)
- MA2QL: A Minimalist Approach to Fully Decentralized Multi-Agent Reinforcement Learning [63.46052494151171]
We propose multi-agent alternate Q-learning (MA2QL), where agents take turns updating their Q-functions by Q-learning.
We prove that when each agent guarantees $\varepsilon$-convergence at each turn, their joint policy converges to a Nash equilibrium.
Results show MA2QL consistently outperforms IQL despite such minimal changes, which verifies the effectiveness of MA2QL.
arXiv Detail & Related papers (2022-09-17T04:54:32Z)
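A toy, stateless rendering of MA2QL's alternate-update idea (an assumption-level sketch, not the paper's full algorithm): two agents play a made-up random matrix game, and on each turn only the active agent updates its Q-values while the other keeps its policy fixed.

```python
import numpy as np

rng = np.random.default_rng(0)
payoff = rng.standard_normal((3, 3))   # shared reward for each joint action
Q = [np.zeros(3), np.zeros(3)]         # one (stateless) Q-table per agent
alpha, eps = 0.1, 0.2

for turn in range(200):
    learner = turn % 2                 # agents alternate: only one learns per turn
    for _ in range(50):
        acts = [int(np.argmax(q)) if rng.random() > eps else int(rng.integers(3))
                for q in Q]
        reward = payoff[acts[0], acts[1]]
        # the non-learning agent keeps its Q-table (hence its policy) fixed
        Q[learner][acts[learner]] += alpha * (reward - Q[learner][acts[learner]])

print("greedy joint action:", int(np.argmax(Q[0])), int(np.argmax(Q[1])))
```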
- M$^2$DQN: A Robust Method for Accelerating Deep Q-learning Network [6.689964384669018]
We propose a framework which uses the Max-Mean loss in Deep Q-Network (M$^2$DQN).
Instead of sampling one batch of experiences in the training step, we sample several batches from the experience replay and update the parameters such that the maximum TD-error of these batches is minimized.
We verify the effectiveness of this framework with one of the most widely used techniques, Double DQN (DDQN), in several Gym games.
arXiv Detail & Related papers (2022-09-16T09:20:35Z)
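A hedged sketch of the max-mean loss as summarized above: draw several batches, compute each batch's mean TD loss, and backpropagate only the largest one. The sample_batch helper, the networks, and the optimizer are assumptions with the usual DQN shapes.

```python
import random
import torch
import torch.nn as nn

def sample_batch(replay, n):
    # hypothetical helper: replay holds (s, a, r, s2, done) tensor tuples
    s, a, r, s2, d = zip(*random.sample(replay, n))
    return torch.stack(s), torch.stack(a), torch.stack(r), torch.stack(s2), torch.stack(d)

def max_mean_update(q_net, target_net, replay, opt, k=4, batch=32, gamma=0.99):
    losses = []
    for _ in range(k):                                    # several batches, not one
        s, a, r, s2, d = sample_batch(replay, batch)
        q = q_net(s).gather(1, a.view(-1, 1)).squeeze(1)  # a: int64 action indices
        with torch.no_grad():
            tgt = r + gamma * (1 - d) * target_net(s2).max(1).values
        losses.append(nn.functional.mse_loss(q, tgt))     # mean TD loss per batch
    loss = torch.stack(losses).max()                      # the worst (max) batch loss
    opt.zero_grad(); loss.backward(); opt.step()
```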
- Retrieval-Augmented Reinforcement Learning [63.32076191982944]
We train a network to map a dataset of past experiences to optimal behavior.
The retrieval process is trained to retrieve information from the dataset that may be useful in the current context.
We show that retrieval-augmented R2D2 learns significantly faster than the baseline R2D2 agent and achieves higher scores.
arXiv Detail & Related papers (2022-02-17T02:44:05Z)
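A loose sketch of the retrieval step only: the paper trains the retrieval process end-to-end, whereas this stand-in uses plain cosine-similarity nearest neighbours over stored experience embeddings.

```python
import numpy as np

class ExperienceIndex:
    """Stores (embedding, transition) pairs; retrieval is cosine k-NN."""
    def __init__(self, dim):
        self.keys = np.empty((0, dim), dtype=np.float32)
        self.values = []

    def add(self, embedding, transition):
        self.keys = np.vstack([self.keys, embedding[None]])
        self.values.append(transition)

    def retrieve(self, query, k=5):
        sims = self.keys @ query / (np.linalg.norm(self.keys, axis=1)
                                    * np.linalg.norm(query) + 1e-8)
        return [self.values[i] for i in np.argsort(-sims)[:k]]

index = ExperienceIndex(dim=4)
for _ in range(100):
    index.add(np.random.randn(4).astype(np.float32), transition=("s", "a", "r", "s2"))
context = np.random.randn(4).astype(np.float32)
neighbours = index.retrieve(context, k=5)   # fed to the agent as extra context
```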
- Deep Reinforcement Learning with Spiking Q-learning [51.386945803485084]
Spiking neural networks (SNNs) are expected to realize artificial intelligence (AI) with less energy consumption.
Combining SNNs with deep reinforcement learning (RL) offers a promising energy-efficient way to handle realistic control tasks.
arXiv Detail & Related papers (2022-01-21T16:42:11Z)
- Multi-Agent Collaboration via Reward Attribution Decomposition [75.36911959491228]
We propose Collaborative Q-learning (CollaQ) that achieves state-of-the-art performance in the StarCraft multi-agent challenge.
CollaQ is evaluated on various StarCraft maps and shows that it outperforms existing state-of-the-art techniques.
arXiv Detail & Related papers (2020-10-16T17:42:11Z)
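A much-simplified sketch of the reward-attribution-style decomposition named in the title, under the assumption that each agent's Q-value splits into a self term and a collaborative term conditioned on the other agents' observations; the paper's actual training objective and regularizers are omitted.

```python
import torch
import torch.nn as nn

class DecomposedQ(nn.Module):
    """Q_i = Q_alone(own obs) + Q_collab(own obs, other agents' obs)."""
    def __init__(self, obs_dim, others_dim, n_actions):
        super().__init__()
        self.q_alone = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                                     nn.Linear(64, n_actions))
        self.q_collab = nn.Sequential(nn.Linear(obs_dim + others_dim, 64), nn.ReLU(),
                                      nn.Linear(64, n_actions))

    def forward(self, obs, others):
        return self.q_alone(obs) + self.q_collab(torch.cat([obs, others], dim=-1))

net = DecomposedQ(obs_dim=8, others_dim=16, n_actions=5)
q_values = net(torch.randn(8), torch.randn(16))   # (5,) action values for one agent
```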
- SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep Reinforcement Learning [102.78958681141577]
We present SUNRISE, a simple unified ensemble method, which is compatible with various off-policy deep reinforcement learning algorithms.
SUNRISE integrates two key ingredients: (a) ensemble-based weighted Bellman backups, which re-weight target Q-values based on uncertainty estimates from a Q-ensemble, and (b) an inference method that selects actions using the highest upper-confidence bounds for efficient exploration.
arXiv Detail & Related papers (2020-07-09T17:08:44Z)
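A short sketch of SUNRISE's ingredient (b), the upper-confidence-bound action selection, using the ensemble mean and standard deviation as the optimism signal; the weighted Bellman backup of ingredient (a) is not shown, and the toy ensemble here is an assumption.

```python
import torch
import torch.nn as nn

def ucb_action(q_ensemble, state, lam=1.0):
    """Pick the action maximizing ensemble mean + lam * ensemble std."""
    qs = torch.stack([q(state) for q in q_ensemble])   # (ensemble, n_actions)
    score = qs.mean(dim=0) + lam * qs.std(dim=0)       # optimism from disagreement
    return int(score.argmax())

ensemble = [nn.Linear(4, 2) for _ in range(5)]         # toy Q-ensemble
action = ucb_action(ensemble, torch.randn(4))
```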
- Efficient Ridesharing Dispatch Using Multi-Agent Reinforcement Learning [0.0]
Ride-sharing services such as Uber and Lyft let passengers order a car to pick them up.
Traditional Reinforcement Learning (RL) based methods attempting to solve the ridesharing problem are unable to accurately model the complex environment in which taxis operate.
We show that our model performs better than the IDQN baseline on a fixed grid size and is able to generalize well to smaller or larger grid sizes.
Our algorithm also outperforms the IDQN baseline in the scenario where the number of passengers and cars varies by episode.
arXiv Detail & Related papers (2020-06-18T23:37:53Z)
- Distributed Reinforcement Learning for Cooperative Multi-Robot Object Manipulation [53.262360083572005]
We consider solving a cooperative multi-robot object manipulation task using reinforcement learning (RL).
We propose two distributed multi-agent RL approaches: distributed approximate RL (DA-RL) and game-theoretic RL (GT-RL).
Although we focus on a small system of two agents in this paper, both DA-RL and GT-RL apply to general multi-agent systems, and are expected to scale well to large systems.
arXiv Detail & Related papers (2020-03-21T00:43:54Z)