Weakly Coupled Deep Q-Networks
- URL: http://arxiv.org/abs/2310.18803v1
- Date: Sat, 28 Oct 2023 20:07:57 GMT
- Title: Weakly Coupled Deep Q-Networks
- Authors: Ibrahim El Shar, Daniel R. Jiang
- Abstract summary: We propose a novel deep reinforcement learning algorithm that enhances performance in weakly coupled Markov decision processes (WCMDPs).
WCDQN employs a single network to train multiple DQN "subagents", one for each subproblem, and then combines their solutions to establish an upper bound on the optimal action value.
- Score: 5.76924666595801
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose weakly coupled deep Q-networks (WCDQN), a novel deep reinforcement
learning algorithm that enhances performance in a class of structured problems
called weakly coupled Markov decision processes (WCMDPs). WCMDPs consist of
multiple independent subproblems connected by an action space constraint, which
is a structural property that frequently emerges in practice. Despite this
appealing structure, WCMDPs quickly become intractable as the number of
subproblems grows. WCDQN employs a single network to train multiple DQN
"subagents", one for each subproblem, and then combines their solutions to
establish an upper bound on the optimal action value. This guides the main DQN
agent towards optimality. We show that the tabular version, weakly coupled
Q-learning (WCQL), converges almost surely to the optimal action value.
Numerical experiments show faster convergence compared to DQN and related
techniques in settings with as many as 10 subproblems, $3^{10}$ total actions,
and a continuous state space.
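To make the bound-guided training concrete, here is a minimal sketch of a WCDQN-style loss, assuming subagent outputs that estimate Lagrangian subproblem values and a soft-penalty weight `kappa`; the function name, shapes, and exact penalty form are illustrative reconstructions, not the paper's code.
```python
import torch
import torch.nn.functional as F

def wcdqn_loss(q_main, q_next_max, reward, gamma,
               q_sub, lam, budget, kappa=10.0):
    """Sketch of a bound-guided DQN loss in the spirit of WCDQN.

    q_main:     Q(s, a) from the main network, shape [batch]
    q_next_max: max_a' Q_target(s', a'), shape [batch]
    q_sub:      Lagrangian subproblem values Q_lambda^i(s_i, a_i),
                shape [batch, n_subproblems]
    lam:        Lagrange multipliers for the linking constraint, shape [m]
    budget:     right-hand side b of the linking constraint, shape [m]
    kappa:      penalty weight -- an illustrative choice, not the paper's
    """
    # Standard TD loss for the full (coupled) problem.
    td_target = reward + gamma * q_next_max
    td_loss = F.smooth_l1_loss(q_main, td_target.detach())

    # Lagrangian relaxation: the sum of subagent values plus lambda^T b
    # upper-bounds the optimal action value of the coupled problem.
    upper_bound = q_sub.sum(dim=1) + lam @ budget

    # Soft penalty whenever the main estimate exceeds the bound.
    excess = torch.clamp(q_main - upper_bound.detach(), min=0.0)
    return td_loss + kappa * excess.pow(2).mean()
```
The appeal of this construction is that the bound is formed from cheap per-subproblem values, so it scales with the number of subproblems rather than with the exponentially large joint action space.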
Related papers
- Differentiation Through Black-Box Quadratic Programming Solvers [16.543673072027136]
We introduce dQP, a modular framework that enables plug-and-play differentiation for any quadratic programming (QP) solver.
Our solution is based on the core theoretical insight that knowledge of the active constraint set at the QP optimum allows for explicit differentiation.
Our implementation, which will be made publicly available, interfaces with an existing framework that supports over 15 state-of-the-art QP solvers.
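The active-set insight lends itself to a short sketch: once the inequality constraints active at the optimum are identified, the QP is locally an equality-constrained problem, and the solution map can be differentiated through the reduced KKT system. The NumPy code below is a minimal illustration of that idea (not dQP's actual interface), computing the Jacobian of the solution with respect to the linear cost term.
```python
import numpy as np

def dx_dq(Q, G, h, x, tol=1e-8):
    """Jacobian dx*/dq of the QP solution w.r.t. the linear cost q.

    QP: minimize 0.5 x^T Q x + q^T x  subject to  G x <= h,
    with x the already-computed optimum. Illustrative sketch of
    active-set differentiation, not dQP's actual API.
    """
    # Constraints that hold with equality at the optimum are active.
    active = np.flatnonzero(G @ x >= h - tol)
    G_a = G[active]
    n, k = Q.shape[0], len(active)

    # Differentiate the reduced KKT conditions w.r.t. q:
    #   Q dx + I + G_a^T dlam = 0   (stationarity)
    #   G_a dx = 0                  (active constraints stay tight)
    K = np.block([[Q, G_a.T],
                  [G_a, np.zeros((k, k))]])
    rhs = np.vstack([-np.eye(n), np.zeros((k, n))])
    return np.linalg.solve(K, rhs)[:n]  # shape (n, n)
```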
arXiv Detail & Related papers (2024-10-08T20:01:39Z)
- Robust Stochastically-Descending Unrolled Networks [85.6993263983062]
Deep unrolling is an emerging learning-to-optimize method that unrolls a truncated iterative algorithm in the layers of a trainable neural network.
However, the convergence guarantees and generalizability of unrolled networks remain open theoretical problems.
We numerically assess unrolled architectures trained under the proposed constraints in two different applications.
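For readers unfamiliar with deep unrolling, a minimal sketch: unroll a fixed number of gradient-descent iterations on a least-squares objective into layers whose step sizes are trainable. The objective and layer count here are illustrative, not the paper's setup.
```python
import torch
import torch.nn as nn

class UnrolledGD(nn.Module):
    """T gradient-descent steps on 0.5 * ||A x - y||^2, unrolled into
    layers with one trainable step size per layer."""

    def __init__(self, num_layers=10):
        super().__init__()
        self.steps = nn.Parameter(0.1 * torch.ones(num_layers))

    def forward(self, A, y):
        x = torch.zeros(A.shape[1])
        for alpha in self.steps:
            grad = A.T @ (A @ x - y)  # gradient of the least-squares objective
            x = x - alpha * grad      # one unrolled iteration as one "layer"
        return x
```
Training then tunes the per-layer step sizes end-to-end, which is exactly the kind of learned component whose convergence and generalization behavior the paper studies.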
arXiv Detail & Related papers (2023-12-25T18:51:23Z)
- Pointer Networks with Q-Learning for Combinatorial Optimization [55.2480439325792]
We introduce the Pointer Q-Network (PQN), a hybrid neural architecture that integrates model-free Q-value policy approximation with Pointer Networks (Ptr-Nets).
Empirical results demonstrate the efficacy of this approach, including tests of the model in unstable environments.
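The summary gives no architectural details, so the following is only a toy illustration of the general pointer-plus-Q-values idea, with all names and dimensions assumed: attention scores over input elements are read off directly as Q-values, so the greedy action "points" at one element.
```python
import torch
import torch.nn as nn

class PointerQHead(nn.Module):
    """Toy pointer-attention head that emits one Q-value per input
    element, so the action is 'which element to point at'. Purely
    illustrative; not the PQN architecture."""

    def __init__(self, dim):
        super().__init__()
        self.query = nn.Linear(dim, dim)
        self.key = nn.Linear(dim, dim)

    def forward(self, state, elements):
        # state: [batch, dim]; elements: [batch, n, dim]
        q = self.query(state).unsqueeze(1)               # [batch, 1, dim]
        k = self.key(elements)                           # [batch, n, dim]
        q_values = (q * k).sum(-1) / k.shape[-1] ** 0.5  # [batch, n]
        return q_values  # greedy action: q_values.argmax(dim=-1)
```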
arXiv Detail & Related papers (2023-11-05T12:03:58Z)
- On the Convergence and Sample Complexity Analysis of Deep Q-Networks with $\epsilon$-Greedy Exploration [86.71396285956044]
This paper provides a theoretical understanding of Deep Q-Network (DQN) with $\varepsilon$-greedy exploration in deep reinforcement learning.
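For reference, the exploration rule analyzed here is the standard one: with probability $\varepsilon$ act uniformly at random, otherwise act greedily on the current Q-estimates.
```python
import random

def epsilon_greedy(q_values, epsilon):
    """With probability epsilon pick a uniformly random action,
    otherwise the greedy one. q_values: per-action Q(s, a) list."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```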
arXiv Detail & Related papers (2023-10-24T20:37:02Z)
- Q-SHED: Distributed Optimization at the Edge via Hessian Eigenvectors Quantization [5.404315085380945]
Newton-type (NT) methods have been advocated as enablers of robust convergence rates in distributed optimization (DO) problems.
We propose Q-SHED, an original NT algorithm for DO featuring a novel bit-allocation scheme based on incremental Hessian eigenvector quantization.
We show that Q-SHED can reduce by up to 60% the number of communication rounds required for convergence.
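As a rough, assumption-laden sketch of the quantization idea (the actual incremental bit-allocation scheme differs in its details): eigendecompose the local Hessian and spend more bits on eigenvectors tied to larger eigenvalues.
```python
import numpy as np

def quantize_hessian_eigvecs(H, bit_budget):
    """Toy bit allocation for Hessian eigenvector quantization: more
    bits for eigenvectors with larger |eigenvalue|. The real Q-SHED
    scheme allocates bits incrementally and differs in its details."""
    eigvals, eigvecs = np.linalg.eigh(H)
    weights = np.abs(eigvals) / np.abs(eigvals).sum()
    bits = np.maximum(1, np.round(weights * bit_budget).astype(int))

    quantized = np.empty_like(eigvecs)
    for i, b in enumerate(bits):
        levels = 2 ** b                    # uniform grid on [-1, 1];
        v = eigvecs[:, i]                  # unit-vector entries lie in that range
        q = np.round((v + 1) / 2 * (levels - 1))
        quantized[:, i] = q / (levels - 1) * 2 - 1
    return quantized, bits
```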
arXiv Detail & Related papers (2023-05-18T10:15:03Z)
- Residual Q-Networks for Value Function Factorizing in Multi-Agent Reinforcement Learning [0.0]
We propose a novel concept of Residual Q-Networks (RQNs) for Multi-Agent Reinforcement Learning (MARL).
The RQN learns to transform the individual Q-value trajectories in a way that preserves the Individual-Global-Max (IGM) criterion.
The proposed method converges faster, is more stable, and shows robust performance across a wider family of environments.
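The IGM (Individual-Global-Max) criterion that RQNs aim to preserve is simple to state in code: the joint greedy action under the centralized value must coincide with each agent acting greedily on its own Q-values. A tabular check, with illustrative data structures:
```python
import numpy as np

def satisfies_igm(q_joint, q_individual):
    """Check the IGM criterion on a small tabular example.

    q_joint:      dict mapping joint-action tuples to Q_tot values
    q_individual: list of per-agent arrays, q_individual[i][a_i]
    """
    best_joint = max(q_joint, key=q_joint.get)                   # centralized argmax
    best_local = tuple(int(np.argmax(q)) for q in q_individual)  # per-agent argmaxes
    return best_joint == best_local
```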
arXiv Detail & Related papers (2022-05-30T16:56:06Z)
- A Convergent and Efficient Deep Q Network Algorithm [3.553493344868414]
We show that the deep Q network (DQN) reinforcement learning algorithm can diverge and cease to operate in realistic settings.
We propose a convergent DQN algorithm (C-DQN) by carefully modifying DQN.
It learns robustly in difficult settings and can learn several difficult games in the Atari 2600 benchmark.
arXiv Detail & Related papers (2021-06-29T13:38:59Z)
- Joint Deep Reinforcement Learning and Unfolding: Beam Selection and Precoding for mmWave Multiuser MIMO with Lens Arrays [54.43962058166702]
Millimeter wave (mmWave) multiuser multiple-input multiple-output (MU-MIMO) systems with discrete lens arrays (DLAs) have received great attention.
In this work, we investigate the joint design of beam selection and precoding matrices for mmWave MU-MIMO systems with DLAs.
arXiv Detail & Related papers (2021-01-05T03:55:04Z)
- Deep Q-Network Based Multi-agent Reinforcement Learning with Binary Action Agents [1.8782750537161614]
Deep Q-Network (DQN) based multi-agent systems (MAS) for reinforcement learning (RL) use various schemes wherein the agents have to learn and communicate.
We propose a simple but efficient DQN based MAS for RL which uses shared state and rewards, but agent-specific actions.
The benefits of the approach are overall simplicity, faster convergence, and better performance compared to conventional DQN-based approaches.
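The described setup maps naturally onto one small DQN head per agent, each reading the shared state and emitting two Q-values (one per binary action); a minimal sketch, with network sizes assumed:
```python
import torch
import torch.nn as nn

def make_agent(state_dim, hidden=64):
    """One DQN head per agent: shared state in, two Q-values out
    (binary action). Network sizes here are illustrative."""
    return nn.Sequential(
        nn.Linear(state_dim, hidden), nn.ReLU(),
        nn.Linear(hidden, 2),
    )

state_dim, n_agents = 8, 4
agents = [make_agent(state_dim) for _ in range(n_agents)]

state = torch.randn(1, state_dim)  # one shared state for all agents
actions = [int(agent(state).argmax(dim=1)) for agent in agents]
# Every agent is trained on the same shared reward for the joint action.
```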
arXiv Detail & Related papers (2020-08-06T15:16:05Z)
- SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep Reinforcement Learning [102.78958681141577]
We present SUNRISE, a simple unified ensemble method, which is compatible with various off-policy deep reinforcement learning algorithms.
SUNRISE integrates two key ingredients: (a) ensemble-based weighted Bellman backups, which re-weight target Q-values based on uncertainty estimates from a Q-ensemble, and (b) an inference method that selects actions using the highest upper-confidence bounds for efficient exploration.
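Both ingredients are compact enough to sketch: high ensemble disagreement (the standard deviation of target Q-values) shrinks the Bellman backup, and exploration picks the action with the highest upper-confidence bound over the ensemble. The temperature and bonus coefficient below are placeholder values.
```python
import torch

def backup_weight(q_targets, temperature=10.0):
    """SUNRISE-style Bellman-backup weight: high ensemble disagreement
    (std of the target Q-values) shrinks the weight toward 0.5.
    q_targets: [ensemble, batch] target Q-values."""
    std = q_targets.std(dim=0)
    return torch.sigmoid(-std * temperature) + 0.5  # values in (0.5, 1.0)

def ucb_action(q_values, bonus=1.0):
    """Exploration by upper-confidence bound over the ensemble.
    q_values: [ensemble, batch, n_actions] Q-estimates."""
    mean, std = q_values.mean(dim=0), q_values.std(dim=0)
    return (mean + bonus * std).argmax(dim=-1)
```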
arXiv Detail & Related papers (2020-07-09T17:08:44Z)
- Dynamic Multi-Robot Task Allocation under Uncertainty and Temporal Constraints [52.58352707495122]
We present a multi-robot task allocation algorithm that decouples the key computational challenges of sequential decision-making under uncertainty and multi-agent coordination.
We validate our results over a wide range of simulations on two distinct domains: multi-arm conveyor belt pick-and-place and multi-drone delivery dispatch in a city.
arXiv Detail & Related papers (2020-05-27T01:10:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.