Multi-Timescale Ensemble Q-learning for Markov Decision Process Policy
Optimization
- URL: http://arxiv.org/abs/2402.05476v1
- Date: Thu, 8 Feb 2024 08:08:23 GMT
- Title: Multi-Timescale Ensemble Q-learning for Markov Decision Process Policy
Optimization
- Authors: Talha Bozkus and Urbashi Mitra
- Abstract summary: Original Q-learning suffers from performance and complexity challenges across very large networks.
New model-free ensemble reinforcement learning algorithm which adapts the classical Q-learning is proposed to handle these challenges.
Numerical results show that the proposed algorithm can achieve up to 55% less average policy error with up to 50% less runtime complexity.
- Score: 21.30645601474163
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement learning (RL) is a classical tool to solve network control or
policy optimization problems in unknown environments. The original Q-learning
suffers from performance and complexity challenges across very large networks.
Herein, a novel model-free ensemble reinforcement learning algorithm that
adapts classical Q-learning is proposed to handle these challenges for
networks which admit Markov decision process (MDP) models. Multiple Q-learning
algorithms are run on multiple, distinct, synthetically created and
structurally related Markovian environments in parallel; the outputs are fused
using an adaptive weighting mechanism based on the Jensen-Shannon divergence
(JSD) to obtain an approximately optimal policy with low complexity. The
theoretical justification of the algorithm, including the convergence of key
statistics and Q-functions, is provided. Numerical results across several
network models show that the proposed algorithm can achieve up to 55% less
average policy error with up to 50% less runtime complexity than the
state-of-the-art Q-learning algorithms. Numerical results validate assumptions
made in the theoretical analysis.
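The fusion idea in the abstract, several Q-learners trained in parallel on structurally related environments and combined through Jensen-Shannon divergence (JSD) based weights, can be pictured with a small tabular sketch. The snippet below is a minimal illustration only: the environment interface (reset/step), the exponential-of-JSD weighting rule, and all hyperparameters are assumptions made for the example, not the authors' implementation.

```python
# Hypothetical sketch of the ensemble Q-learning idea: run one tabular
# Q-learner per synthetic, structurally related environment, then fuse the
# Q-tables with JSD-based weights. The env interface and the weighting rule
# below are assumptions for illustration, not the paper's exact algorithm.
import numpy as np

def jsd(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two discrete distributions."""
    p, q = p + eps, q + eps
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log(a / b))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def ensemble_q_learning(envs, n_states, n_actions,
                        episodes=500, horizon=100,
                        alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
    """Train one Q-learner per environment and fuse the resulting Q-tables.

    Each env is assumed to expose reset() -> state and
    step(state, action) -> (next_state, reward); this interface is illustrative.
    """
    rng = np.random.default_rng(seed)
    q_tables = [np.zeros((n_states, n_actions)) for _ in envs]
    visit_dists = [np.zeros(n_states) for _ in envs]  # empirical state occupancy

    for q, visits, env in zip(q_tables, visit_dists, envs):
        for _ in range(episodes):
            s = env.reset()
            for _ in range(horizon):
                # epsilon-greedy action selection
                a = rng.integers(n_actions) if rng.random() < epsilon else int(np.argmax(q[s]))
                s_next, r = env.step(s, a)
                # standard Q-learning update
                q[s, a] += alpha * (r + gamma * np.max(q[s_next]) - q[s, a])
                visits[s] += 1
                s = s_next

    # Adaptive fusion (assumed rule): weight each learner by how closely its
    # state-occupancy distribution matches the reference environment, via JSD.
    ref = visit_dists[0]
    weights = np.array([np.exp(-jsd(ref, d)) for d in visit_dists])
    weights /= weights.sum()
    q_fused = sum(w * q for w, q in zip(weights, q_tables))
    policy = np.argmax(q_fused, axis=1)  # greedy, approximately optimal policy
    return q_fused, policy
```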
Related papers
- Q-learning for Quantile MDPs: A Decomposition, Performance, and Convergence Analysis [30.713243690224207]
In Markov decision processes (MDPs), quantile risk measures such as Value-at-Risk are standard metrics for modeling RL agents' preferences for certain outcomes.
This paper proposes a new Q-learning algorithm for quantile optimization in MDPs with strong convergence and performance guarantees.
arXiv Detail & Related papers (2024-10-31T16:53:20Z) - Enhancing Multi-Step Reasoning Abilities of Language Models through Direct Q-Function Optimization [50.485788083202124]
Reinforcement Learning (RL) plays a crucial role in aligning large language models with human preferences and improving their ability to perform complex tasks.
We introduce Direct Q-function Optimization (DQO), which formulates the response generation process as a Markov Decision Process (MDP) and utilizes the soft actor-critic (SAC) framework to optimize a Q-function directly parameterized by the language model.
Experimental results on two math problem-solving datasets, GSM8K and MATH, demonstrate that DQO outperforms previous methods, establishing it as a promising offline reinforcement learning approach for aligning language models.
arXiv Detail & Related papers (2024-10-11T23:29:20Z) - Coverage Analysis of Multi-Environment Q-Learning Algorithms for Wireless Network Optimization [18.035417008213077]
Recent advancements include ensemble multi-environment hybrid Q-learning algorithms.
We show that our algorithm can achieve 50% less policy error and 40% less runtime complexity than state-of-the-art reinforcement learning algorithms.
arXiv Detail & Related papers (2024-08-29T20:09:20Z) - Finite-Time Error Analysis of Soft Q-Learning: Switching System Approach [4.36117236405564]
Soft Q-learning is a variation of Q-learning designed to solve entropy regularized Markov decision problems.
This paper aims to offer a novel and unified finite-time, control-theoretic analysis of soft Q-learning algorithms.
arXiv Detail & Related papers (2024-03-11T01:36:37Z) - Leveraging Digital Cousins for Ensemble Q-Learning in Large-Scale
Wireless Networks [21.30645601474163]
A novel ensemble Q-learning algorithm is presented to optimize wireless networks.
The proposed algorithm can achieve up to 50% less average error with up to 40% less runtime complexity than the state-of-the-art reinforcement learning algorithms.
arXiv Detail & Related papers (2024-02-12T19:39:07Z) - Pointer Networks with Q-Learning for Combinatorial Optimization [55.2480439325792]
We introduce the Pointer Q-Network (PQN), a hybrid neural architecture that integrates model-free Q-value policy approximation with Pointer Networks (Ptr-Nets).
Our empirical results demonstrate the efficacy of this approach, also testing the model in unstable environments.
arXiv Detail & Related papers (2023-11-05T12:03:58Z) - Federated Conditional Stochastic Optimization [110.513884892319]
Conditional stochastic optimization has found applications in a wide range of machine learning tasks, such as invariant learning, AUPRC, and AML.
This paper proposes algorithms for conditional stochastic optimization in the distributed federated learning setting.
arXiv Detail & Related papers (2023-10-04T01:47:37Z) - On the Convergence of Distributed Stochastic Bilevel Optimization
Algorithms over a Network [55.56019538079826]
Bilevel optimization has been applied to a wide variety of machine learning models.
Most existing algorithms are restricted to the single-machine setting and cannot handle distributed data.
We develop novel decentralized bilevel optimization algorithms based on a gradient tracking communication mechanism and two different gradient estimators.
arXiv Detail & Related papers (2022-06-30T05:29:52Z) - Efficient Model-Based Multi-Agent Mean-Field Reinforcement Learning [89.31889875864599]
We propose an efficient model-based reinforcement learning algorithm for learning in multi-agent systems.
Our main theoretical contributions are the first general regret bounds for model-based reinforcement learning for MFC.
We provide a practical parametrization of the core optimization problem.
arXiv Detail & Related papers (2021-07-08T18:01:02Z) - A Hybrid PAC Reinforcement Learning Algorithm [5.279475826661642]
This paper offers a new hybrid probably approximately correct (PAC) reinforcement learning (RL) algorithm for Markov decision processes (MDPs).
The designed algorithm, referred to as the Dyna-Delayed Q-learning (DDQ) algorithm, combines model-free and model-based learning approaches while outperforming both in most cases.
arXiv Detail & Related papers (2020-09-05T21:32:42Z) - Iterative Algorithm Induced Deep-Unfolding Neural Networks: Precoding
Design for Multiuser MIMO Systems [59.804810122136345]
We propose a framework for deep-unfolding, where a general form of iterative algorithm induced deep-unfolding neural network (IAIDNN) is developed.
An efficient IAIDNN based on the structure of the classic weighted minimum mean-square error (WMMSE) iterative algorithm is developed.
We show that the proposed IAIDNN efficiently achieves the performance of the iterative WMMSE algorithm with reduced computational complexity.
arXiv Detail & Related papers (2020-06-15T02:57:57Z)