Multi-Timescale Ensemble Q-learning for Markov Decision Process Policy
Optimization
- URL: http://arxiv.org/abs/2402.05476v1
- Date: Thu, 8 Feb 2024 08:08:23 GMT
- Title: Multi-Timescale Ensemble Q-learning for Markov Decision Process Policy
Optimization
- Authors: Talha Bozkus and Urbashi Mitra
- Abstract summary: Original Q-learning suffers from performance and complexity challenges across very large networks.
New model-free ensemble reinforcement learning algorithm which adapts the classical Q-learning is proposed to handle these challenges.
Numerical results show that the proposed algorithm can achieve up to 55% less average policy error with up to 50% less runtime complexity.
- Score: 21.30645601474163
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement learning (RL) is a classical tool to solve network control or
policy optimization problems in unknown environments. The original Q-learning
suffers from performance and complexity challenges across very large networks.
Herein, a novel model-free ensemble reinforcement learning algorithm that
adapts classical Q-learning is proposed to handle these challenges for
networks which admit Markov decision process (MDP) models. Multiple Q-learning
algorithms are run on multiple, distinct, synthetically created and
structurally related Markovian environments in parallel; the outputs are fused
using an adaptive weighting mechanism based on the Jensen-Shannon divergence
(JSD) to obtain an approximately optimal policy with low complexity. The
theoretical justification of the algorithm, including the convergence of key
statistics and Q-functions, is provided. Numerical results across several
network models show that the proposed algorithm can achieve up to 55% less
average policy error with up to 50% less runtime complexity than the
state-of-the-art Q-learning algorithms. Numerical results validate assumptions
made in the theoretical analysis.
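The fusion idea in the abstract, several Q-learners trained in parallel on structurally related environments and combined through Jensen-Shannon divergence (JSD) based weights, can be pictured with a small tabular sketch. The snippet below is a minimal illustration only: the environment interface (reset/step), the exponential-of-JSD weighting rule, and all hyperparameters are assumptions made for the example, not the authors' implementation.

```python
# Hypothetical sketch of the ensemble Q-learning idea: run one tabular
# Q-learner per synthetic, structurally related environment, then fuse the
# Q-tables with JSD-based weights. The env interface and the weighting rule
# below are assumptions for illustration, not the paper's exact algorithm.
import numpy as np

def jsd(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two discrete distributions."""
    p, q = p + eps, q + eps
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log(a / b))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def ensemble_q_learning(envs, n_states, n_actions,
                        episodes=500, horizon=100,
                        alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
    """Train one Q-learner per environment and fuse the resulting Q-tables.

    Each env is assumed to expose reset() -> state and
    step(state, action) -> (next_state, reward); this interface is illustrative.
    """
    rng = np.random.default_rng(seed)
    q_tables = [np.zeros((n_states, n_actions)) for _ in envs]
    visit_dists = [np.zeros(n_states) for _ in envs]  # empirical state occupancy

    for q, visits, env in zip(q_tables, visit_dists, envs):
        for _ in range(episodes):
            s = env.reset()
            for _ in range(horizon):
                # epsilon-greedy action selection
                a = rng.integers(n_actions) if rng.random() < epsilon else int(np.argmax(q[s]))
                s_next, r = env.step(s, a)
                # standard Q-learning update
                q[s, a] += alpha * (r + gamma * np.max(q[s_next]) - q[s, a])
                visits[s] += 1
                s = s_next

    # Adaptive fusion (assumed rule): weight each learner by how closely its
    # state-occupancy distribution matches the reference environment, via JSD.
    ref = visit_dists[0]
    weights = np.array([np.exp(-jsd(ref, d)) for d in visit_dists])
    weights /= weights.sum()
    q_fused = sum(w * q for w, q in zip(weights, q_tables))
    policy = np.argmax(q_fused, axis=1)  # greedy, approximately optimal policy
    return q_fused, policy
```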
Related papers
- Q-learning for Quantile MDPs: A Decomposition, Performance, and Convergence Analysis [30.713243690224207]
In Markov decision processes (MDPs), quantile risk measures such as Value-at-Risk are standard metrics for modeling RL agents' preferences for certain outcomes.
This paper proposes a new Q-learning algorithm for quantile optimization in MDPs with strong convergence and performance guarantees.
arXiv Detail & Related papers (2024-10-31T16:53:20Z) - Enhancing Multi-Step Reasoning Abilities of Language Models through Direct Q-Function Optimization [50.485788083202124]
Reinforcement Learning (RL) plays a crucial role in aligning large language models with human preferences and improving their ability to perform complex tasks.
We introduce Direct Q-function Optimization (DQO), which formulates the response generation process as a Markov Decision Process (MDP) and utilizes the soft actor-critic (SAC) framework to optimize a Q-function directly parameterized by the language model.
Experimental results on two math problem-solving datasets, GSM8K and MATH, demonstrate that DQO outperforms previous methods, establishing it as a promising offline reinforcement learning approach for aligning language models.
arXiv Detail & Related papers (2024-10-11T23:29:20Z) - Coverage Analysis of Multi-Environment Q-Learning Algorithms for Wireless Network Optimization [18.035417008213077]
Recent advancements include ensemble multi-environment hybrid Q-learning algorithms.
We show that our algorithm can achieve 50% less policy error and 40% less runtime complexity than state-of-the-art reinforcement learning algorithms.
arXiv Detail & Related papers (2024-08-29T20:09:20Z) - Finite-Time Error Analysis of Soft Q-Learning: Switching System Approach [4.36117236405564]
Soft Q-learning is a variation of Q-learning designed to solve entropy regularized Markov decision problems.
This paper aims to offer a novel and unified finite-time, control-theoretic analysis of soft Q-learning algorithms.
arXiv Detail & Related papers (2024-03-11T01:36:37Z) - Leveraging Digital Cousins for Ensemble Q-Learning in Large-Scale
Wireless Networks [21.30645601474163]
A novel ensemble Q-learning algorithm is presented to optimize wireless networks.
The proposed algorithm can achieve up to 50% less average error with up to 40% less runtime complexity than the state-of-the-art reinforcement learning algorithms.
arXiv Detail & Related papers (2024-02-12T19:39:07Z) - Pointer Networks with Q-Learning for Combinatorial Optimization [55.2480439325792]
We introduce the Pointer Q-Network (PQN), a hybrid neural architecture that integrates model-free Q-value policy approximation with Pointer Networks (Ptr-Nets).
Our empirical results demonstrate the efficacy of this approach, also testing the model in unstable environments.
arXiv Detail & Related papers (2023-11-05T12:03:58Z) - Federated Conditional Stochastic Optimization [110.513884892319]
Conditional stochastic optimization has found applications in a wide range of machine learning tasks, such as invariant learning, AUPRC, and AML.
This paper proposes algorithms for conditional stochastic optimization in the distributed federated learning setting.
arXiv Detail & Related papers (2023-10-04T01:47:37Z) - On the Convergence of Distributed Stochastic Bilevel Optimization
Algorithms over a Network [55.56019538079826]
Bilevel optimization has been applied to a wide variety of machine learning models.
Most existing algorithms are restricted to the single-machine setting and cannot handle distributed data.
We develop novel decentralized bilevel optimization algorithms based on a gradient tracking communication mechanism and two different gradient estimators.
arXiv Detail & Related papers (2022-06-30T05:29:52Z) - Efficient Model-Based Multi-Agent Mean-Field Reinforcement Learning [89.31889875864599]
We propose an efficient model-based reinforcement learning algorithm for learning in multi-agent systems.
Our main theoretical contributions are the first general regret bounds for model-based reinforcement learning for MFC.
We provide a practical parametrization of the core optimization problem.
arXiv Detail & Related papers (2021-07-08T18:01:02Z) - A Hybrid PAC Reinforcement Learning Algorithm [5.279475826661642]
This paper offers a new hybrid probably approximately correct (PAC) reinforcement learning (RL) algorithm for Markov decision processes (MDPs).
The designed algorithm, referred to as the Dyna-Delayed Q-learning (DDQ) algorithm, combines model-free and model-based learning approaches while outperforming both in most cases.
arXiv Detail & Related papers (2020-09-05T21:32:42Z) - Iterative Algorithm Induced Deep-Unfolding Neural Networks: Precoding
Design for Multiuser MIMO Systems [59.804810122136345]
We propose a framework for deep-unfolding, where a general form of iterative algorithm induced deep-unfolding neural network (IAIDNN) is developed.
An efficient IAIDNN based on the structure of the classic weighted minimum mean-square error (WMMSE) iterative algorithm is developed.
We show that the proposed IAIDNN efficiently achieves the performance of the iterative WMMSE algorithm with reduced computational complexity.
arXiv Detail & Related papers (2020-06-15T02:57:57Z)