Recurrent Natural Policy Gradient for POMDPs
- URL: http://arxiv.org/abs/2405.18221v1
- Date: Tue, 28 May 2024 14:29:31 GMT
- Title: Recurrent Natural Policy Gradient for POMDPs
- Authors: Semih Cayci, Atilla Eryilmaz
- Abstract summary: We present a natural policy gradient method based on recurrent neural networks (RNNs) for partially-observable Markov decision processes.
Our analysis demonstrates the efficiency of RNNs for problems with short-term memory, giving explicit bounds on the required network widths and sample complexity.
- Score: 16.893624100273108
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we study a natural policy gradient method based on recurrent neural networks (RNNs) for partially-observable Markov decision processes, whereby RNNs are used for policy parameterization and policy evaluation to address the curse of dimensionality in non-Markovian reinforcement learning. We present finite-time and finite-width analyses for both the critic (recurrent temporal difference learning) and the correspondingly operated recurrent natural policy gradient method in the near-initialization regime. Our analysis demonstrates the efficiency of RNNs for problems with short-term memory, with explicit bounds on the required network widths and sample complexity, and points out the challenges in the case of long-term dependencies.
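The abstract outlines a pipeline in which RNNs parameterize both the policy and the critic, the critic is trained by recurrent temporal difference learning, and the actor is updated with natural policy gradient steps. The sketch below is an illustrative reconstruction of that pipeline, not the authors' implementation: the GRU cells, module names, toy sizes, and the explicit damped Fisher inverse are all assumptions chosen for readability.

```python
# Hypothetical sketch (not the authors' code): RNN policy and critic for a POMDP,
# recurrent TD(0) for policy evaluation, and a natural-gradient actor step using
# a damped empirical Fisher matrix. Sizes are kept tiny so the explicit Fisher
# inverse fits in memory; a practical implementation would approximate it.
import torch
import torch.nn as nn

class RecurrentPolicy(nn.Module):
    """Encodes the observation history with a GRU and outputs action logits."""
    def __init__(self, obs_dim, act_dim, hidden=8):
        super().__init__()
        self.rnn = nn.GRU(obs_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, act_dim)

    def forward(self, obs_seq):            # obs_seq: (batch, time, obs_dim)
        h, _ = self.rnn(obs_seq)
        return self.head(h[:, -1])         # logits conditioned on the full history

class RecurrentCritic(nn.Module):
    """Encodes the observation history with a GRU and outputs a value estimate."""
    def __init__(self, obs_dim, hidden=8):
        super().__init__()
        self.rnn = nn.GRU(obs_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, obs_seq):
        h, _ = self.rnn(obs_seq)
        return self.head(h[:, -1]).squeeze(-1)

def recurrent_td_step(critic, optimizer, obs_seq, next_obs_seq, reward, gamma=0.99):
    """One recurrent TD(0) update: move V(history_t) toward r_t + gamma * V(history_{t+1})."""
    with torch.no_grad():
        target = reward + gamma * critic(next_obs_seq)
    loss = ((critic(obs_seq) - target) ** 2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

def natural_policy_gradient_step(policy, obs_seq, actions, advantages,
                                 lr=0.05, damping=0.1):
    """Actor update preconditioned by a damped empirical Fisher matrix.

    The Fisher is built from per-sample score vectors, which is only feasible
    for the tiny network used here; at scale one would use Kronecker-factored
    or conjugate-gradient approximations instead.
    """
    params = [p for p in policy.parameters() if p.requires_grad]
    dim = sum(p.numel() for p in params)
    grad = torch.zeros(dim)
    fisher = torch.zeros(dim, dim)
    batch = obs_seq.shape[0]
    for i in range(batch):
        logits = policy(obs_seq[i:i + 1])
        logp = torch.log_softmax(logits, dim=-1)[0, actions[i]]
        score = torch.cat([g.reshape(-1)
                           for g in torch.autograd.grad(logp, params)])
        grad += advantages[i] * score / batch          # vanilla policy gradient
        fisher += torch.outer(score, score) / batch    # empirical Fisher estimate
    step = torch.linalg.solve(fisher + damping * torch.eye(dim), grad)
    with torch.no_grad():
        offset = 0
        for p in params:                               # apply the natural ascent step
            n = p.numel()
            p.add_(lr * step[offset:offset + n].view_as(p))
            offset += n
```

In a full training loop the critic's value estimates would supply the advantages passed to the actor step, and the explicit Fisher inverse would be replaced by a scalable approximation; the point of the sketch is only to show how conditioning on the observation history, rather than a single observation, sidesteps the lack of a Markovian state.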
Related papers
- Last-Iterate Global Convergence of Policy Gradients for Constrained Reinforcement Learning [62.81324245896717]
We introduce an exploration-agnostic algorithm, called C-PG, which exhibits global last-iterate convergence guarantees under (weak) gradient domination assumptions.
We numerically validate our algorithms on constrained control problems, and compare them with state-of-the-art baselines.
arXiv Detail & Related papers (2024-07-15T14:54:57Z) - Joint Admission Control and Resource Allocation of Virtual Network Embedding via Hierarchical Deep Reinforcement Learning [69.00997996453842]
We propose a deep reinforcement learning approach to learn a joint Admission Control and Resource Allocation policy for virtual network embedding.
We show that HRL-ACRA outperforms state-of-the-art baselines in terms of both the acceptance ratio and long-term average revenue.
arXiv Detail & Related papers (2024-06-25T07:42:30Z) - Intelligent Hybrid Resource Allocation in MEC-assisted RAN Slicing Network [72.2456220035229]
We aim to maximize the SSR for heterogeneous service demands in the cooperative MEC-assisted RAN slicing system.
We propose a recurrent graph reinforcement learning (RGRL) algorithm to intelligently learn the optimal hybrid RA policy.
arXiv Detail & Related papers (2024-05-02T01:36:13Z) - High-probability sample complexities for policy evaluation with linear function approximation [88.87036653258977]
We investigate the sample complexities required to guarantee a predefined estimation error of the best linear coefficients for two widely-used policy evaluation algorithms.
We establish the first sample complexity bound with a high-probability convergence guarantee that attains the optimal dependence on the tolerance level (a minimal TD(0) sketch with linear features appears after this list).
arXiv Detail & Related papers (2023-05-30T12:58:39Z) - Multi-Objective Policy Gradients with Topological Constraints [108.10241442630289]
We present a new policy gradient algorithm for TMDPs, obtained by a simple extension of the proximal policy optimization (PPO) algorithm.
We demonstrate this on a real-world multiple-objective navigation problem with an arbitrary ordering of objectives both in simulation and on a real robot.
arXiv Detail & Related papers (2022-09-15T07:22:58Z) - Finite-Time Analysis of Entropy-Regularized Neural Natural Actor-Critic Algorithm [29.978816372127085]
We present a finite-time analysis of natural actor-critic (NAC) with neural network approximation.
We identify the roles of neural networks, regularization and optimization techniques to achieve provably good performance.
arXiv Detail & Related papers (2022-06-02T02:13:29Z) - Occupancy Information Ratio: Infinite-Horizon, Information-Directed, Parameterized Policy Search [21.850348833971722]
We propose an information-directed objective for infinite-horizon reinforcement learning (RL) called the occupancy information ratio (OIR).
The OIR enjoys rich underlying structure and presents an objective to which scalable, model-free policy search methods naturally apply.
By leveraging connections between quasiconcave optimization and the linear programming theory for Markov decision processes, we show that the OIR problem can be transformed and solved via concave programming methods when the underlying model is known.
arXiv Detail & Related papers (2022-01-21T18:40:03Z) - On Finite-Sample Analysis of Offline Reinforcement Learning with Deep ReLU Networks [46.067702683141356]
We study the statistical theory of offline reinforcement learning with deep ReLU networks.
We quantify how the distribution shift of the offline data, the dimension of the input space, and the regularity of the system control the OPE estimation error.
arXiv Detail & Related papers (2021-03-11T14:01:14Z) - Identification of Unexpected Decisions in Partially Observable Monte-Carlo Planning: a Rule-Based Approach [78.05638156687343]
We propose a methodology for analyzing POMCP policies by inspecting their traces.
The proposed method explores local properties of policy behavior to identify unexpected decisions.
We evaluate our approach on Tiger, a standard benchmark for POMDPs, and a real-world problem related to mobile robot navigation.
arXiv Detail & Related papers (2020-12-23T15:09:28Z) - A Study of Policy Gradient on a Class of Exactly Solvable Models [35.90565839381652]
We explore the evolution of the policy parameters, for a special class of exactly solvable POMDPs, as a continuous-state Markov chain.
Our approach relies heavily on random walk theory, specifically on affine Weyl groups.
We analyze the probabilistic convergence of policy gradient to different local maxima of the value function.
arXiv Detail & Related papers (2020-11-03T17:27:53Z)
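For the policy-evaluation entry above (High-probability sample complexities for policy evaluation with linear function approximation), the following is a minimal sketch of TD(0) with linear function approximation, the kind of algorithm whose sample complexity such analyses bound. The feature map, step size, and toy random-walk data are assumptions for illustration, not the paper's setup.

```python
# Minimal TD(0) with linear function approximation (illustrative, not the paper's
# code): estimate V_pi(s) ~= phi(s)^T w from transitions generated by a fixed policy.
import numpy as np

def td0_linear(transitions, phi, num_features, alpha=0.05, gamma=0.99):
    """transitions: iterable of (state, reward, next_state) under the target policy.
    phi: feature map returning a length-num_features vector for a state."""
    w = np.zeros(num_features)
    for s, r, s_next in transitions:
        td_error = r + gamma * phi(s_next) @ w - phi(s) @ w   # one-step TD error
        w += alpha * td_error * phi(s)                        # semi-gradient update
    return w

# Example on a toy 5-state random-walk chain with one-hot features (assumed data):
rng = np.random.default_rng(0)
phi = lambda s: np.eye(5)[s]
transitions = []
s = 2
for _ in range(5000):
    s_next = min(max(s + rng.choice([-1, 1]), 0), 4)
    transitions.append((s, float(s_next == 4), s_next))
    s = 2 if s_next in (0, 4) else s_next   # restart from the middle at either end
w_hat = td0_linear(transitions, phi, num_features=5)
```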
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed papers or their summaries and is not responsible for any consequences of their use.