An Empirical Evaluation of Posterior Sampling for Constrained
Reinforcement Learning
- URL: http://arxiv.org/abs/2209.03596v1
- Date: Thu, 8 Sep 2022 06:52:49 GMT
- Title: An Empirical Evaluation of Posterior Sampling for Constrained
Reinforcement Learning
- Authors: Danil Provodin, Pratik Gajane, Mykola Pechenizkiy, Maurits Kaptein
- Abstract summary: We study a posterior sampling approach to efficient exploration in constrained reinforcement learning.
We propose two simple algorithms that are statistically more efficient, simpler to implement, and computationally cheaper.
- Score: 7.3449418475577595
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study a posterior sampling approach to efficient exploration in
constrained reinforcement learning. In contrast to existing algorithms, we
propose two simple algorithms that are statistically more efficient, simpler to
implement, and computationally cheaper. The first algorithm is based on a linear
formulation of CMDP, and the second algorithm leverages the saddle-point
formulation of CMDP. Our empirical results demonstrate that, despite its
simplicity, posterior sampling achieves state-of-the-art performance and, in
some cases, significantly outperforms optimistic algorithms.
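The core loop the abstract describes can be sketched in a few lines: keep a posterior over the unknown transition model, sample one plausible model per episode, and plan against the sample. This is a minimal sketch assuming a tabular CMDP with Dirichlet posteriors over transition rows; the function names (`sample_transition_model`, `update_posterior`) are illustrative, not the paper's code, and the planning step (the LP or saddle-point solve) is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 3, 2

# Dirichlet pseudo-counts: start from a uniform prior (all ones).
counts = np.ones((n_states, n_actions, n_states))

def sample_transition_model(counts, rng):
    """Draw one plausible transition kernel from the posterior."""
    P = np.empty_like(counts)
    for s in range(counts.shape[0]):
        for a in range(counts.shape[1]):
            P[s, a] = rng.dirichlet(counts[s, a])
    return P

def update_posterior(counts, s, a, s_next):
    """Conjugate update after observing a transition (s, a) -> s_next."""
    counts[s, a, s_next] += 1.0

P = sample_transition_model(counts, rng)
# Each sampled row is a valid probability distribution over next states.
print(np.allclose(P.sum(axis=-1), 1.0))  # True
```

In the paper's setting, the sampled model would then be handed to a CMDP planner (the linear or saddle-point formulation) before the next posterior update.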
Related papers
- Faster WIND: Accelerating Iterative Best-of-$N$ Distillation for LLM Alignment [81.84950252537618]
This paper reveals a unified game-theoretic connection between iterative BOND and self-play alignment.
We establish a novel framework, WIN rate Dominance (WIND), with a series of efficient algorithms for regularized win rate dominance optimization.
arXiv Detail & Related papers (2024-10-28T04:47:39Z)
- PBES: PCA Based Exemplar Sampling Algorithm for Continual Learning [0.0]
We propose a novel exemplar selection approach based on Principal Component Analysis (PCA) and median sampling, and a neural network training regime in the setting of class-incremental learning.
This approach avoids the pitfalls due to outliers in the data and is both simple to implement and use across various incremental machine learning models.
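The selection rule the abstract describes can be sketched as follows: project a class's samples onto its principal components, then keep the samples whose projections lie closest to the per-class median, which is what makes the rule robust to outliers. This is a hedged single-class sketch; the function name and parameters are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def select_exemplars(X, n_exemplars, n_components=2):
    Xc = X - X.mean(axis=0)                      # center the class data
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    Z = Xc @ Vt[:n_components].T                 # project onto top components
    med = np.median(Z, axis=0)                   # robust center in PCA space
    dist = np.linalg.norm(Z - med, axis=1)
    return np.argsort(dist)[:n_exemplars]        # indices nearest the median

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 10))
X[0] += 50.0                                     # plant one outlier
idx = select_exemplars(X, n_exemplars=5)
print(0 not in idx)  # True: the outlier is not chosen as an exemplar
```

Using the median rather than the mean as the center in PCA space is what keeps the planted outlier out of the exemplar set.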
arXiv Detail & Related papers (2023-12-14T21:27:38Z)
- Spectral Entry-wise Matrix Estimation for Low-Rank Reinforcement Learning [53.445068584013896]
We study matrix estimation problems arising in reinforcement learning (RL) with low-rank structure.
In low-rank bandits, the matrix to be recovered specifies the expected arm rewards, and for low-rank Markov Decision Processes (MDPs), it may for example characterize the transition kernel of the MDP.
We show that simple spectral-based matrix estimation approaches efficiently recover the singular subspaces of the matrix and exhibit nearly-minimal entry-wise error.
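The spectral approach above can be illustrated with a toy example: observe a low-rank matrix (e.g. expected arm rewards) through entry-wise noise, and estimate it by truncating the SVD of the observed matrix to the known rank. The dimensions and noise level below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n, r = 50, 2
U = rng.normal(size=(n, r))
V = rng.normal(size=(n, r))
M = U @ V.T                           # ground-truth rank-2 matrix
Y = M + 0.5 * rng.normal(size=(n, n)) # noisy entry-wise observations

# Truncated SVD: keep only the top-r singular directions.
Us, s, Vt = np.linalg.svd(Y)
M_hat = Us[:, :r] @ np.diag(s[:r]) @ Vt[:r]

# Spectral denoising should beat using the raw noisy matrix.
print(np.linalg.norm(M_hat - M) < np.linalg.norm(Y - M))  # True
```

The truncation discards the bulk of the noise, which is spread across all singular directions, while the signal is concentrated in the top r of them.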
arXiv Detail & Related papers (2023-10-10T17:06:41Z)
- Deep Unrolling for Nonconvex Robust Principal Component Analysis [75.32013242448151]
We design algorithms for Robust Principal Component Analysis (RPCA).
It consists of decomposing a matrix into the sum of a low-rank matrix and a sparse matrix.
arXiv Detail & Related papers (2023-07-12T03:48:26Z)
- Efficient Model-Free Exploration in Low-Rank MDPs [76.87340323826945]
Low-Rank Markov Decision Processes offer a simple, yet expressive framework for RL with function approximation.
Existing algorithms are either (1) computationally intractable, or (2) reliant upon restrictive statistical assumptions.
We propose the first provably sample-efficient algorithm for exploration in Low-Rank MDPs.
arXiv Detail & Related papers (2023-07-08T15:41:48Z)
- Representation Learning with Multi-Step Inverse Kinematics: An Efficient and Optimal Approach to Rich-Observation RL [106.82295532402335]
Existing reinforcement learning algorithms suffer from computational intractability, strong statistical assumptions, and suboptimal sample complexity.
We provide the first computationally efficient algorithm that attains rate-optimal sample complexity with respect to the desired accuracy level.
Our algorithm, MusIK, combines systematic exploration with representation learning based on multi-step inverse kinematics.
arXiv Detail & Related papers (2023-04-12T14:51:47Z)
- Reinforcement Learning with Unbiased Policy Evaluation and Linear Function Approximation [11.345796608258434]
We provide performance guarantees for a variant of simulation-based policy iteration for controlling Markov decision processes.
We analyze two algorithms: the first uses a least squares approach, in which a new set of weights associated with the feature vectors is obtained via least squares at each iteration.
The second algorithm involves a two-time-scale approximation algorithm taking several steps of gradient descent towards the least squares solution.
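The least-squares step underlying both algorithms can be sketched as: given feature vectors for sampled states and noisy return estimates from simulation, fit value-function weights by ordinary least squares. The feature map and returns below are synthetic stand-ins, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(3)
n_samples, n_features = 200, 5

Phi = rng.normal(size=(n_samples, n_features))   # one feature row per sampled state
w_true = rng.normal(size=n_features)
returns = Phi @ w_true + 0.01 * rng.normal(size=n_samples)  # noisy return estimates

# Least-squares weights: w = argmin_w || Phi w - returns ||^2
w, *_ = np.linalg.lstsq(Phi, returns, rcond=None)

print(np.allclose(w, w_true, atol=0.1))  # True: recovered weights are close
```

The paper's second algorithm would replace the exact solve with several gradient-descent steps toward this least-squares solution on a slower time scale.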
arXiv Detail & Related papers (2022-10-13T20:16:19Z)
- Optimizing Objective Functions from Trained ReLU Neural Networks via Sampling [0.0]
This paper introduces scalable, sampling-based algorithms that optimize trained neural networks with ReLU activations.
We first propose an iterative algorithm that takes advantage of the piecewise linear structure of ReLU neural networks.
We then extend this approach by searching around the neighborhood of the LP solution computed at each iteration.
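The sampling idea can be illustrated with a toy version: draw candidate inputs, evaluate a trained one-hidden-layer ReLU network, and keep the best candidate. The paper's LP-based refinement around the incumbent is omitted here, and the network weights and input box are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)

# A fixed, "trained" one-hidden-layer ReLU network f(x) = w2 @ relu(W1 x + b1).
W1 = rng.normal(size=(8, 3))
b1 = rng.normal(size=8)
w2 = rng.normal(size=8)

def f(x):
    return w2 @ np.maximum(W1 @ x + b1, 0.0)

# Sampling-based maximization over the box [-1, 1]^3.
candidates = rng.uniform(-1.0, 1.0, size=(5000, 3))
values = np.array([f(x) for x in candidates])
best = candidates[values.argmax()]

print(f(best) >= values.mean())  # True: the best sample beats the average
```

Because the network is piecewise linear, each sample lands in a linear region; the paper exploits that structure by solving an LP within the incumbent's region instead of relying on sampling alone.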
arXiv Detail & Related papers (2022-05-27T18:35:48Z)
- Dual Optimization for Kolmogorov Model Learning Using Enhanced Gradient Descent [8.714458129632158]
The Kolmogorov model (KM) is an interpretable and predictable representation approach to learning the underlying probabilistic structure of a set of random variables.
We propose a computationally scalable KM learning algorithm, based on the regularized dual optimization combined with enhanced gradient descent (GD) method.
It is shown that the accuracy of logical relation mining for interpretability by using the proposed KM learning algorithm exceeds $80\%$.
arXiv Detail & Related papers (2021-07-11T10:33:02Z)
- Adaptive Sampling for Best Policy Identification in Markov Decision Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision Processes (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information and is not responsible for any consequences of its use.