Beyond Exponentially Fast Mixing in Average-Reward Reinforcement
Learning via Multi-Level Monte Carlo Actor-Critic
- URL: http://arxiv.org/abs/2301.12083v2
- Date: Wed, 1 Feb 2023 18:55:32 GMT
- Title: Beyond Exponentially Fast Mixing in Average-Reward Reinforcement
Learning via Multi-Level Monte Carlo Actor-Critic
- Authors: Wesley A. Suttle, Amrit Singh Bedi, Bhrij Patel, Brian M. Sadler, Alec
Koppel, Dinesh Manocha
- Abstract summary: We propose an RL methodology attuned to the mixing time by employing a multi-level Monte Carlo estimator for the critic, the actor, and the average reward embedded within an actor-critic (AC) algorithm.
We experimentally show that these alleviated restrictions on the technical conditions required for stability translate to superior performance in practice for RL problems with sparse rewards.
- Score: 61.968469104271676
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many existing reinforcement learning (RL) methods employ stochastic gradient
iteration on the back end, whose stability hinges upon a hypothesis that the
data-generating process mixes exponentially fast with a rate parameter that
appears in the step-size selection. Unfortunately, this assumption is violated
for large state spaces or settings with sparse rewards, and the mixing time is
unknown, making the step size inoperable. In this work, we propose an RL
methodology attuned to the mixing time by employing a multi-level Monte Carlo
estimator for the critic, the actor, and the average reward embedded within an
actor-critic (AC) algorithm. This method, which we call \textbf{M}ulti-level
\textbf{A}ctor-\textbf{C}ritic (MAC), is developed especially for
infinite-horizon average-reward settings and neither relies on oracle knowledge
of the mixing time in its parameter selection nor assumes its exponential
decay; it, therefore, is readily applicable to applications with slower mixing
times. Nonetheless, it achieves a convergence rate comparable to the
state-of-the-art AC algorithms. We experimentally show that these alleviated
restrictions on the technical conditions required for stability translate to
superior performance in practice for RL problems with sparse rewards.
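The estimator at the heart of this approach can be sketched in a few lines. Below is a minimal, hypothetical Python sketch of an MLMC-style gradient estimate over a Markovian rollout, not the authors' implementation: the helper `rollout_grads` and the truncation level `T_max` are assumptions made for illustration; in the MAC algorithm the same construction would be applied to the critic, the actor, and the average-reward estimates.

```python
import numpy as np

def mlmc_gradient(rollout_grads, theta, T_max=2**10, rng=None):
    """Hypothetical sketch of a multi-level Monte Carlo (MLMC) gradient
    estimate along a single Markovian trajectory.

    `rollout_grads(theta, n)` is an assumed helper that advances the Markov
    chain n steps under parameters `theta` and returns the n per-step
    stochastic gradient estimates as an (n, dim) array.
    """
    rng = rng or np.random.default_rng()
    # Draw the level J ~ Geometric(1/2), i.e. P(J = j) = 2^{-j} for j >= 1.
    J = int(rng.geometric(0.5))
    T = 2 ** J
    if T > T_max:
        # Truncation: fall back to the single-transition estimate.
        return rollout_grads(theta, 1)[0]
    grads = rollout_grads(theta, T)
    # Telescoping correction: base one-sample estimate plus 2^J times the
    # difference between the 2^J-sample and 2^(J-1)-sample averages.
    return grads[0] + T * (grads.mean(axis=0) - grads[: T // 2].mean(axis=0))
```

Because the expected rollout length is only O(log T_max), such an estimator can track the long-run behavior of the chain without a step-size schedule tuned to the (unknown) mixing time.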
Related papers
- Towards Global Optimality for Practical Average Reward Reinforcement Learning without Mixing Time Oracles [83.85151306138007]
The Multi-level Actor-Critic (MAC) framework incorporates a Multi-level Monte Carlo (MLMC) estimator.
We demonstrate that MAC outperforms the existing state-of-the-art policy gradient-based method for average reward settings.
arXiv Detail & Related papers (2024-03-18T16:23:47Z)
- Fast Semisupervised Unmixing Using Nonconvex Optimization [80.11512905623417]
We introduce a novel model for semisupervised/library-based unmixing.
We demonstrate the efficacy of alternating optimization methods for sparse unsupervised unmixing.
arXiv Detail & Related papers (2024-01-23T10:07:41Z)
- Distributionally Robust Models with Parametric Likelihood Ratios [123.05074253513935]
Three simple ideas allow us to train models with DRO using a broader class of parametric likelihood ratios.
We find that models trained with the resulting parametric adversaries are consistently more robust to subpopulation shifts when compared to other DRO approaches.
arXiv Detail & Related papers (2022-04-13T12:43:12Z)
- Faster One-Sample Stochastic Conditional Gradient Method for Composite Convex Minimization [61.26619639722804]
We propose a conditional gradient method (CGM) for minimizing convex finite-sum objectives formed as a sum of smooth and non-smooth terms.
The proposed method, equipped with a stochastic average gradient (SAG) estimator, requires only one sample per iteration. Nevertheless, it guarantees fast convergence rates on par with more sophisticated variance reduction techniques.
arXiv Detail & Related papers (2022-02-26T19:10:48Z)
- Adapting to Mixing Time in Stochastic Optimization with Markovian Data [12.709177728330399]
We consider optimization problems where data is drawn from a Markov chain.
Existing methods for this setting crucially rely on knowing the mixing time of the chain.
Our method relies on a novel combination of multi-level Monte Carlo (MLMC) gradient estimation with an adaptive learning method; a sketch of this pairing appears after this list.
arXiv Detail & Related papers (2022-02-09T12:43:11Z)
- Continual Learning In Environments With Polynomial Mixing Times [13.533984338434106]
We study the effect of mixing times on learning in continual reinforcement learning.
We propose a family of model-based algorithms that speed up learning by directly optimizing for the average reward.
arXiv Detail & Related papers (2021-12-13T23:41:56Z)
- Improper Learning with Gradient-based Policy Optimization [62.50997487685586]
We consider an improper reinforcement learning setting where the learner is given M base controllers for an unknown Markov Decision Process.
We propose a gradient-based approach that operates over a class of improper mixtures of the controllers.
arXiv Detail & Related papers (2021-02-16T14:53:55Z)
- Multi-agent Reinforcement Learning Accelerated MCMC on Multiscale Inversion Problem [0.0]
We propose a multi-agent actor-critic reinforcement learning (RL) algorithm to accelerate the multi-level Monte Carlo Markov Chain (MCMC) sampling algorithms.
The policies (actors) of the agents are used to generate proposals in the MCMC steps, while the centralized critic estimates the long-term reward.
Our experiments show that the proposed method significantly improves the sampling process.
arXiv Detail & Related papers (2020-11-17T21:25:29Z)
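As noted in the "Adapting to Mixing Time in Stochastic Optimization with Markovian Data" entry, MLMC gradient estimates pair naturally with adaptive step sizes that require no mixing-time knowledge. The sketch below is a hypothetical illustration under stated assumptions, not that paper's implementation: `mlmc_grad` is an assumed callable returning an MLMC gradient estimate (e.g., the sketch after the abstract above), and the feasible set is taken to be a Euclidean ball purely for concreteness.

```python
import numpy as np

def adagrad_mlmc(x0, mlmc_grad, num_iters=1000, radius=10.0, eps=1e-8):
    """Projected stochastic gradient descent with an AdaGrad-norm step size
    driven by MLMC gradient estimates (illustrative sketch)."""
    x = np.asarray(x0, dtype=float)
    sum_sq = 0.0
    for _ in range(num_iters):
        g = mlmc_grad(x)
        sum_sq += float(np.dot(g, g))
        # AdaGrad-norm step size: adapts to observed gradient magnitudes,
        # so no mixing-time parameter enters the schedule.
        eta = radius / np.sqrt(sum_sq + eps)
        x = x - eta * g
        # Project back onto the Euclidean ball of the given radius.
        norm = np.linalg.norm(x)
        if norm > radius:
            x *= radius / norm
    return x
```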