Beyond Exponentially Fast Mixing in Average-Reward Reinforcement
Learning via Multi-Level Monte Carlo Actor-Critic
- URL: http://arxiv.org/abs/2301.12083v2
- Date: Wed, 1 Feb 2023 18:55:32 GMT
- Title: Beyond Exponentially Fast Mixing in Average-Reward Reinforcement
Learning via Multi-Level Monte Carlo Actor-Critic
- Authors: Wesley A. Suttle, Amrit Singh Bedi, Bhrij Patel, Brian M. Sadler, Alec
Koppel, Dinesh Manocha
- Abstract summary: We propose an RL methodology attuned to the mixing time by employing a multi-level Monte Carlo estimator for the critic, the actor, and the average reward embedded within an actor-critic (AC) algorithm.
We experimentally show that these alleviated restrictions on the technical conditions required for stability translate to superior performance in practice for RL problems with sparse rewards.
- Score: 61.968469104271676
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many existing reinforcement learning (RL) methods employ stochastic gradient
iteration on the back end, whose stability hinges upon a hypothesis that the
data-generating process mixes exponentially fast with a rate parameter that
appears in the step-size selection. Unfortunately, this assumption is violated
for large state spaces or settings with sparse rewards, and the mixing time is
unknown, making the step size inoperable. In this work, we propose an RL
methodology attuned to the mixing time by employing a multi-level Monte Carlo
estimator for the critic, the actor, and the average reward embedded within an
actor-critic (AC) algorithm. This method, which we call \textbf{M}ulti-level
\textbf{A}ctor-\textbf{C}ritic (MAC), is developed especially for
infinite-horizon average-reward settings and neither relies on oracle knowledge
of the mixing time in its parameter selection nor assumes its exponential
decay; it, therefore, is readily applicable to applications with slower mixing
times. Nonetheless, it achieves a convergence rate comparable to the
state-of-the-art AC algorithms. We experimentally show that these alleviated
restrictions on the technical conditions required for stability translate to
superior performance in practice for RL problems with sparse rewards.
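The core idea — a multi-level Monte Carlo estimator that adapts to an unknown mixing time — can be sketched as follows. This is a minimal illustration of the MLMC gradient-estimation pattern (in the style of Dorfman & Levy's estimator for Markovian data), not the paper's exact MAC algorithm; the function names and the geometric-level truncation `j_max` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def batch_mean_gradient(sample_grads):
    """Average of the first n per-sample gradients from one Markov trajectory."""
    return np.mean(sample_grads, axis=0)

def mlmc_gradient(trajectory_grads, j_max=10):
    """Multi-level Monte Carlo gradient estimator (illustrative sketch).

    Draws a random level J ~ Geometric(1/2) and adds the telescoping
    correction 2^J * (mean over 2^J samples - mean over 2^{J-1} samples).
    The correction is reweighted by 2^J to cancel the P(J = j) = 2^{-j}
    sampling probability, so the estimator tracks the stationary gradient
    without needing the chain's mixing time, while the expected number of
    samples used stays O(j_max).
    """
    J = rng.geometric(0.5)                      # random level, P(J = j) = 2^{-j}
    g = batch_mean_gradient(trajectory_grads[:1])  # base level: one sample
    if J <= j_max and 2 ** J <= len(trajectory_grads):
        g_fine = batch_mean_gradient(trajectory_grads[: 2 ** J])
        g_coarse = batch_mean_gradient(trajectory_grads[: 2 ** (J - 1)])
        g = g + (2 ** J) * (g_fine - g_coarse)
    return g
```

In MAC, estimators of this shape replace the fixed-batch averages in the critic, actor, and average-reward updates, which is what removes the mixing-time rate parameter from the step-size selection.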
Related papers
- MPQ-DMv2: Flexible Residual Mixed Precision Quantization for Low-Bit Diffusion Models with Temporal Distillation [74.34220141721231]
We present MPQ-DMv2, an improved Mixed Precision Quantization framework for extremely low-bit Diffusion Models.
arXiv Detail & Related papers (2025-07-06T08:16:50Z) - Regret Analysis of Average-Reward Unichain MDPs via an Actor-Critic Approach [33.38582292895673]
We propose a Natural Actor-Critic with order-optimal regret of $\tilde{O}(\sqrt{T})$ in infinite-horizon average-reward Markov Decision Processes. NACB employs function approximation for both the actor and the critic, enabling scalability to large state and action spaces.
arXiv Detail & Related papers (2025-05-26T13:43:02Z) - A Sharper Global Convergence Analysis for Average Reward Reinforcement Learning via an Actor-Critic Approach [31.343919501963416]
This work examines average-reward reinforcement learning with general policy parametrization. We propose a Multi-level Monte Carlo-based Natural Actor-Critic (MLMC-NAC) algorithm. Our work is the first to achieve a global convergence rate of $\tilde{\mathcal{O}}(1/\sqrt{T})$ for average-reward Markov Decision Processes.
arXiv Detail & Related papers (2024-07-26T17:16:31Z) - Towards Global Optimality for Practical Average Reward Reinforcement Learning without Mixing Time Oracles [83.85151306138007]
Multi-level Actor-Critic (MAC) framework incorporates a Multi-level Monte-Carlo (MLMC) estimator.
We demonstrate that MAC outperforms the existing state-of-the-art policy gradient-based method for average reward settings.
arXiv Detail & Related papers (2024-03-18T16:23:47Z) - Fast Semisupervised Unmixing Using Nonconvex Optimization [80.11512905623417]
We introduce a novel convex model for semi/library-based unmixing.
We demonstrate the efficacy of alternating optimization methods for sparse unmixing.
arXiv Detail & Related papers (2024-01-23T10:07:41Z) - Distributionally Robust Models with Parametric Likelihood Ratios [123.05074253513935]
Three simple ideas allow us to train models with DRO using a broader class of parametric likelihood ratios.
We find that models trained with the resulting parametric adversaries are consistently more robust to subpopulation shifts when compared to other DRO approaches.
arXiv Detail & Related papers (2022-04-13T12:43:12Z) - Faster One-Sample Stochastic Conditional Gradient Method for Composite
Convex Minimization [61.26619639722804]
We propose a conditional gradient method (CGM) for minimizing convex finite-sum objectives formed as a sum of smooth and non-smooth terms.
The proposed method, equipped with a stochastic average gradient (SAG) estimator, requires only one sample per iteration. Nevertheless, it guarantees fast convergence rates on par with more sophisticated variance reduction techniques.
arXiv Detail & Related papers (2022-02-26T19:10:48Z) - Adapting to Mixing Time in Stochastic Optimization with Markovian Data [12.709177728330399]
We consider optimization problems where data is drawn from a Markov chain.
Existing methods for this setting crucially rely on knowing the mixing time of the chain.
Our method relies on a novel combination of multi-level Monte Carlo (MLMC) gradient estimation together with an adaptive learning method.
arXiv Detail & Related papers (2022-02-09T12:43:11Z) - Continual Learning In Environments With Polynomial Mixing Times [13.533984338434106]
We study the effect of mixing times on learning in continual reinforcement learning.
We propose a family of model-based algorithms that speed up learning by directly optimizing for the average reward.
arXiv Detail & Related papers (2021-12-13T23:41:56Z) - Improper Learning with Gradient-based Policy Optimization [62.50997487685586]
We consider an improper reinforcement learning setting where the learner is given M base controllers for an unknown Markov Decision Process.
We propose a gradient-based approach that operates over a class of improper mixtures of the controllers.
arXiv Detail & Related papers (2021-02-16T14:53:55Z) - Multi-agent Reinforcement Learning Accelerated MCMC on Multiscale
Inversion Problem [0.0]
We propose a multi-agent actor-critic reinforcement learning (RL) algorithm to accelerate the multi-level Monte Carlo Markov Chain (MCMC) sampling algorithms.
The policies (actors) of the agents are used to generate the proposals in the MCMC steps, and the critic, which is centralized, is in charge of estimating the long-term reward.
Our experiments show that the proposed method significantly improves the sampling process.
arXiv Detail & Related papers (2020-11-17T21:25:29Z)