Related papers: A Theoretical Justification for Asymmetric Actor-Critic Algorithms

A Theoretical Justification for Asymmetric Actor-Critic Algorithms

URL: http://arxiv.org/abs/2501.19116v1
Date: Fri, 31 Jan 2025 13:20:05 GMT
Title: A Theoretical Justification for Asymmetric Actor-Critic Algorithms
Authors: Gaspard Lambrechts, Damien Ernst, Aditya Mahajan,
Abstract summary: We propose a justification for asymmetric actor-critic algorithms with linear function approximators.<n>The resulting finite-time bound reveals that the asymmetric critic eliminates an error term arising from aliasing in the agent state.
Score: 2.9071167862893605
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In reinforcement learning for partially observable environments, many successful algorithms were developed within the asymmetric learning paradigm. This paradigm leverages additional state information available at training time for faster learning. Although the proposed learning objectives are usually theoretically sound, these methods still lack a theoretical justification for their potential benefits. We propose such a justification for asymmetric actor-critic algorithms with linear function approximators by adapting a finite-time convergence analysis to this setting. The resulting finite-time bound reveals that the asymmetric critic eliminates an error term arising from aliasing in the agent state.

Related papers

Certified algorithms for quantum Hamiltonian learning via energy-entropy inequalities [9.349653765341301]
We consider the problem of learning the Hamiltonian of a quantum system from estimates of Gibbs-state expectation values. We build on this work by extending it to provide certified a posteriori lower and upper bounds on the parameters to be learned.
arXiv Detail & Related papers (2024-10-30T17:58:52Z)
The Stochastic Proximal Distance Algorithm [5.3315823983402755]
We propose and analyze a class of iterative optimization methods that recover a desired constrained estimation problem as a penalty parameter. We extend recent theoretical devices to establish finite error bounds and a complete characterization of convergence rates. We validate our analysis via a thorough empirical study, also showing that unsurprisingly, the proposed method outpaces batch versions on popular learning tasks.
arXiv Detail & Related papers (2022-10-21T22:07:28Z)
Spectral Decomposition Representation for Reinforcement Learning [100.0424588013549]
We propose an alternative spectral method, Spectral Decomposition Representation (SPEDER), that extracts a state-action abstraction from the dynamics without inducing spurious dependence on the data collection policy. A theoretical analysis establishes the sample efficiency of the proposed algorithm in both the online and offline settings. An experimental investigation demonstrates superior performance over current state-of-the-art algorithms across several benchmarks.
arXiv Detail & Related papers (2022-08-19T19:01:30Z)
On Leave-One-Out Conditional Mutual Information For Generalization [122.2734338600665]
We derive information theoretic generalization bounds for supervised learning algorithms based on a new measure of leave-one-out conditional mutual information (loo-CMI) Contrary to other CMI bounds, our loo-CMI bounds can be computed easily and can be interpreted in connection to other notions such as classical leave-one-out cross-validation. We empirically validate the quality of the bound by evaluating its predicted generalization gap in scenarios for deep learning.
arXiv Detail & Related papers (2022-07-01T17:58:29Z)
Stabilizing Q-learning with Linear Architectures for Provably Efficient Learning [53.17258888552998]
This work proposes an exploration variant of the basic $Q$-learning protocol with linear function approximation. We show that the performance of the algorithm degrades very gracefully under a novel and more permissive notion of approximation error.
arXiv Detail & Related papers (2022-06-01T23:26:51Z)
Neural Network Training with Asymmetric Crosspoint Elements [1.0773924713784704]
asymmetric conductance modulation of practical resistive devices critically degrades the classification of networks trained with conventional algorithms. Here, we describe and experimentally demonstrate an alternative fully-parallel training algorithm: Hamiltonian Descent. We provide critical intuition on why device asymmetry is fundamentally incompatible with conventional training algorithms and how the new approach exploits it as a useful feature instead.
arXiv Detail & Related papers (2022-01-31T17:41:36Z)
Unbiased Asymmetric Actor-Critic for Partially Observable Reinforcement Learning [17.48572546628464]
Asymmetric actor-critic methods exploit such information by training a history-based policy via a state-based critic. We examine the theory of asymmetric actor-critic methods which use state-based critics, and expose fundamental issues which undermine the validity of a common variant. We propose an unbiased asymmetric actor-critic variant which is able to exploit state information while remaining theoretically sound.
arXiv Detail & Related papers (2021-05-25T05:18:44Z)
Approximation Algorithms for Sparse Principal Component Analysis [57.5357874512594]
Principal component analysis (PCA) is a widely used dimension reduction technique in machine learning and statistics. Various approaches to obtain sparse principal direction loadings have been proposed, which are termed Sparse Principal Component Analysis. We present thresholding as a provably accurate, time, approximation algorithm for the SPCA problem.
arXiv Detail & Related papers (2020-06-23T04:25:36Z)
A Dynamical Systems Approach for Convergence of the Bayesian EM Algorithm [59.99439951055238]
We show how (discrete-time) Lyapunov stability theory can serve as a powerful tool to aid, or even lead, in the analysis (and potential design) of optimization algorithms that are not necessarily gradient-based. The particular ML problem that this paper focuses on is that of parameter estimation in an incomplete-data Bayesian framework via the popular optimization algorithm known as maximum a posteriori expectation-maximization (MAP-EM) We show that fast convergence (linear or quadratic) is achieved, which could have been difficult to unveil without our adopted S&C approach.
arXiv Detail & Related papers (2020-06-23T01:34:18Z)
Study of Diffusion Normalized Least Mean M-estimate Algorithms [0.8749675983608171]
This work proposes diffusion normalized least mean M-estimate algorithm based on the modified Huber function. We analyze the transient, steady-state and stability behaviors of the algorithms in a unified framework. Simulations in various impulsive noise scenarios show that the proposed algorithms are superior to some existing diffusion algorithms.
arXiv Detail & Related papers (2020-04-20T00:28:41Z)
Adaptivity of Stochastic Gradient Methods for Nonconvex Optimization [71.03797261151605]
Adaptivity is an important yet under-studied property in modern optimization theory. Our algorithm is proved to achieve the best-available convergence for non-PL objectives simultaneously while outperforming existing algorithms for PL objectives.
arXiv Detail & Related papers (2020-02-13T05:42:27Z)

This list is automatically generated from the titles and abstracts of the papers in this site.