A Deeper Look at Discounting Mismatch in Actor-Critic Algorithms
- URL: http://arxiv.org/abs/2010.01069v4
- Date: Wed, 26 Jan 2022 18:07:28 GMT
- Title: A Deeper Look at Discounting Mismatch in Actor-Critic Algorithms
- Authors: Shangtong Zhang, Romain Laroche, Harm van Seijen, Shimon Whiteson,
Remi Tachet des Combes
- Abstract summary: We investigate the discounting mismatch in actor-critic algorithm implementations from a representation learning perspective.
Theoretically, actor-critic algorithms usually have discounting for both actor and critic, i.e., there is a $\gamma^t$ term in the actor update for the transition observed at time $t$ in a trajectory.
Practitioners, however, usually ignore the discounting ($\gamma^t$) for the actor while using a discounted critic.
- Score: 81.01917016753644
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We investigate the discounting mismatch in actor-critic algorithm
implementations from a representation learning perspective. Theoretically,
actor-critic algorithms usually have discounting for both actor and critic,
i.e., there is a $\gamma^t$ term in the actor update for the transition
observed at time $t$ in a trajectory and the critic is a discounted value
function. Practitioners, however, usually ignore the discounting ($\gamma^t$)
for the actor while using a discounted critic. We investigate this mismatch in
two scenarios. In the first scenario, we consider optimizing an undiscounted
objective $(\gamma = 1)$ where $\gamma^t$ disappears naturally $(1^t = 1)$. We
then propose to interpret the discounting in the critic in terms of a
bias-variance-representation trade-off and provide supporting empirical
results. In the second scenario, we consider optimizing a discounted objective
($\gamma < 1$) and propose to interpret the omission of the discounting in the
actor update from an auxiliary task perspective and provide supporting
empirical results.
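For concreteness, the sketch below (a minimal NumPy illustration, not code from the paper; all names are assumptions) contrasts the two actor updates the abstract describes: the theoretically prescribed update weights the transition at time $t$ by $\gamma^t$, while the common implementation drops that factor yet still bootstraps from a discounted critic.

```python
import numpy as np

def actor_update_weights(rewards, values, gamma, discount_actor):
    """Per-step weights for a policy-gradient actor update.

    `values` holds the (discounted) critic's estimates V(s_0..s_{T-1}).
    With discount_actor=True, the transition at time t is weighted by
    gamma**t, as theory prescribes; with discount_actor=False, that
    factor is dropped, as most implementations do.
    """
    T = len(rewards)
    weights = np.empty(T)
    for t in range(T):
        v_next = values[t + 1] if t + 1 < T else 0.0  # terminal value is 0
        td_advantage = rewards[t] + gamma * v_next - values[t]
        weights[t] = (gamma ** t if discount_actor else 1.0) * td_advantage
    return weights
```

The actor gradient is then the sum over $t$ of `weights[t]` times the score function $\nabla \log \pi(a_t \mid s_t)$. The paper's two scenarios map onto this sketch directly: with $\gamma = 1$ the factor vanishes since $1^t = 1$, and with $\gamma < 1$ setting `discount_actor=False` reproduces the mismatch practitioners introduce.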
Related papers
- Mind the Gap: A Causal Perspective on Bias Amplification in Prediction & Decision-Making [58.06306331390586]
We introduce the notion of a margin complement, which measures how much a prediction score $S$ changes due to a thresholding operation.
We show that under suitable causal assumptions, the influences of $X$ on the prediction score $S$ are equal to the influences of $X$ on the true outcome $Y$.
arXiv Detail & Related papers (2024-05-24T11:22:19Z)
- Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback [58.66941279460248]
Learning from human feedback plays an important role in aligning generative models, such as large language models (LLMs).
We study a model within this problem domain: contextual dueling bandits with adversarial feedback, where the true preference label can be flipped by an adversary.
We propose a robust contextual dueling bandit algorithm based on uncertainty-weighted maximum likelihood estimation.
arXiv Detail & Related papers (2024-04-16T17:59:55Z)
- Two-Timescale Critic-Actor for Average Reward MDPs with Function Approximation [5.945710235932345]
We present the first two-timescale critic-actor algorithm with function approximation in the long-run average reward setting.
A notable feature of our analysis is that, unlike recent single-timescale actor-critic algorithms, we present a complete convergence analysis of our scheme.
arXiv Detail & Related papers (2024-02-02T12:48:49Z)
- Finite-Time Analysis of Three-Timescale Constrained Actor-Critic and Constrained Natural Actor-Critic Algorithms [5.945710235932345]
We consider actor-critic and natural actor-critic algorithms with function approximation for constrained Markov decision processes.
We carry out a non-asymptotic analysis for both of these algorithms in a non-i.i.d. (Markovian) setting.
We also show the results of experiments on three different Safety-Gym environments.
arXiv Detail & Related papers (2023-10-25T05:04:00Z)
- Finite-time analysis of single-timescale actor-critic [8.994243376183658]
Actor-critic methods have achieved significant success in many challenging applications.
However, their finite-time convergence is still poorly understood in the most practical single-timescale form.
We investigate the more practical online single-timescale actor-critic algorithm on continuous state space.
arXiv Detail & Related papers (2022-10-18T15:03:56Z)
- Randomized Exploration for Reinforcement Learning with General Value Function Approximation [122.70803181751135]
We propose a model-free reinforcement learning algorithm inspired by the popular randomized least squares value iteration (RLSVI) algorithm.
Our algorithm drives exploration by simply perturbing the training data with judiciously chosen i.i.d. scalar noises (a minimal sketch of this idea appears after this list).
We complement the theory with an empirical evaluation across known difficult exploration tasks.
arXiv Detail & Related papers (2021-06-15T02:23:07Z)
- Finite Sample Analysis of Two-Time-Scale Natural Actor-Critic Algorithm [21.91930554261688]
Actor-critic style two-time-scale algorithms are very popular in reinforcement learning.
In this paper, we characterize the global convergence of an online natural actor-critic algorithm.
We employ $\epsilon$-greedy sampling to ensure sufficient exploration.
arXiv Detail & Related papers (2021-01-26T01:12:07Z)
- Single-Timescale Actor-Critic Provably Finds Globally Optimal Policy [122.01837436087516]
We study the global convergence and global optimality of actor-critic, one of the most popular families of reinforcement learning algorithms.
We establish the rate of convergence and global optimality of single-timescale actor-critic with linear function approximation for the first time.
arXiv Detail & Related papers (2020-08-02T14:01:49Z)
- A Finite Time Analysis of Two Time-Scale Actor Critic Methods [87.69128666220016]
We provide a non-asymptotic analysis for two time-scale actor-critic methods in a non-i.i.d. setting.
We prove that the actor-critic method is guaranteed to find a first-order stationary point.
This is the first work providing finite-time analysis and sample complexity bound for two time-scale actor-critic methods.
arXiv Detail & Related papers (2020-05-04T09:45:18Z)
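Referenced from the randomized-exploration entry above: a minimal sketch of the data-perturbation idea as that summary describes it. The function name, the `noise_std` parameter, and the one-step-target form are assumptions for illustration, not the paper's actual algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

def perturbed_regression_targets(rewards, next_values, gamma, noise_std):
    """Sketch of exploration via perturbed training data.

    Builds one-step value-regression targets r + gamma * V(s') and adds
    i.i.d. scalar Gaussian noise to each, so that a value function fitted
    to these targets is randomized and acting greedily with respect to it
    drives exploration. noise_std is a placeholder for whatever noise
    schedule the paper actually prescribes.
    """
    targets = np.asarray(rewards) + gamma * np.asarray(next_values)
    noise = rng.normal(0.0, noise_std, size=targets.shape)
    return targets + noise
```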