Analysis of a Target-Based Actor-Critic Algorithm with Linear Function
Approximation
- URL: http://arxiv.org/abs/2106.07472v1
- Date: Mon, 14 Jun 2021 14:59:05 GMT
- Title: Analysis of a Target-Based Actor-Critic Algorithm with Linear Function
Approximation
- Authors: Anas Barakat, Pascal Bianchi, Julien Lehmann
- Abstract summary: Actor-critic methods integrating target networks have exhibited a stupendous empirical success in deep reinforcement learning.
We bridge the gap between this empirical success and theory by proposing the first theoretical analysis of an online target-based actor-critic algorithm with linear function approximation in the discounted reward setting.
- Score: 2.1592777170316366
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Actor-critic methods integrating target networks have exhibited a stupendous
empirical success in deep reinforcement learning. However, a theoretical
understanding of the use of target networks in actor-critic methods is largely
missing in the literature. In this paper, we bridge this gap between theory and
practice by proposing the first theoretical analysis of an online target-based
actor-critic algorithm with linear function approximation in the discounted
reward setting. Our algorithm uses three different timescales: one for the
actor and two for the critic. Instead of using the standard single-timescale
temporal difference (TD) learning algorithm as a critic, we use a
two-timescale target-based version of TD learning, closely inspired by
practical actor-critic algorithms implementing target networks. First, we establish
asymptotic convergence results for both the critic and the actor under
Markovian sampling. Then, we provide a finite-time analysis showing the impact
of incorporating a target network into actor-critic methods.
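As a rough, illustrative sketch of the structure described in the abstract (not a reproduction of the paper's actual algorithm), the toy example below runs a linear critic on the fastest timescale, lets a vector of target parameters track the critic on an intermediate timescale, and updates a softmax actor on the slowest one. The toy MDP, one-hot features, step-size exponents, and all variable names are assumptions made for illustration.

```python
import numpy as np

# Illustrative sketch only: a three-timescale, target-based linear actor-critic
# on a randomly generated toy MDP (NOT the paper's exact algorithm).
rng = np.random.default_rng(0)

n_states, n_actions, gamma = 5, 2, 0.95
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a] is a distribution over next states
R = rng.normal(size=(n_states, n_actions))                        # immediate rewards

phi = np.eye(n_states)                 # one-hot features, so V_theta(s) = phi[s] @ theta
theta = np.zeros(n_states)             # critic parameters      (fastest timescale)
theta_bar = np.zeros(n_states)         # target parameters      (intermediate timescale)
psi = np.zeros((n_states, n_actions))  # softmax policy weights (slowest timescale)

def policy(s):
    p = np.exp(psi[s] - psi[s].max())
    return p / p.sum()

s = 0
for t in range(1, 100_001):
    # Step sizes decaying at different rates: critic > target > actor.
    a_t, b_t, c_t = t ** -0.55, t ** -0.75, t ** -0.95

    pi = policy(s)
    a = rng.choice(n_actions, p=pi)
    s_next = rng.choice(n_states, p=P[s, a])
    r = R[s, a]

    # Target-based TD error: bootstrap with the target parameters theta_bar.
    delta = r + gamma * phi[s_next] @ theta_bar - phi[s] @ theta

    theta += a_t * delta * phi[s]            # fast critic step
    theta_bar += b_t * (theta - theta_bar)   # target slowly tracks the critic
    grad_log = -pi
    grad_log[a] += 1.0                       # gradient of log softmax pi(a|s) w.r.t. psi[s]
    psi[s] += c_t * delta * grad_log         # slowest actor (policy-gradient) step

    s = s_next

print("Learned state values:", phi @ theta)
```

Relative to a standard TD(0) actor-critic, the only structural change in this sketch is that the bootstrap term uses the target parameters `theta_bar` rather than the critic parameters `theta`; this is the target-network mechanism whose asymptotic and finite-time behavior the paper analyzes.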
Related papers
- Decision-Aware Actor-Critic with Function Approximation and Theoretical
Guarantees [12.259191000019033]
Actor-critic (AC) methods are widely used in reinforcement learning (RL).
We design a joint objective for training the actor and critic in a decision-aware fashion.
We empirically demonstrate the benefit of our decision-aware actor-critic framework on simple RL problems.
arXiv Detail & Related papers (2023-05-24T15:34:21Z)
- PAC-Bayesian Soft Actor-Critic Learning [9.752336113724928]
Actor-critic algorithms address the dual goals of reinforcement learning (RL), policy evaluation and improvement, via two separate function approximators.
We tackle this bottleneck by employing an existing Probably Approximately Correct (PAC) Bayesian bound for the first time as the critic training objective of the Soft Actor-Critic (SAC) algorithm.
arXiv Detail & Related papers (2023-01-30T10:44:15Z)
- Actor-Director-Critic: A Novel Deep Reinforcement Learning Framework [2.6477113498726244]
We propose actor-director-critic, a new framework for deep reinforcement learning.
For each of the two critic networks used, we design two target critic networks instead of one.
In order to verify the performance of the actor-director-critic framework and the improved double estimator method, we applied them to the TD3 algorithm.
arXiv Detail & Related papers (2023-01-10T10:21:32Z)
- Wasserstein Flow Meets Replicator Dynamics: A Mean-Field Analysis of Representation Learning in Actor-Critic [137.04558017227583]
Actor-critic (AC) algorithms, empowered by neural networks, have had significant empirical success in recent years.
We take a mean-field perspective on the evolution and convergence of feature-based neural AC.
We prove that neural AC finds the globally optimal policy at a sublinear rate.
arXiv Detail & Related papers (2021-12-27T06:09:50Z)
- Local Critic Training for Model-Parallel Learning of Deep Neural Networks [94.69202357137452]
We propose a novel model-parallel learning method, called local critic training.
We show that the proposed approach successfully decouples the update process of the layer groups for both convolutional neural networks (CNNs) and recurrent neural networks (RNNs).
We also show that trained networks by the proposed method can be used for structural optimization.
arXiv Detail & Related papers (2021-02-03T09:30:45Z)
- Breaking the Deadly Triad with a Target Network [80.82586530205776]
The deadly triad refers to the instability of a reinforcement learning algorithm when it employs off-policy learning, function approximation, and bootstrapping simultaneously.
We provide the first convergent linear $Q$-learning algorithms under nonrestrictive and changing behavior policies without bi-level optimization.
arXiv Detail & Related papers (2021-01-21T21:50:10Z)
- Single-Timescale Actor-Critic Provably Finds Globally Optimal Policy [122.01837436087516]
We study the global convergence and global optimality of actor-critic, one of the most popular families of reinforcement learning algorithms.
We establish the rate of convergence and global optimality of single-timescale actor-critic with linear function approximation for the first time.
arXiv Detail & Related papers (2020-08-02T14:01:49Z)
- Unsupervised Learning of Video Representations via Dense Trajectory Clustering [86.45054867170795]
This paper addresses the task of unsupervised learning of representations for action recognition in videos.
We first propose to adapt two top performing objectives in this class - instance recognition and local aggregation.
We observe promising performance, but qualitative analysis shows that the learned representations fail to capture motion patterns.
arXiv Detail & Related papers (2020-06-28T22:23:03Z)
- A Finite Time Analysis of Two Time-Scale Actor Critic Methods [87.69128666220016]
We provide a non-asymptotic analysis for two time-scale actor-critic methods in the non-i.i.d. setting.
We prove that the actor-critic method is guaranteed to find a first-order stationary point.
This is the first work providing finite-time analysis and sample complexity bound for two time-scale actor-critic methods.
arXiv Detail & Related papers (2020-05-04T09:45:18Z)
- How to Learn a Useful Critic? Model-based Action-Gradient-Estimator Policy Optimization [10.424426548124696]
We propose MAGE, a model-based actor-critic algorithm, grounded in the theory of policy gradients.
MAGE backpropagates through the learned dynamics to compute gradient targets in temporal difference learning.
We demonstrate the efficiency of the algorithm in comparison to model-free and model-based state-of-the-art baselines.
arXiv Detail & Related papers (2020-04-29T16:30:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.