Analysis of a Target-Based Actor-Critic Algorithm with Linear Function
Approximation
- URL: http://arxiv.org/abs/2106.07472v1
- Date: Mon, 14 Jun 2021 14:59:05 GMT
- Title: Analysis of a Target-Based Actor-Critic Algorithm with Linear Function
Approximation
- Authors: Anas Barakat, Pascal Bianchi, Julien Lehmann
- Abstract summary: Actor-critic methods integrating target networks have exhibited a stupendous empirical success in deep reinforcement learning.
We bridge the gap between this empirical success and theory by proposing the first theoretical analysis of an online target-based actor-critic algorithm with linear function approximation in the discounted reward setting.
- Score: 2.1592777170316366
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Actor-critic methods integrating target networks have exhibited a stupendous
empirical success in deep reinforcement learning. However, a theoretical
understanding of the use of target networks in actor-critic methods is largely
missing in the literature. In this paper, we bridge this gap between theory and
practice by proposing the first theoretical analysis of an online target-based
actor-critic algorithm with linear function approximation in the discounted
reward setting. Our algorithm uses three different timescales: one for the
actor and two for the critic. Instead of using the standard single-timescale
temporal difference (TD) learning algorithm as a critic, we use a
two-timescale target-based version of TD learning, closely inspired by
practical actor-critic algorithms implementing target networks. First, we establish
asymptotic convergence results for both the critic and the actor under
Markovian sampling. Then, we provide a finite-time analysis showing the impact
of incorporating a target network into actor-critic methods.
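As a rough, illustrative sketch of the structure described in the abstract (not a reproduction of the paper's actual algorithm), the toy example below runs a linear critic on the fastest timescale, lets a vector of target parameters track the critic on an intermediate timescale, and updates a softmax actor on the slowest one. The toy MDP, one-hot features, step-size exponents, and all variable names are assumptions made for illustration.

```python
import numpy as np

# Illustrative sketch only: a three-timescale, target-based linear actor-critic
# on a randomly generated toy MDP (NOT the paper's exact algorithm).
rng = np.random.default_rng(0)

n_states, n_actions, gamma = 5, 2, 0.95
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a] is a distribution over next states
R = rng.normal(size=(n_states, n_actions))                        # immediate rewards

phi = np.eye(n_states)                 # one-hot features, so V_theta(s) = phi[s] @ theta
theta = np.zeros(n_states)             # critic parameters      (fastest timescale)
theta_bar = np.zeros(n_states)         # target parameters      (intermediate timescale)
psi = np.zeros((n_states, n_actions))  # softmax policy weights (slowest timescale)

def policy(s):
    p = np.exp(psi[s] - psi[s].max())
    return p / p.sum()

s = 0
for t in range(1, 100_001):
    # Step sizes decaying at different rates: critic > target > actor.
    a_t, b_t, c_t = t ** -0.55, t ** -0.75, t ** -0.95

    pi = policy(s)
    a = rng.choice(n_actions, p=pi)
    s_next = rng.choice(n_states, p=P[s, a])
    r = R[s, a]

    # Target-based TD error: bootstrap with the target parameters theta_bar.
    delta = r + gamma * phi[s_next] @ theta_bar - phi[s] @ theta

    theta += a_t * delta * phi[s]            # fast critic step
    theta_bar += b_t * (theta - theta_bar)   # target slowly tracks the critic
    grad_log = -pi
    grad_log[a] += 1.0                       # gradient of log softmax pi(a|s) w.r.t. psi[s]
    psi[s] += c_t * delta * grad_log         # slowest actor (policy-gradient) step

    s = s_next

print("Learned state values:", phi @ theta)
```

Relative to a standard TD(0) actor-critic, the only structural change in this sketch is that the bootstrap term uses the target parameters `theta_bar` rather than the critic parameters `theta`; this is the target-network mechanism whose asymptotic and finite-time behavior the paper analyzes.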
Related papers
- Decision-Aware Actor-Critic with Function Approximation and Theoretical
Guarantees [12.259191000019033]
Actor-critic (AC) methods are widely used in reinforcement learning (RL).
We design a joint objective for training the actor and critic in a decision-aware fashion.
We empirically demonstrate the benefit of our decision-aware actor-critic framework on simple RL problems.
arXiv Detail & Related papers (2023-05-24T15:34:21Z)
- PAC-Bayesian Soft Actor-Critic Learning [9.752336113724928]
Actor-critic algorithms address the dual goals of reinforcement learning (RL), policy evaluation and improvement, via two separate function approximators.
We tackle this bottleneck by employing an existing Probably Approximately Correct (PAC) Bayesian bound for the first time as the critic training objective of the Soft Actor-Critic (SAC) algorithm.
arXiv Detail & Related papers (2023-01-30T10:44:15Z)
- Actor-Director-Critic: A Novel Deep Reinforcement Learning Framework [2.6477113498726244]
We propose actor-director-critic, a new framework for deep reinforcement learning.
For each of the two critic networks used, we design two target critic networks instead of one.
In order to verify the performance of the actor-director-critic framework and the improved double estimator method, we applied them to the TD3 algorithm.
arXiv Detail & Related papers (2023-01-10T10:21:32Z)
- Wasserstein Flow Meets Replicator Dynamics: A Mean-Field Analysis of Representation Learning in Actor-Critic [137.04558017227583]
Actor-critic (AC) algorithms, empowered by neural networks, have had significant empirical success in recent years.
We take a mean-field perspective on the evolution and convergence of feature-based neural AC.
We prove that neural AC finds the globally optimal policy at a sublinear rate.
arXiv Detail & Related papers (2021-12-27T06:09:50Z)
- Local Critic Training for Model-Parallel Learning of Deep Neural Networks [94.69202357137452]
We propose a novel model-parallel learning method, called local critic training.
We show that the proposed approach successfully decouples the update process of the layer groups for both convolutional neural networks (CNNs) and recurrent neural networks (RNNs).
We also show that trained networks by the proposed method can be used for structural optimization.
arXiv Detail & Related papers (2021-02-03T09:30:45Z)
- Breaking the Deadly Triad with a Target Network [80.82586530205776]
The deadly triad refers to the instability of a reinforcement learning algorithm when it employs off-policy learning, function approximation, and bootstrapping simultaneously.
We provide the first convergent linear $Q$-learning algorithms under nonrestrictive and changing behavior policies without bi-level optimization.
arXiv Detail & Related papers (2021-01-21T21:50:10Z)
- Single-Timescale Actor-Critic Provably Finds Globally Optimal Policy [122.01837436087516]
We study the global convergence and global optimality of actor-critic, one of the most popular families of reinforcement learning algorithms.
We establish the rate of convergence and global optimality of single-timescale actor-critic with linear function approximation for the first time.
arXiv Detail & Related papers (2020-08-02T14:01:49Z)
- Unsupervised Learning of Video Representations via Dense Trajectory Clustering [86.45054867170795]
This paper addresses the task of unsupervised learning of representations for action recognition in videos.
We first propose to adapt two top performing objectives in this class - instance recognition and local aggregation.
We observe promising performance, but qualitative analysis shows that the learned representations fail to capture motion patterns.
arXiv Detail & Related papers (2020-06-28T22:23:03Z)
- A Finite Time Analysis of Two Time-Scale Actor Critic Methods [87.69128666220016]
We provide a non-asymptotic analysis for two time-scale actor-critic methods in the non-i.i.d. setting.
We prove that the actor-critic method is guaranteed to find a first-order stationary point.
This is the first work providing finite-time analysis and sample complexity bound for two time-scale actor-critic methods.
arXiv Detail & Related papers (2020-05-04T09:45:18Z)
- How to Learn a Useful Critic? Model-based Action-Gradient-Estimator Policy Optimization [10.424426548124696]
We propose MAGE, a model-based actor-critic algorithm, grounded in the theory of policy gradients.
MAGE backpropagates through the learned dynamics to compute gradient targets in temporal difference learning.
We demonstrate the efficiency of the algorithm in comparison to model-free and model-based state-of-the-art baselines.
arXiv Detail & Related papers (2020-04-29T16:30:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.