Good Actors can come in Smaller Sizes: A Case Study on the Value of
Actor-Critic Asymmetry
- URL: http://arxiv.org/abs/2102.11893v1
- Date: Tue, 23 Feb 2021 19:07:47 GMT
- Title: Good Actors can come in Smaller Sizes: A Case Study on the Value of
Actor-Critic Asymmetry
- Authors: Siddharth Mysore, Bassel Mabsout, Renato Mancuso, Kate Saenko
- Abstract summary: This case study explores the performance impact of network sizes when considering actor and critic architectures independently.
By relaxing the assumption of architectural symmetry, it is often possible for smaller actors to achieve comparable policy performance to their symmetric counterparts.
- Score: 47.312768123967025
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Actors and critics in actor-critic reinforcement learning algorithms are
functionally separate, yet they often use the same network architectures. This
case study explores the performance impact of network sizes when considering
actor and critic architectures independently. By relaxing the assumption of
architectural symmetry, it is often possible for smaller actors to achieve
comparable policy performance to their symmetric counterparts. Our experiments
show up to 97% reduction in the number of network weights with an average
reduction of 64% over multiple algorithms on multiple tasks. Given the
practical benefits of reducing actor complexity, we believe configurations of
actors and critics are aspects of actor-critic design that deserve to be
considered independently.
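A minimal sketch of the kind of actor-critic asymmetry discussed above (not the authors' code; PyTorch, the (256, 256) baseline width, the (32, 32) small actor, and the observation/action dimensions are illustrative assumptions):

import torch.nn as nn

def mlp(sizes, out_act=nn.Identity):
    # Fully connected network with ReLU hidden layers.
    layers = []
    for i in range(len(sizes) - 1):
        act = nn.ReLU if i < len(sizes) - 2 else out_act
        layers += [nn.Linear(sizes[i], sizes[i + 1]), act()]
    return nn.Sequential(*layers)

obs_dim, act_dim = 17, 6  # e.g. a continuous-control locomotion task (assumed)

# Symmetric baseline: actor and critic use the same hidden-layer sizes.
sym_actor = mlp([obs_dim, 256, 256, act_dim], out_act=nn.Tanh)
critic = mlp([obs_dim + act_dim, 256, 256, 1])  # Q(s, a)

# Asymmetric variant: keep the large critic, shrink only the actor.
small_actor = mlp([obs_dim, 32, 32, act_dim], out_act=nn.Tanh)

def n_params(m):
    return sum(p.numel() for p in m.parameters())

print(n_params(sym_actor), n_params(small_actor), n_params(critic))

In this sketch only the actor is shrunk; the critic and the underlying algorithm (e.g. SAC or TD3) are left unchanged, which is why a smaller policy network can be deployed cheaply after training while the critic is discarded.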
Related papers
- Optimistic critics can empower small actors [14.058002772699044]
We argue for the advantages of asymmetric setups, specifically with the use of smaller actors.
We find that, in general, smaller actors result in performance degradation and overfit critics.
Our analyses suggest poor data collection, due to value underestimation, as one of the main causes for this behavior.
arXiv Detail & Related papers (2025-06-01T14:00:03Z)
- Studying the Interplay Between the Actor and Critic Representations in Reinforcement Learning [27.2866735011598]
We study whether the actor and critic will benefit from separate, rather than shared, representations.
Our primary finding is that when separated, the representations for the actor and critic systematically specialise in extracting different types of information.
We conduct a rigorous empirical study to understand how different representation learning approaches affect the actor and critic's specialisations.
arXiv Detail & Related papers (2025-03-08T21:29:20Z)
- SARC: Soft Actor Retrospective Critic [14.775519703997478]
Soft Actor Retrospective Critic (SARC) is an actor-critic algorithm that augments the SAC critic loss with another loss term.
We show that SARC provides consistent improvement over SAC on benchmark environments.
arXiv Detail & Related papers (2023-06-28T18:50:18Z)
- Decision-Aware Actor-Critic with Function Approximation and Theoretical Guarantees [12.259191000019033]
Actor-critic (AC) methods are widely used in reinforcement learning (RL).
We design a joint objective for training the actor and critic in a decision-aware fashion.
We empirically demonstrate the benefit of our decision-aware actor-critic framework on simple RL problems.
arXiv Detail & Related papers (2023-05-24T15:34:21Z)
- PAC-Bayesian Soft Actor-Critic Learning [9.752336113724928]
Actor-critic algorithms address the dual goals of reinforcement learning (RL), policy evaluation and policy improvement, via two separate function approximators.
We tackle this bottleneck by employing an existing Probably Approximately Correct (PAC) Bayesian bound for the first time as the critic training objective of the Soft Actor-Critic (SAC) algorithm.
arXiv Detail & Related papers (2023-01-30T10:44:15Z)
- Actor-Director-Critic: A Novel Deep Reinforcement Learning Framework [2.6477113498726244]
We propose actor-director-critic, a new framework for deep reinforcement learning.
For each of the two critic networks used, we design two target critic networks instead of one.
In order to verify the performance of the actor-director-critic framework and the improved double estimator method, we applied them to the TD3 algorithm.
arXiv Detail & Related papers (2023-01-10T10:21:32Z)
- Solving Continuous Control via Q-learning [54.05120662838286]
We show that a simple modification of deep Q-learning largely alleviates issues with actor-critic methods.
By combining bang-bang action discretization with value decomposition, framing single-agent control as cooperative multi-agent reinforcement learning (MARL), this simple critic-only approach matches performance of state-of-the-art continuous actor-critic methods.
arXiv Detail & Related papers (2022-10-22T22:55:50Z)
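For the critic-only approach summarized above, a minimal sketch of the core idea: bang-bang discretization per action dimension plus a decomposed joint value, in the spirit of value decomposition from cooperative MARL. The network width, the two bins, and the mean-based decomposition are illustrative assumptions, not the paper's exact implementation.

import torch.nn as nn

class DecoupledBangBangQ(nn.Module):
    # Per-dimension Q-values over two extreme ("bang-bang") controls; the joint
    # value is decomposed as the mean of the chosen per-dimension utilities.
    def __init__(self, obs_dim, act_dim, n_bins=2, hidden=256):
        super().__init__()
        self.act_dim, self.n_bins = act_dim, n_bins
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim * n_bins),
        )

    def forward(self, obs):
        # Utilities with shape (batch, act_dim, n_bins).
        return self.net(obs).view(-1, self.act_dim, self.n_bins)

    def greedy_action(self, obs):
        # Independent argmax per action dimension, mapped to controls {-1, +1}.
        idx = self.forward(obs).argmax(dim=-1)
        return idx.float() * 2.0 - 1.0

    def joint_value(self, obs, idx):
        # Mean of the per-dimension utilities selected by bin indices idx.
        q = self.forward(obs)
        return q.gather(-1, idx.unsqueeze(-1)).squeeze(-1).mean(dim=-1)

Training such a critic would use a standard Q-learning/TD update on joint_value; the point of the sketch is that no separate actor network is required for continuous control.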
- Wasserstein Flow Meets Replicator Dynamics: A Mean-Field Analysis of Representation Learning in Actor-Critic [137.04558017227583]
Actor-critic (AC) algorithms, empowered by neural networks, have had significant empirical success in recent years.
We take a mean-field perspective on the evolution and convergence of feature-based neural AC.
We prove that neural AC finds the globally optimal policy at a sublinear rate.
arXiv Detail & Related papers (2021-12-27T06:09:50Z)
- Stereoscopic Universal Perturbations across Different Architectures and Datasets [60.021985610201156]
We study the effect of adversarial perturbations of images on deep stereo matching networks for the disparity estimation task.
We present a method to craft a single set of perturbations that, when added to any stereo image pair in a dataset, can fool a stereo network.
Our perturbations can increase D1-error (akin to fooling rate) of state-of-the-art stereo networks from 1% to as much as 87%.
arXiv Detail & Related papers (2021-12-12T02:11:31Z)
- Identification of Attack-Specific Signatures in Adversarial Examples [62.17639067715379]
We show that different attack algorithms produce adversarial examples which are distinct not only in their effectiveness but also in how they qualitatively affect their victims.
Our findings suggest that prospective adversarial attacks should be compared not only via their success rates at fooling models but also via deeper downstream effects they have on victims.
arXiv Detail & Related papers (2021-10-13T15:40:48Z)
- Analysis of a Target-Based Actor-Critic Algorithm with Linear Function Approximation [2.1592777170316366]
Actor-critic methods integrating target networks have exhibited remarkable empirical success in deep reinforcement learning.
We bridge this gap by proposing the first theoretical analysis of an online target-based actor-critic with linear function approximation in the discounted reward setting.
arXiv Detail & Related papers (2021-06-14T14:59:05Z)
- A Finite Time Analysis of Two Time-Scale Actor Critic Methods [87.69128666220016]
We provide a non-asymptotic analysis for two time-scale actor-critic methods under the non-i.i.d. setting.
We prove that the actor-critic method is guaranteed to find a first-order stationary point.
This is the first work providing finite-time analysis and sample complexity bound for two time-scale actor-critic methods.
arXiv Detail & Related papers (2020-05-04T09:45:18Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.