Optimistic critics can empower small actors
- URL: http://arxiv.org/abs/2506.01016v2
- Date: Wed, 04 Jun 2025 15:56:22 GMT
- Title: Optimistic critics can empower small actors
- Authors: Olya Mastikhina, Dhruv Sreenivas, Pablo Samuel Castro
- Abstract summary: We argue for the advantages of asymmetric setups, specifically with the use of smaller actors. We find that, in general, smaller actors result in performance degradation and overfit critics. Our analyses suggest poor data collection, due to value underestimation, as one of the main causes for this behavior.
- Score: 14.058002772699044
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Actor-critic methods have been central to many of the recent advances in deep reinforcement learning. The most common approach is to use symmetric architectures, whereby both actor and critic have the same network topology and number of parameters. However, recent works have argued for the advantages of asymmetric setups, specifically with the use of smaller actors. We perform broad empirical investigations and analyses to better understand the implications of this and find that, in general, smaller actors result in performance degradation and overfit critics. Our analyses suggest poor data collection, due to value underestimation, as one of the main causes for this behavior, and further highlight the crucial role the critic can play in alleviating this pathology. We explore techniques to mitigate the observed value underestimation, which enables further research in asymmetric actor-critic methods.
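The symmetric-versus-asymmetric distinction above is purely about network topology: in a symmetric setup the actor and critic have the same architecture and parameter count, while the asymmetric setups studied here shrink only the actor. A minimal sketch, assuming illustrative MLP widths and problem dimensions not taken from the paper:

```python
# Sketch: parameter counts for symmetric vs. asymmetric actor-critic setups.
# All layer widths and dimensions below are illustrative assumptions.

def mlp_param_count(sizes):
    """Number of weights and biases in a fully connected MLP
    with the given layer widths (input, hidden..., output)."""
    return sum((n_in + 1) * n_out for n_in, n_out in zip(sizes, sizes[1:]))

obs_dim, act_dim = 17, 6  # e.g. a small continuous-control task (assumed)

# Symmetric setup: actor and critic share the same topology.
symmetric_actor = mlp_param_count([obs_dim, 256, 256, act_dim])
critic = mlp_param_count([obs_dim + act_dim, 256, 256, 1])

# Asymmetric setup: a much smaller actor paired with the same critic.
asymmetric_actor = mlp_param_count([obs_dim, 32, 32, act_dim])

print(symmetric_actor, asymmetric_actor, critic)
```

In this toy configuration the asymmetric actor has well under 5% of the symmetric actor's parameters, which is the kind of gap where the paper reports degradation and overfit critics.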
Related papers
- Studying the Interplay Between the Actor and Critic Representations in Reinforcement Learning [27.2866735011598]
We study whether the actor and critic will benefit from separate, rather than shared, representations. Our primary finding is that when separated, the representations for the actor and critic systematically specialise in extracting different types of information. We conduct a rigorous empirical study to understand how different representation learning approaches affect the actor and critic's specialisations.
arXiv Detail & Related papers (2025-03-08T21:29:20Z)
- LLM-Safety Evaluations Lack Robustness [58.334290876531036]
We argue that current safety alignment research efforts for large language models are hindered by many intertwined sources of noise. We propose a set of guidelines for reducing noise and bias in evaluations of future attack and defense papers.
arXiv Detail & Related papers (2025-03-04T12:55:07Z)
- Adversarial Alignment for LLMs Requires Simpler, Reproducible, and More Measurable Objectives [52.863024096759816]
Misaligned research objectives have hindered progress in adversarial robustness research over the past decade. We argue that realigned objectives are necessary for meaningful progress in adversarial alignment.
arXiv Detail & Related papers (2025-02-17T15:28:40Z)
- RealCritic: Towards Effectiveness-Driven Evaluation of Language Model Critiques [59.861013614500024]
We introduce a new benchmark designed to assess the critique capabilities of Large Language Models (LLMs). Unlike existing benchmarks, which typically function in an open-loop fashion, our approach employs a closed-loop methodology that evaluates the quality of corrections generated from critiques.
arXiv Detail & Related papers (2025-01-24T13:48:10Z)
- Critic-CoT: Boosting the reasoning abilities of large language model via Chain-of-thoughts Critic [48.94340387130627]
Critic-CoT is a framework that pushes LLMs toward System-2-like critic capability via a CoT reasoning paradigm and the automatic construction of distant-supervision data without human annotation.
Experiments on GSM8K and MATH demonstrate that our enhanced model significantly boosts task-solving performance.
arXiv Detail & Related papers (2024-08-29T08:02:09Z)
- Decision-Aware Actor-Critic with Function Approximation and Theoretical Guarantees [12.259191000019033]
Actor-critic (AC) methods are widely used in reinforcement learning (RL).
We design a joint objective for training the actor and critic in a decision-aware fashion.
We empirically demonstrate the benefit of our decision-aware actor-critic framework on simple RL problems.
arXiv Detail & Related papers (2023-05-24T15:34:21Z)
- Actor Prioritized Experience Replay [0.0]
Prioritized Experience Replay (PER) allows agents to learn from transitions sampled with non-uniform probability proportional to their temporal-difference (TD) error.
We introduce a novel experience replay sampling framework for actor-critic methods, which also regards issues with stability and recent findings behind the poor empirical performance of PER.
An extensive set of experiments verifies our theoretical claims and demonstrates that the introduced method significantly outperforms the competing approaches.
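The core PER mechanism referenced above samples transitions with probability proportional to their absolute TD error, then corrects the resulting bias with importance-sampling weights. A minimal sketch of that sampling step, using the hyperparameter names of the common PER formulation rather than the specific method in this paper:

```python
import random

def per_sample(td_errors, batch_size, alpha=0.6, beta=0.4, eps=1e-6):
    """Draw a batch of indices with probability proportional to
    |TD error|^alpha, plus importance-sampling weights (normalized
    by the maximum weight) to correct the induced bias."""
    priorities = [(abs(d) + eps) ** alpha for d in td_errors]
    total = sum(priorities)
    probs = [p / total for p in priorities]
    idx = random.choices(range(len(td_errors)), weights=probs, k=batch_size)
    n = len(td_errors)
    max_w = (n * min(probs)) ** (-beta)  # largest possible IS weight
    weights = [((n * probs[i]) ** (-beta)) / max_w for i in idx]
    return idx, weights
```

A transition with a large TD error is drawn more often, but receives a proportionally smaller weight in the loss, keeping the update unbiased in expectation.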
arXiv Detail & Related papers (2022-09-01T15:27:46Z)
- Unbiased Asymmetric Actor-Critic for Partially Observable Reinforcement Learning [17.48572546628464]
Asymmetric actor-critic methods exploit such information by training a history-based policy via a state-based critic.
We examine the theory of asymmetric actor-critic methods which use state-based critics, and expose fundamental issues which undermine the validity of a common variant.
We propose an unbiased asymmetric actor-critic variant which is able to exploit state information while remaining theoretically sound.
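The asymmetry in this line of work is informational rather than architectural: the actor conditions only on the observation history available at deployment, while the training-time critic may see the full state. A toy sketch of that information flow, with function bodies that are purely illustrative stand-ins:

```python
# Sketch of asymmetric information flow in partially observable RL.
# The featurizations below are toy placeholders, not from the paper.

def actor(history):
    """Policy: maps an observation history to action scores.
    Sees only what would be available at deployment time."""
    return [sum(col) / len(history) for col in zip(*history)]

def state_based_critic(state, action_scores):
    """Critic: scores the actor's output using privileged state
    information available only during training."""
    return sum(s * a for s, a in zip(state, action_scores))

history = [[0.0, 1.0], [1.0, 0.0]]   # past partial observations
state = [0.5, 0.5]                   # full state, training-time only
value = state_based_critic(state, actor(history))
```

The cited paper's point is that naively training a history-based policy against such a state-based critic can be biased, motivating their unbiased variant.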
arXiv Detail & Related papers (2021-05-25T05:18:44Z)
- Good Actors can come in Smaller Sizes: A Case Study on the Value of Actor-Critic Asymmetry [47.312768123967025]
This case study explores the performance impact of network sizes when considering actor and critic architectures independently.
By relaxing the assumption of architectural symmetry, it is often possible for smaller actors to achieve comparable policy performance to their symmetric counterparts.
arXiv Detail & Related papers (2021-02-23T19:07:47Z)
- Benchmarking Adversarial Robustness [47.168521143464545]
We establish a comprehensive, rigorous, and coherent benchmark to evaluate adversarial robustness on image classification tasks.
Based on the evaluation results, we draw several important findings and provide insights for future research.
arXiv Detail & Related papers (2019-12-26T12:37:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.