Actor-Critic without Actor
- URL: http://arxiv.org/abs/2509.21022v1
- Date: Thu, 25 Sep 2025 11:33:09 GMT
- Title: Actor-Critic without Actor
- Authors: Donghyeon Ki, Hee-Jun Ahn, Kyungyoon Kim, Byung-Jun Lee,
- Abstract summary: We introduce Actor-Critic without Actor (ACA), a lightweight framework that eliminates the explicit actor network and instead generates actions directly from the field of a noise-level critic.<n>ACA achieves more favorable learning curves and competitive performance compared to both standard actor-critic and state-of-the-art diffusion-based methods.
- Score: 4.94481688445056
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Actor-critic methods constitute a central paradigm in reinforcement learning (RL), coupling policy evaluation with policy improvement. While effective across many domains, these methods rely on separate actor and critic networks, which makes training vulnerable to architectural decisions and hyperparameter tuning. Such complexity limits their scalability in settings that require large function approximators. Recently, diffusion models have recently been proposed as expressive policies that capture multi-modal behaviors and improve exploration, but they introduce additional design choices and computational burdens, hindering efficient deployment. We introduce Actor-Critic without Actor (ACA), a lightweight framework that eliminates the explicit actor network and instead generates actions directly from the gradient field of a noise-level critic. This design removes the algorithmic and computational overhead of actor training while keeping policy improvement tightly aligned with the critic's latest value estimates. Moreover, ACA retains the ability to capture diverse, multi-modal behaviors without relying on diffusion-based actors, combining simplicity with expressiveness. Through extensive experiments on standard online RL benchmarks,ACA achieves more favorable learning curves and competitive performance compared to both standard actor-critic and state-of-the-art diffusion-based methods, providing a simple yet powerful solution for online RL.
Related papers
- Q-learning with Adjoint Matching [58.78551025170267]
We propose Q-learning with Adjoint Matching (QAM), a novel TD-based reinforcement learning (RL) algorithm.<n>QAM sidesteps two challenges by leveraging adjoint matching, a recently proposed technique in generative modeling.<n>It consistently outperforms prior approaches on hard, sparse reward tasks in both offline and offline-to-online RL.
arXiv Detail & Related papers (2026-01-20T18:45:34Z) - Learning Intractable Multimodal Policies with Reparameterization and Diversity Regularization [24.229494482432376]
In this paper, we reformulate existing intractable multimodal actors within a unified framework.<n>We then propose a distance-based diversity regularization that does not explicitly require decision probabilities.<n>Our experiments highlight that the amortized actor is a promising policy model class with strong multimodal expressivity and high performance.
arXiv Detail & Related papers (2025-11-03T09:17:53Z) - One-Step Flow Policy Mirror Descent [52.31612487608593]
Flow Policy Mirror Descent (FPMD) is an online RL algorithm that enables 1-step sampling during flow policy inference.<n>Our approach exploits a theoretical connection between the distribution variance and the discretization error of single-step sampling in straight flow matching models.
arXiv Detail & Related papers (2025-07-31T15:51:10Z) - Flow-Based Single-Step Completion for Efficient and Expressive Policy Learning [0.0]
We propose a generative policy trained with an augmented flow-matching objective to predict direct completion vectors from intermediate flow samples.<n>Our method scales effectively to offline, offline-to-online, and online RL settings, offering substantial gains in speed and adaptability.<n>We extend SSCP to goal-conditioned RL, enabling flat policies to exploit subgoal structures without explicit hierarchical inference.
arXiv Detail & Related papers (2025-06-26T16:09:53Z) - Supervised Learning-enhanced Multi-Group Actor Critic for Live Stream Allocation in Feed [14.545253604335823]
We propose a novel Supervised Learning-enhanced Multi-Group Actor Critic algorithm (SL-MGAC)<n>We introduce a supervised learning-enhanced actor-critic framework that incorporates variance reduction techniques.<n>We also propose a novel reward function to prevent overly greedy live stream allocation.
arXiv Detail & Related papers (2024-11-28T04:06:02Z) - Learning a Diffusion Model Policy from Rewards via Q-Score Matching [93.0191910132874]
We present a theoretical framework linking the structure of diffusion model policies to a learned Q-function.<n>We propose a new policy update method from this theory, which we denote Q-score matching.
arXiv Detail & Related papers (2023-12-18T23:31:01Z) - Decision-Aware Actor-Critic with Function Approximation and Theoretical
Guarantees [12.259191000019033]
Actor-critic (AC) methods are widely used in reinforcement learning (RL)
We design a joint objective for training the actor and critic in a decision-aware fashion.
We empirically demonstrate the benefit of our decision-aware actor-critic framework on simple RL problems.
arXiv Detail & Related papers (2023-05-24T15:34:21Z) - Solving Continuous Control via Q-learning [54.05120662838286]
We show that a simple modification of deep Q-learning largely alleviates issues with actor-critic methods.
By combining bang-bang action discretization with value decomposition, framing single-agent control as cooperative multi-agent reinforcement learning (MARL), this simple critic-only approach matches performance of state-of-the-art continuous actor-critic methods.
arXiv Detail & Related papers (2022-10-22T22:55:50Z) - Actor Prioritized Experience Replay [0.0]
Prioritized Experience Replay (PER) allows agents to learn from transitions sampled with non-uniform probability proportional to their temporal-difference (TD) error.
We introduce a novel experience replay sampling framework for actor-critic methods, which also regards issues with stability and recent findings behind the poor empirical performance of PER.
An extensive set of experiments verifies our theoretical claims and demonstrates that the introduced method significantly outperforms the competing approaches.
arXiv Detail & Related papers (2022-09-01T15:27:46Z) - Diffusion Policies as an Expressive Policy Class for Offline
Reinforcement Learning [70.20191211010847]
Offline reinforcement learning (RL) aims to learn an optimal policy using a previously collected static dataset.
We introduce Diffusion Q-learning (Diffusion-QL) that utilizes a conditional diffusion model to represent the policy.
We show that our method can achieve state-of-the-art performance on the majority of the D4RL benchmark tasks.
arXiv Detail & Related papers (2022-08-12T09:54:11Z) - TASAC: a twin-actor reinforcement learning framework with stochastic
policy for batch process control [1.101002667958165]
Reinforcement Learning (RL) wherein an agent learns the policy by directly interacting with the environment, offers a potential alternative in this context.
RL frameworks with actor-critic architecture have recently become popular for controlling systems where state and action spaces are continuous.
It has been shown that an ensemble of actor and critic networks further helps the agent learn better policies due to the enhanced exploration due to simultaneous policy learning.
arXiv Detail & Related papers (2022-04-22T13:00:51Z) - Discrete Action On-Policy Learning with Action-Value Critic [72.20609919995086]
Reinforcement learning (RL) in discrete action space is ubiquitous in real-world applications, but its complexity grows exponentially with the action-space dimension.
We construct a critic to estimate action-value functions, apply it on correlated actions, and combine these critic estimated action values to control the variance of gradient estimation.
These efforts result in a new discrete action on-policy RL algorithm that empirically outperforms related on-policy algorithms relying on variance control techniques.
arXiv Detail & Related papers (2020-02-10T04:23:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.