Informed Asymmetric Actor-Critic: Leveraging Privileged Signals Beyond Full-State Access
- URL: http://arxiv.org/abs/2509.26000v1
- Date: Tue, 30 Sep 2025 09:32:20 GMT
- Title: Informed Asymmetric Actor-Critic: Leveraging Privileged Signals Beyond Full-State Access
- Authors: Daniel Ebi, Gaspard Lambrechts, Damien Ernst, Klemens Böhm
- Abstract summary: Reinforcement learning in partially observable environments requires agents to act under uncertainty from noisy, incomplete observations. Existing approaches typically assume full-state access during training. We propose a novel actor-critic framework, called informed asymmetric actor-critic, that enables conditioning the critic on arbitrary privileged signals.
- Score: 4.414257584656551
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reinforcement learning in partially observable environments requires agents to act under uncertainty from noisy, incomplete observations. Asymmetric actor-critic methods leverage privileged information during training to improve learning under these conditions. However, existing approaches typically assume full-state access during training. In this work, we challenge this assumption by proposing a novel actor-critic framework, called informed asymmetric actor-critic, that enables conditioning the critic on arbitrary privileged signals without requiring access to the full state. We show that policy gradients remain unbiased under this formulation, extending the theoretical foundation of asymmetric methods to the more general case of privileged partial information. To quantify the impact of such signals, we propose informativeness measures based on kernel methods and return prediction error, providing practical tools for evaluating training-time signals. We validate our approach empirically on benchmark navigation tasks and synthetic partially observable environments, showing that our informed asymmetric method improves learning efficiency and value estimation when informative privileged inputs are available. Our findings challenge the necessity of full-state access and open new directions for designing asymmetric reinforcement learning methods that are both practical and theoretically sound.
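No code accompanies this page, but the core construction the abstract describes is straightforward to sketch: the actor is conditioned only on what is observable at deployment, while the critic is additionally conditioned on a privileged training-time signal that need not be the full state. The PyTorch sketch below is our own minimal illustration under that reading; the names (`Actor`, `PrivilegedCritic`, `actor_critic_update`) are hypothetical, and for brevity the actor consumes a single observation rather than the history encoder a partially observable task would normally require.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Policy network: sees only the deployment-time observation."""
    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs: torch.Tensor) -> torch.distributions.Categorical:
        return torch.distributions.Categorical(logits=self.net(obs))

class PrivilegedCritic(nn.Module):
    """Value network: conditioned on the observation plus an arbitrary
    privileged signal (not necessarily the full state)."""
    def __init__(self, obs_dim: int, priv_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + priv_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs: torch.Tensor, priv: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([obs, priv], dim=-1)).squeeze(-1)

def actor_critic_update(actor, critic, opt_actor, opt_critic, batch, gamma=0.99):
    """One advantage actor-critic step. `batch` holds float tensors
    obs, priv, act, rew, next_obs, next_priv, done. The privileged
    signal enters only the critic, so the learned policy remains
    executable from observations alone at deployment time."""
    with torch.no_grad():
        next_v = critic(batch["next_obs"], batch["next_priv"])
        target = batch["rew"] + gamma * (1.0 - batch["done"]) * next_v

    # Critic regression toward the bootstrapped target.
    value = critic(batch["obs"], batch["priv"])
    critic_loss = (target - value).pow(2).mean()
    opt_critic.zero_grad(); critic_loss.backward(); opt_critic.step()

    # Policy gradient with the privileged-critic advantage as baseline.
    advantage = (target - critic(batch["obs"], batch["priv"])).detach()
    log_prob = actor(batch["obs"]).log_prob(batch["act"])
    actor_loss = -(advantage * log_prob).mean()
    opt_actor.zero_grad(); actor_loss.backward(); opt_actor.step()
```

For the informativeness measures, the abstract names kernel methods and return prediction error but gives no formulas here. A hypothetical proxy in that spirit, again our own sketch rather than the paper's definition, scores a candidate signal by how well a kernel regressor predicts Monte Carlo returns from it:

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import cross_val_score

def return_prediction_informativeness(signal: np.ndarray,
                                      returns: np.ndarray) -> float:
    """Cross-validated R^2 of predicting returns from the candidate
    privileged signal; higher suggests a more informative signal.
    A heuristic stand-in for the paper's measure, not its definition."""
    model = KernelRidge(kernel="rbf", alpha=1.0)
    return cross_val_score(model, signal, returns, cv=5, scoring="r2").mean()
```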
Related papers
- Off-Policy Actor-Critic for Adversarial Observation Robustness: Virtual Alternative Training via Symmetric Policy Evaluation [0.7583052519127079]
Reinforcement learning methods designed to handle adversarial input observations have received significant attention. We propose a novel off-policy method that eliminates the need for additional environmental interactions. Our approach is theoretically supported by the symmetric property of policy evaluation between the agent and the adversary.
arXiv Detail & Related papers (2025-06-20T05:13:10Z)
- Learning Verifiable Control Policies Using Relaxed Verification [49.81690518952909]
This work proposes performing verification throughout training, aiming for policies whose properties can be evaluated at runtime. The approach uses differentiable reachability analysis and incorporates new components into the loss function.
arXiv Detail & Related papers (2025-04-23T16:54:35Z)
- A Theoretical Justification for Asymmetric Actor-Critic Algorithms [3.946432657561182]
We propose a justification for asymmetric actor-critic algorithms with linear function approximators. The resulting finite-time bound reveals that the asymmetric critic eliminates error terms arising from aliasing in the agent state.
arXiv Detail & Related papers (2025-01-31T13:20:05Z)
- Con-ReCall: Detecting Pre-training Data in LLMs via Contrastive Decoding [118.75567341513897]
Existing methods typically analyze target text in isolation or solely with non-member contexts. We propose Con-ReCall, a novel approach that leverages the asymmetric distributional shifts induced by member and non-member contexts.
arXiv Detail & Related papers (2024-09-05T09:10:38Z)
- Uncertainty-Aware Instance Reweighting for Off-Policy Learning [63.31923483172859]
We propose an Uncertainty-aware Inverse Propensity Score estimator (UIPS) for improved off-policy learning.
Experiments on synthetic and three real-world recommendation datasets demonstrate the advantageous sample efficiency of the proposed UIPS estimator.
arXiv Detail & Related papers (2023-03-11T11:42:26Z)
- Unbiased Asymmetric Actor-Critic for Partially Observable Reinforcement Learning [17.48572546628464]
Asymmetric actor-critic methods exploit privileged state information available during training by learning a history-based policy via a state-based critic.
We examine the theory of asymmetric actor-critic methods which use state-based critics, and expose fundamental issues which undermine the validity of a common variant.
We propose an unbiased asymmetric actor-critic variant which is able to exploit state information while remaining theoretically sound.
arXiv Detail & Related papers (2021-05-25T05:18:44Z)
- Uncertainty Weighted Actor-Critic for Offline Reinforcement Learning [63.53407136812255]
Offline Reinforcement Learning promises to learn effective policies from previously-collected, static datasets without the need for exploration.
Existing Q-learning and actor-critic based off-policy RL algorithms fail when bootstrapping from out-of-distribution (OOD) actions or states.
We propose Uncertainty Weighted Actor-Critic (UWAC), an algorithm that detects OOD state-action pairs and down-weights their contribution in the training objectives accordingly (a sketch of this reweighting idea follows this entry).
arXiv Detail & Related papers (2021-05-17T20:16:46Z)
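The blurb above names the mechanism (down-weighting OOD state-action pairs) but not the weighting itself. Below is a hedged sketch of one plausible instantiation: estimate epistemic uncertainty with MC-dropout on the target Q-network and scale the TD loss by `exp(-beta * variance)`. The architecture, the exponential weighting, and all names here are our assumptions, not necessarily UWAC's exact choices.

```python
import torch
import torch.nn as nn

class DropoutQ(nn.Module):
    """Q-network with dropout layers; keeping them active at evaluation
    time (MC-dropout) yields a crude epistemic-uncertainty estimate."""
    def __init__(self, state_dim: int, action_dim: int,
                 hidden: int = 256, p: float = 0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(), nn.Dropout(p),
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Dropout(p),
            nn.Linear(hidden, 1),
        )

    def forward(self, s: torch.Tensor, a: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([s, a], dim=-1)).squeeze(-1)

def uncertainty_weighted_td_loss(q_net, target_q, batch,
                                 gamma=0.99, n_samples=10, beta=1.0):
    """TD loss with per-sample weights that shrink on (s, a) pairs the
    target network is uncertain about. `batch` holds tensors s, a, r,
    s2, a2, done, with a2 drawn from the current policy."""
    with torch.no_grad():
        target_q.train()  # keep dropout stochastic for MC sampling
        samples = torch.stack(
            [target_q(batch["s2"], batch["a2"]) for _ in range(n_samples)])
        var = samples.var(dim=0)
        target = batch["r"] + gamma * (1.0 - batch["done"]) * samples.mean(dim=0)
        weight = torch.exp(-beta * var)   # down-weight likely-OOD pairs
        weight = weight / weight.mean()   # keep the loss scale roughly fixed
    td_error = q_net(batch["s"], batch["a"]) - target
    return (weight * td_error.pow(2)).mean()
```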
- Strictly Batch Imitation Learning by Energy-based Distribution Matching [104.33286163090179]
Consider learning a policy purely on the basis of demonstrated behavior -- that is, with no access to reinforcement signals, no knowledge of transition dynamics, and no further interaction with the environment.
One solution is simply to retrofit existing algorithms for apprenticeship learning to work in the offline setting.
But such an approach leans heavily on off-policy evaluation or offline model estimation, and can be indirect and inefficient.
We argue that a good solution should be able to explicitly parameterize a policy, implicitly learn from rollout dynamics, and operate in an entirely offline fashion.
arXiv Detail & Related papers (2020-06-25T03:27:59Z)
- Guided Uncertainty-Aware Policy Optimization: Combining Learning and Model-Based Strategies for Sample-Efficient Policy Learning [75.56839075060819]
Traditional robotic approaches rely on an accurate model of the environment, a detailed description of how to perform the task, and a robust perception system to keep track of the current state.
Reinforcement learning approaches, by contrast, can operate directly from raw sensory inputs with only a reward signal to describe the task, but are extremely sample-inefficient and brittle.
In this work, we combine the strengths of model-based methods with the flexibility of learning-based methods to obtain a general method that is able to overcome inaccuracies in the robotics perception/actuation pipeline.
arXiv Detail & Related papers (2020-05-21T19:47:05Z)