UACER: An Uncertainty-Aware Critic Ensemble Framework for Robust Adversarial Reinforcement Learning
- URL: http://arxiv.org/abs/2512.10492v1
- Date: Thu, 11 Dec 2025 10:14:13 GMT
- Title: UACER: An Uncertainty-Aware Critic Ensemble Framework for Robust Adversarial Reinforcement Learning
- Authors: Jiaxi Wu, Tiantian Zhang, Yuxing Wang, Yongzhe Chang, Xueqian Wang
- Abstract summary: We propose a novel approach, Uncertainty-Aware Critic Ensemble for robust adversarial Reinforcement learning (UACER).
- Score: 15.028168889991795
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Robust adversarial reinforcement learning has emerged as an effective paradigm for training agents to handle uncertain disturbances in real environments, with critical applications in sequential decision-making domains such as autonomous driving and robotic control. Within this paradigm, agent training is typically formulated as a zero-sum Markov game between a protagonist and an adversary to enhance policy robustness. However, the trainable nature of the adversary inevitably induces non-stationarity in the learning dynamics, leading to exacerbated training instability and convergence difficulties, particularly in high-dimensional complex environments. In this paper, we propose a novel approach, Uncertainty-Aware Critic Ensemble for robust adversarial Reinforcement learning (UACER), which consists of two strategies: 1) Diversified critic ensemble: a diverse set of K critic networks is exploited in parallel to stabilize Q-value estimation, replacing the conventional single-critic architecture, for both variance reduction and robustness enhancement. 2) Time-varying Decay Uncertainty (TDU) mechanism: advancing beyond simple linear combinations, we develop a variance-derived Q-value aggregation strategy that explicitly incorporates epistemic uncertainty to dynamically regulate the exploration-exploitation trade-off while simultaneously stabilizing the training process. Comprehensive experiments across several MuJoCo control problems validate the superior effectiveness of UACER, outperforming state-of-the-art methods in terms of overall performance, stability, and efficiency.
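The two strategies described in the abstract can be sketched in a few lines. The decay schedule, the aggregation rule (mean plus a decaying standard-deviation bonus), and all function names below are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

rng = np.random.default_rng(0)

K = 5  # number of critics in the ensemble


def critic_q_values(state_action):
    # Stand-in for K independently initialized critic networks:
    # each returns its own Q-value estimate for the same (s, a) pair.
    base = float(np.sum(state_action))
    return base + 0.1 * rng.standard_normal(K)


def tdu_aggregate(q_values, step, total_steps, beta0=1.0):
    """Variance-derived aggregation with a time-decaying uncertainty weight.

    Epistemic uncertainty is approximated by the standard deviation across
    critics; its weight beta decays linearly over training, so an early
    exploration bonus gradually gives way to pure exploitation (an assumed
    schedule, for illustration only).
    """
    beta = beta0 * max(0.0, 1.0 - step / total_steps)  # linear decay
    mean_q = float(np.mean(q_values))
    uncertainty = float(np.std(q_values))
    return mean_q + beta * uncertainty


qs = critic_q_values(np.array([0.2, -0.1, 0.5]))
early = tdu_aggregate(qs, step=0, total_steps=1_000)
late = tdu_aggregate(qs, step=1_000, total_steps=1_000)
# Early in training the uncertainty bonus inflates the aggregated value;
# once beta has decayed to zero, only the ensemble mean remains.
assert early >= late
```

The key design point this sketch isolates is that the disagreement among the K critics is itself the uncertainty signal, so no separate uncertainty model is needed.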
Related papers
- Continual uncertainty learning [0.0]
This study formulates a new curriculum-based continual learning framework for robust control problems involving nonlinear dynamical systems. As a practical industrial application, this study applies the proposed method to designing an active vibration controller for automotive powertrains.
arXiv Detail & Related papers (2026-02-19T08:39:42Z) - Embodiment-Induced Coordination Regimes in Tabular Multi-Agent Q-Learning [0.0]
We compare independent and centralized Q-learning under explicit embodiment constraints on agent speed and stamina. Results show that increased coordination can become a liability under embodiment constraints.
arXiv Detail & Related papers (2026-01-24T13:09:01Z) - Sparse Threats, Focused Defense: Criticality-Aware Robust Reinforcement Learning for Safe Autonomous Driving [11.62520853262219]
We introduce criticality-aware robust RL (CARRL) for handling sparse, safety-critical risks in autonomous driving. CARRL consists of two interacting components: a risk exposure adversary (REA) and a risk-targeted robust agent (RTRA). We show that our approach reduces the collision rate by at least 22.66% across all cases compared to state-of-the-art baseline methods.
arXiv Detail & Related papers (2026-01-05T05:20:16Z) - Uncertainty-Resilient Multimodal Learning via Consistency-Guided Cross-Modal Transfer [0.0]
This thesis explores uncertainty-resilient multimodal learning through consistency-guided cross-modal transfer. The central idea is to use cross-modal semantic consistency as a basis for robust representation learning. Building on this foundation, the thesis investigates strategies to enhance semantic robustness, improve data efficiency, and reduce the impact of noise and imperfect supervision.
arXiv Detail & Related papers (2025-11-18T15:26:42Z) - Sycophancy Mitigation Through Reinforcement Learning with Uncertainty-Aware Adaptive Reasoning Trajectories [58.988535279557546]
We introduce SMART: Sycophancy Mitigation through Adaptive Reasoning Trajectories. We show that SMART significantly reduces sycophantic behavior while preserving strong performance on out-of-distribution inputs.
arXiv Detail & Related papers (2025-09-20T17:09:14Z) - Overcoming Non-stationary Dynamics with Evidential Proximal Policy Optimization [11.320660946946523]
Continuous control of non-stationary environments is a major challenge for deep reinforcement learning algorithms. We show that performing on-policy reinforcement learning with an evidential critic provides both. We name the resulting algorithm Evidential Proximal Policy Optimization (EPPO) due to the integral role of evidential uncertainty quantification in both policy evaluation and policy improvement stages.
arXiv Detail & Related papers (2025-03-03T12:23:07Z) - Improving Domain Generalization in Self-supervised Monocular Depth Estimation via Stabilized Adversarial Training [61.35809887986553]
We propose a general adversarial training framework, named Stabilized Conflict-optimization Adversarial Training (SCAT)
SCAT integrates adversarial data augmentation into self-supervised MDE methods to achieve a balance between stability and generalization.
Experiments on five benchmarks demonstrate that SCAT can achieve state-of-the-art performance and significantly improve the generalization capability of existing self-supervised MDE methods.
arXiv Detail & Related papers (2024-11-04T15:06:57Z) - Robust Deep Reinforcement Learning Through Adversarial Attacks and Training : A Survey [8.1138182541639]
Deep Reinforcement Learning (DRL) is a subfield of machine learning for training autonomous agents that take sequential actions across complex environments.<n>It remains susceptible to minor condition variations, raising concerns about its reliability in real-world applications.<n>A way to improve robustness of DRL to unknown changes in the environmental conditions and possible perturbations is through Adversarial Training.
arXiv Detail & Related papers (2024-03-01T10:16:46Z) - Seeing is not Believing: Robust Reinforcement Learning against Spurious Correlation [57.351098530477124]
We consider one critical type of robustness against spurious correlation, where different portions of the state do not have correlations induced by unobserved confounders.
A model that learns such useless or even harmful correlation could catastrophically fail when the confounder in the test case deviates from the training one.
Existing robust algorithms that assume simple and unstructured uncertainty sets are therefore inadequate to address this challenge.
arXiv Detail & Related papers (2023-07-15T23:53:37Z) - Enhancing Adversarial Training with Feature Separability [52.39305978984573]
We introduce a new concept of adversarial training graph (ATG) with which the proposed adversarial training with feature separability (ATFS) enables to boost the intra-class feature similarity and increase inter-class feature variance.
Through comprehensive experiments, we demonstrate that the proposed ATFS framework significantly improves both clean and robust performance.
arXiv Detail & Related papers (2022-05-02T04:04:23Z) - Adversarially Robust Stability Certificates can be Sample-Efficient [14.658040519472646]
We consider learning adversarially robust stability certificates for unknown nonlinear dynamical systems.
We show that the statistical cost of learning an adversarial stability certificate is equivalent, up to constant factors, to that of learning a nominal stability certificate.
arXiv Detail & Related papers (2021-12-20T17:23:31Z) - Closing the Closed-Loop Distribution Shift in Safe Imitation Learning [80.05727171757454]
We treat safe optimization-based control strategies as experts in an imitation learning problem.
We train a learned policy that can be cheaply evaluated at run-time and that provably satisfies the same safety guarantees as the expert.
arXiv Detail & Related papers (2021-02-18T05:11:41Z) - Efficient Empowerment Estimation for Unsupervised Stabilization [75.32013242448151]
The empowerment principle enables unsupervised stabilization of dynamical systems at upright positions.
We propose an alternative solution based on a trainable representation of a dynamical system as a Gaussian channel.
We show that our method has a lower sample complexity, is more stable in training, possesses the essential properties of the empowerment function, and allows estimation of empowerment from images.
arXiv Detail & Related papers (2020-07-14T21:10:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.