Meta-Cognitive Reinforcement Learning with Self-Doubt and Recovery
- URL: http://arxiv.org/abs/2601.20193v1
- Date: Wed, 28 Jan 2026 02:43:03 GMT
- Title: Meta-Cognitive Reinforcement Learning with Self-Doubt and Recovery
- Authors: Zhipeng Zhang, Wenting Ma, Kai Li, Meng Guo, Lei Yang, Wei Yu, Hongji Cui, Yichen Zhang, Mo Zhang, Jinzhe Lin, Zhenjie Yao
- Abstract summary: We propose a meta-cognitive reinforcement learning framework that enables an agent to assess, regulate, and recover its learning behavior. The proposed method introduces a meta-trust variable driven by Value Prediction Error Stability (VPES), which modulates learning dynamics via fail-safe regulation and gradual trust recovery.
- Score: 25.522943543082363
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Robust reinforcement learning methods typically focus on suppressing unreliable experiences or corrupted rewards, but they lack the ability to reason about the reliability of their own learning process. As a result, such methods often either overreact to noise by becoming overly conservative or fail catastrophically when uncertainty accumulates. In this work, we propose a meta-cognitive reinforcement learning framework that enables an agent to assess, regulate, and recover its learning behavior based on internally estimated reliability signals. The proposed method introduces a meta-trust variable driven by Value Prediction Error Stability (VPES), which modulates learning dynamics via fail-safe regulation and gradual trust recovery. Experiments on continuous-control benchmarks with reward corruption demonstrate that recovery-enabled meta-cognitive control achieves higher average returns and significantly reduces late-stage training failures compared to strong robustness baselines.
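As a rough illustration of the mechanism the abstract describes, the sketch below implements a meta-trust variable under stated assumptions: VPES is taken here to mean low variance over a sliding window of value prediction errors, fail-safe regulation is modeled as a multiplicative trust cut, and recovery as a slow additive climb back toward full trust. The class name, window size, and all thresholds are hypothetical; the paper's exact definitions may differ.

```python
from collections import deque


class MetaTrust:
    """Hypothetical sketch of a VPES-driven meta-trust variable.

    Assumes VPES corresponds to the variance of recent value
    prediction errors staying below a threshold; the paper's
    actual formulation may differ.
    """

    def __init__(self, window=50, fail_threshold=4.0,
                 drop=0.5, recovery_rate=0.01):
        self.errors = deque(maxlen=window)  # sliding window of VPEs
        self.trust = 1.0                    # meta-trust in [0, 1]
        self.fail_threshold = fail_threshold
        self.drop = drop
        self.recovery_rate = recovery_rate

    def update(self, value_prediction_error):
        """Record one value prediction error and update trust."""
        self.errors.append(value_prediction_error)
        if len(self.errors) < 2:
            return self.trust
        mean = sum(self.errors) / len(self.errors)
        var = sum((e - mean) ** 2 for e in self.errors) / len(self.errors)
        if var > self.fail_threshold:
            # fail-safe regulation: cut trust sharply when VPEs are unstable
            self.trust *= self.drop
        else:
            # gradual trust recovery: climb slowly back toward full trust
            self.trust = min(1.0, self.trust + self.recovery_rate)
        return self.trust

    def effective_lr(self, base_lr):
        # trust modulates learning dynamics, e.g. by scaling the step size
        return base_lr * self.trust
```

An agent would then scale its update step by `effective_lr(base_lr)`, so stretches of corrupted reward shrink the learning rate instead of destabilizing training, and the rate recovers once value prediction errors stabilize again.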
Related papers
- Learning Robust Reasoning through Guided Adversarial Self-Play [32.87933476043378]
We introduce GASP (Guided Adversarial Self-Play), a robustification method that explicitly trains detect-and-repair capabilities. Without human labels or external teachers, GASP forms an adversarial self-play game within a single model. In-distribution repair guidance, an imitation term on self-generated repairs, increases recovery probability while preserving previously acquired capabilities.
arXiv Detail & Related papers (2026-01-30T02:23:31Z) - Learning to Trust Experience: A Monitor-Trust-Regulator Framework for Learning under Unobservable Feedback Reliability [24.97566911521709]
We study Epistemic Identifiability under Unobservable Reliability (EIUR). Standard robust learning can converge stably yet form high-confidence, systematically wrong beliefs. We propose metacognitive regulation as a practical response: a second, introspective control loop that infers experience credibility from endogenous evidence in the learner's internal dynamics.
arXiv Detail & Related papers (2026-01-14T07:52:14Z) - Parent-Guided Adaptive Reliability (PGAR): A Behavioural Meta-Learning Framework for Stable and Trustworthy AI [0.0]
Parent-Guided Adaptive Reliability (PGAR) is a lightweight behavioural meta-learning framework. It adds a supervisory "parent" layer on top of a standard learner to improve stability, calibration, and recovery under disturbances. PGAR functions as a plug-in reliability layer for existing optimization and learning pipelines, supporting interpretable traces in safety-relevant settings.
arXiv Detail & Related papers (2026-01-07T06:02:34Z) - Aurora: Are Android Malware Classifiers Reliable and Stable under Distribution Shift? [51.12297424766236]
AURORA is a framework to evaluate malware classifiers based on their confidence quality and operational resilience. AURORA is complemented by a set of metrics designed to go beyond point-in-time performance. The fragility in SOTA frameworks across datasets of varying drift suggests the need for a return to the whiteboard.
arXiv Detail & Related papers (2025-05-28T20:22:43Z) - CARIL: Confidence-Aware Regression in Imitation Learning for Autonomous Driving [0.0]
End-to-end vision-based imitation learning has demonstrated promising results in autonomous driving. Traditional approaches rely on either regression-based models, which provide precise control but lack confidence estimation, or classification-based models, which offer confidence scores but suffer from reduced precision due to discretization. We introduce a dual-head neural network architecture that integrates both regression and classification heads to improve decision reliability in imitation learning.
arXiv Detail & Related papers (2025-03-02T08:19:02Z) - Temporal-Difference Variational Continual Learning [77.92320830700797]
We propose new learning objectives that integrate the regularization effects of multiple previous posterior estimations. Our approach effectively mitigates Catastrophic Forgetting, outperforming strong Variational CL methods.
arXiv Detail & Related papers (2024-10-10T10:58:41Z) - Selective Learning: Towards Robust Calibration with Dynamic Regularization [79.92633587914659]
Miscalibration in deep learning refers to a discrepancy between a model's predicted confidence and its actual performance.
We introduce Dynamic Regularization (DReg) which aims to learn what should be learned during training thereby circumventing the confidence adjusting trade-off.
arXiv Detail & Related papers (2024-02-13T11:25:20Z) - Hindsight-DICE: Stable Credit Assignment for Deep Reinforcement Learning [11.084321518414226]
We adapt existing importance-sampling ratio estimation techniques for off-policy evaluation to drastically improve the stability and efficiency of so-called hindsight policy methods.
Our hindsight distribution correction facilitates stable, efficient learning across a broad range of environments where credit assignment plagues baseline methods.
arXiv Detail & Related papers (2023-07-21T20:54:52Z) - Imitating, Fast and Slow: Robust learning from demonstrations via decision-time planning [96.72185761508668]
Planning at Test-time (IMPLANT) is a new meta-algorithm for imitation learning.
We demonstrate that IMPLANT significantly outperforms benchmark imitation learning approaches on standard control environments.
arXiv Detail & Related papers (2022-04-07T17:16:52Z) - Closing the Closed-Loop Distribution Shift in Safe Imitation Learning [80.05727171757454]
We treat safe optimization-based control strategies as experts in an imitation learning problem.
We train a learned policy that can be cheaply evaluated at run-time and that provably satisfies the same safety guarantees as the expert.
arXiv Detail & Related papers (2021-02-18T05:11:41Z) - An evaluation of word-level confidence estimation for end-to-end automatic speech recognition [70.61280174637913]
We investigate confidence estimation for end-to-end automatic speech recognition (ASR).
We provide an extensive benchmark of popular confidence methods on four well-known speech datasets.
Our results suggest a strong baseline can be obtained by scaling the logits by a learnt temperature.
arXiv Detail & Related papers (2021-01-14T09:51:59Z)
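The temperature-scaling baseline mentioned in the entry above can be sketched as follows. This is a minimal illustration only: the paper fits a learnt temperature on held-out data, whereas here a fixed value is chosen by hand to show the effect on confidence.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax with temperature scaling of the logits.

    T > 1 softens the distribution (lower peak confidence);
    T < 1 sharpens it. Shifting by the max is for numerical stability.
    """
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# hypothetical per-word logits from an end-to-end ASR model
logits = [2.0, 1.0, 0.5]
conf_raw = max(softmax(logits))                    # uncalibrated confidence
conf_calibrated = max(softmax(logits, temperature=2.0))
# a higher temperature lowers the top-class confidence,
# which helps when the raw model is overconfident
```

In the benchmarked setting, the single temperature parameter would be fit by minimizing negative log-likelihood on a held-out set rather than chosen manually.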
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.