Detecting Rewards Deterioration in Episodic Reinforcement Learning
- URL: http://arxiv.org/abs/2010.11660v3
- Date: Thu, 28 Oct 2021 20:20:21 GMT
- Title: Detecting Rewards Deterioration in Episodic Reinforcement Learning
- Authors: Ido Greenberg, Shie Mannor
- Abstract summary: In many RL applications, once training ends, it is vital to detect any deterioration in the agent performance as soon as possible.
We consider an episodic framework, where the rewards within each episode are not independent, nor identically-distributed, nor Markov.
We define the mean-shift in a way corresponding to deterioration of a temporal signal (such as the rewards), and derive a test for this problem with optimal statistical power.
- Score: 63.49923393311052
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In many RL applications, once training ends, it is vital to detect any
deterioration in the agent performance as soon as possible. Furthermore, it
often has to be done without modifying the policy and under minimal assumptions
regarding the environment. In this paper, we address this problem by focusing
directly on the rewards and testing for degradation. We consider an episodic
framework, where the rewards within each episode are not independent, nor
identically-distributed, nor Markov. We present this problem as a multivariate
mean-shift detection problem with possibly partial observations. We define the
mean-shift in a way corresponding to deterioration of a temporal signal (such
as the rewards), and derive a test for this problem with optimal statistical
power. Empirically, on deteriorated rewards in control problems (generated
using various environment modifications), the test is demonstrated to be more
powerful than standard tests - often by orders of magnitude. We also suggest a
novel Bootstrap mechanism for False Alarm Rate control (BFAR), applicable to
episodic (non-i.i.d.) signals and allowing our test to run sequentially in an
online manner. Our method does not rely on a learned model of the environment,
is entirely external to the agent, and in fact can be applied to detect changes
or drifts in any episodic signal.
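To make the abstract's idea concrete, below is a minimal sketch of one plausible instantiation, not the paper's exact algorithm or API: it assumes fixed-length, fully observed episodes and a Gaussian approximation, under which the weights Sigma_0^{-1} 1 give the likelihood-ratio-optimal direction for detecting a uniform downward shift in the per-timestep reward means. The function names (`fit_reference`, `degradation_statistic`, `bootstrap_threshold`), the regularization constant, and the simple episode-resampling bootstrap (a crude stand-in for the paper's BFAR mechanism) are all illustrative assumptions.

```python
import numpy as np


def fit_reference(ref_episodes):
    """Estimate reference statistics from an (n_episodes, T) reward matrix
    collected while the agent is known to perform well."""
    mu0 = ref_episodes.mean(axis=0)                      # per-timestep mean rewards
    sigma0 = np.cov(ref_episodes, rowvar=False)          # T x T covariance
    T = ref_episodes.shape[1]
    # Weights for a uniform-degradation alternative, regularized for stability.
    w = np.linalg.solve(sigma0 + 1e-6 * np.eye(T), np.ones(T))
    return mu0, sigma0, w


def degradation_statistic(new_episodes, mu0, w):
    """Weighted mean-shift statistic; large positive values suggest deterioration."""
    diff = mu0 - new_episodes.mean(axis=0)               # positive where rewards dropped
    return float(w @ diff)


def bootstrap_threshold(ref_episodes, mu0, w, n_new, alpha=0.05,
                        n_boot=2000, rng=None):
    """Resample whole reference episodes to approximate the null distribution of
    the statistic and return the (1 - alpha) quantile as a decision threshold."""
    rng = np.random.default_rng(0) if rng is None else rng
    n_ref = ref_episodes.shape[0]
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, n_ref, size=n_new)
        stats.append(degradation_statistic(ref_episodes[idx], mu0, w))
    return float(np.quantile(stats, 1.0 - alpha))
```

A monitor built on this sketch would flag deterioration whenever the statistic computed on a recent window of episodes exceeds the bootstrapped threshold; the paper's actual BFAR procedure goes further, calibrating thresholds so that the false alarm rate stays controlled across the many overlapping tests performed during sequential, online monitoring, and it also handles partially observed episodes, which this sketch does not.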
Related papers
- Typicalness-Aware Learning for Failure Detection [26.23185979968123]
Deep neural networks (DNNs) often suffer from the overconfidence issue, where incorrect predictions are made with high confidence scores.
We propose a novel approach called Typicalness-Aware Learning (TAL) to address this issue and improve failure detection performance.
arXiv Detail & Related papers (2024-11-04T11:09:47Z)
- Invariant Anomaly Detection under Distribution Shifts: A Causal Perspective [6.845698872290768]
Anomaly detection (AD) is the machine learning task of identifying highly discrepant abnormal samples.
Under the constraints of a distribution shift, the assumption that training samples and test samples are drawn from the same distribution breaks down.
We attempt to increase the resilience of anomaly detection models to different kinds of distribution shifts.
arXiv Detail & Related papers (2023-10-03T10:10:42Z)
- DARTH: Holistic Test-time Adaptation for Multiple Object Tracking [87.72019733473562]
Multiple object tracking (MOT) is a fundamental component of perception systems for autonomous driving.
Despite the safety demands of driving systems, no solution has previously been proposed for adapting MOT to domain shift under test-time conditions.
We introduce DARTH, a holistic test-time adaptation framework for MOT.
arXiv Detail & Related papers (2023-10-03T10:10:42Z)
- CARLA: Self-supervised Contrastive Representation Learning for Time Series Anomaly Detection [53.83593870825628]
One main challenge in time series anomaly detection (TSAD) is the lack of labelled data in many real-life scenarios.
Most of the existing anomaly detection methods focus on learning the normal behaviour of unlabelled time series in an unsupervised manner.
We introduce a novel end-to-end self-supervised ContrAstive Representation Learning approach for time series anomaly detection.
arXiv Detail & Related papers (2023-08-18T04:45:56Z)
- Continual Test-Time Domain Adaptation [94.51284735268597]
Test-time domain adaptation aims to adapt a source pre-trained model to a target domain without using any source data.
CoTTA is easy to implement and can be readily incorporated in off-the-shelf pre-trained models.
arXiv Detail & Related papers (2022-03-25T11:42:02Z)
- Tracking the risk of a deployed model and detecting harmful distribution shifts [105.27463615756733]
In practice, it may make sense to ignore benign shifts, under which the performance of a deployed model does not degrade substantially.
We argue that a sensible method for firing off a warning has to both (a) detect harmful shifts while ignoring benign ones, and (b) allow continuous monitoring of model performance without increasing the false alarm rate.
arXiv Detail & Related papers (2021-10-12T17:21:41Z)
- Accelerated Policy Evaluation: Learning Adversarial Environments with Adaptive Importance Sampling [19.81658135871748]
A biased or inaccurate policy evaluation in a safety-critical system could potentially cause unexpected catastrophic failures.
We propose the Accelerated Policy Evaluation (APE) method, which simultaneously uncovers rare events and estimates the rare event probability.
APE is scalable to large discrete or continuous spaces by incorporating function approximators.
arXiv Detail & Related papers (2021-06-19T20:03:26Z)
- DisCor: Corrective Feedback in Reinforcement Learning via Distribution Correction [96.90215318875859]
We show that bootstrapping-based Q-learning algorithms do not necessarily benefit from corrective feedback.
We propose a new algorithm, DisCor, which computes an approximation to this optimal distribution and uses it to re-weight the transitions used for training.
arXiv Detail & Related papers (2020-03-16T16:18:52Z)