The Impact of Task Underspecification in Evaluating Deep Reinforcement
Learning
- URL: http://arxiv.org/abs/2210.08607v1
- Date: Sun, 16 Oct 2022 18:51:55 GMT
- Title: The Impact of Task Underspecification in Evaluating Deep Reinforcement
Learning
- Authors: Vindula Jayawardana, Catherine Tang, Sirui Li, Dajiang Suo, Cathy Wu
- Abstract summary: Evaluations of Deep Reinforcement Learning (DRL) methods are integral to the field's scientific progress.
In this article, we augment DRL evaluations to consider parameterized families of MDPs.
We show that evaluating the MDP family often yields a substantially different relative ranking of methods, casting doubt on what methods should be considered state-of-the-art.
- Score: 1.4711121887106535
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Evaluations of Deep Reinforcement Learning (DRL) methods are an integral
part of the field's scientific progress. Beyond designing DRL methods for general
intelligence, designing task-specific methods is becoming increasingly
prominent for real-world applications. In these settings, the standard
evaluation practice involves using a few instances of Markov Decision Processes
(MDPs) to represent the task. However, many tasks induce a large family of MDPs
owing to variations in the underlying environment, particularly in real-world
contexts. For example, in traffic signal control, variations may stem from
intersection geometries and traffic flow levels. The selected MDP instances may
thus inadvertently cause overfitting, lacking the statistical power to draw
conclusions about the method's true performance across the family. In this
article, we augment DRL evaluations to consider parameterized families of MDPs.
We show that in comparison to evaluating DRL methods on select MDP instances,
evaluating the MDP family often yields a substantially different relative
ranking of methods, casting doubt on what methods should be considered
state-of-the-art. We validate this phenomenon in standard control benchmarks
and the real-world application of traffic signal control. At the same time, we
show that accurately evaluating on an MDP family is nontrivial. Overall, this
work identifies new challenges for empirical rigor in reinforcement learning,
especially as the outcomes of DRL trickle into downstream decision-making.
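The protocol change the abstract describes can be made concrete with a small sketch. The snippet below is illustrative only: the toy return model and names such as evaluate_return are hypothetical stand-ins, not the paper's benchmark code. It scores two hypothetical methods on a few hand-picked MDP instances and then on draws from a parameterized family, showing how the relative ranking can flip.

```python
# Illustrative sketch of the evaluation protocol described in the abstract.
# The toy return model below is a stand-in for real DRL training runs; the
# function and parameter names are hypothetical, not the paper's code.
import numpy as np

rng = np.random.default_rng(0)

def evaluate_return(method: str, theta: float) -> float:
    """Stand-in for 'train and evaluate `method` on the MDP instance with
    environment parameter theta, and report its return'."""
    if method == "A":  # strong on a narrow band of instances
        return 1.0 - 6.0 * (theta - 0.5) ** 2
    return 0.6 - 0.2 * abs(theta - 0.5)  # method "B": flatter, more robust

# (a) standard practice: a few hand-picked MDP instances near theta = 0.5
select_instances = [0.45, 0.50, 0.55]
# (b) the parameterized family, e.g. theta ~ Uniform(0, 1)
family_draws = rng.uniform(0.0, 1.0, size=1000)

for label, thetas in [("select instances", select_instances),
                      ("MDP family", list(family_draws))]:
    scores = {m: float(np.mean([evaluate_return(m, t) for t in thetas]))
              for m in ("A", "B")}
    ranking = sorted(scores, key=scores.get, reverse=True)
    print(f"{label:16s} scores={scores} ranking={ranking}")
```

On the hand-picked instances method A dominates, while over the family method B comes out ahead, which is exactly the ranking reversal the abstract warns about. In practice each evaluate_return call is a full training-and-evaluation run, which is why accurate family-level evaluation is nontrivial.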
Related papers
- Sample-Efficient Reinforcement Learning with Temporal Logic Objectives: Leveraging the Task Specification to Guide Exploration [13.053013407015628]
This paper addresses the problem of learning optimal control policies for systems with uncertain dynamics.
We propose an accelerated RL algorithm that can learn control policies significantly faster than competitive approaches.
arXiv Detail & Related papers (2024-10-16T00:53:41Z)
- Reinforcement Learning in Dynamic Treatment Regimes Needs Critical Reexamination [7.162274565861427]
Offline reinforcement learning in dynamic treatment regimes presents a mix of unprecedented opportunities and challenges.
We argue for a reassessment of applying RL in dynamic treatment regimes, citing concerns such as inconsistent and potentially inconclusive evaluation metrics.
We demonstrate that the performance of RL algorithms can significantly vary with changes in evaluation metrics and Markov Decision Process (MDP) formulations.
arXiv Detail & Related papers (2024-05-28T20:03:18Z)
- Train Hard, Fight Easy: Robust Meta Reinforcement Learning [78.16589993684698]
A major challenge of reinforcement learning (RL) in real-world applications is the variation between environments, tasks or clients.
Standard meta reinforcement learning (MRL) methods optimize the average return over tasks, but often suffer from poor results on tasks of high risk or difficulty.
In this work, we define a robust MRL objective with a controlled robustness level.
The resulting data inefficiency is addressed via the novel Robust Meta RL algorithm (RoML).
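The summary names a controlled robustness level but not the risk measure behind it; one standard way to formalize "optimize for the hardest tasks" is the conditional value at risk (CVaR) over the task distribution. The NumPy sketch below, on toy per-task returns, contrasts the mean objective with a CVaR_alpha objective; take the objective's exact form here as an assumption for illustration, not a transcription of RoML.

```python
# Toy contrast between the standard meta-RL objective (mean return over
# tasks) and a CVaR-style robust objective (mean over the worst
# alpha-fraction of tasks). The per-task returns are made-up numbers.
import numpy as np

def cvar(task_returns: np.ndarray, alpha: float) -> float:
    """Average return over the worst alpha-fraction of tasks."""
    worst_first = np.sort(task_returns)                # ascending order
    k = max(1, int(np.ceil(alpha * len(worst_first))))
    return float(worst_first[:k].mean())

returns = np.array([9.0, 8.5, 7.0, 2.0, 1.5])          # toy per-task returns
print("mean objective:", float(returns.mean()))        # standard MRL target
print("CVaR(alpha=0.4):", cvar(returns, 0.4))          # robust target: 1.75
```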
arXiv Detail & Related papers (2023-01-26T14:54:39Z)
- A Survey of Meta-Reinforcement Learning [69.76165430793571]
We cast the development of better RL algorithms as a machine learning problem itself in a process called meta-RL.
We discuss how, at a high level, meta-RL research can be clustered based on the presence of a task distribution and the learning budget available for each individual task.
We conclude by presenting the open problems on the path to making meta-RL part of the standard toolbox for a deep RL practitioner.
arXiv Detail & Related papers (2023-01-19T12:01:41Z)
- Semi-Markov Offline Reinforcement Learning for Healthcare [57.15307499843254]
We introduce three offline RL algorithms, namely, SDQN, SDDQN, and SBCQ.
We experimentally demonstrate that only these algorithms learn the optimal policy in variable-time environments.
We apply our new algorithms to a real-world offline dataset pertaining to warfarin dosing for stroke prevention.
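A core ingredient of such variable-time algorithms is a semi-Markov Bellman backup in which the discount is raised to each transition's duration. The sketch below shows that target computation only; the shapes and names are illustrative, and it assumes each reward has already been aggregated over its transition, so it is not the authors' SDQN/SDDQN/SBCQ code.

```python
# Semi-MDP Bellman target: y = r + gamma**tau * max_a' Q(s', a'), where tau
# is the (variable) duration of the transition rather than a fixed step.
import numpy as np

def semi_mdp_targets(rewards, taus, next_q_values, dones, gamma=0.99):
    """rewards, taus, dones: shape (batch,); next_q_values: (batch, actions)."""
    next_v = next_q_values.max(axis=1)        # greedy value at next state
    return rewards + (gamma ** taus) * next_v * (1.0 - dones)

# toy batch: two transitions lasting 3 and 10 time units
rewards = np.array([1.0, 2.0])
taus = np.array([3.0, 10.0])
next_q = np.array([[0.5, 1.5], [2.0, 0.2]])
dones = np.array([0.0, 0.0])
print(semi_mdp_targets(rewards, taus, next_q, dones))
# the second target discounts far more heavily because its step took longer
```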
arXiv Detail & Related papers (2022-03-17T14:51:21Z)
- A Validation Tool for Designing Reinforcement Learning Environments [0.0]
This study proposes a Markov-based feature analysis method to validate whether an MDP is well formulated.
We believe an MDP suitable for applying RL should contain a set of state features that are both sensitive to actions and predictive of rewards.
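Those two criteria can be approximated with simple statistics over logged transitions, as in the hypothetical sketch below; the variance- and correlation-based scores are crude stand-ins for the paper's Markov-based feature analysis.

```python
# For each state feature: (i) action sensitivity, how much the feature's
# per-step change depends on the chosen action, and (ii) reward
# predictiveness, its linear correlation with the reward. Both statistics
# are illustrative stand-ins, not the paper's exact method.
import numpy as np

def feature_diagnostics(states, actions, rewards, next_states):
    deltas = next_states - states                        # per-step change
    for j in range(states.shape[1]):
        per_action_shift = [deltas[actions == a, j].mean()
                            for a in np.unique(actions)]
        sensitivity = float(np.var(per_action_shift))
        reward_corr = float(np.corrcoef(states[:, j], rewards)[0, 1])
        print(f"feature {j}: action-sensitivity={sensitivity:.3f} "
              f"reward-corr={reward_corr:+.3f}")

# toy data: feature 0 is moved by the action and drives the reward;
# feature 1 is pure noise and should score near zero on both checks
rng = np.random.default_rng(1)
states = rng.normal(size=(500, 2))
actions = rng.integers(0, 2, size=500)
next_states = states + np.column_stack([actions - 0.5,
                                        rng.normal(size=500) * 0.1])
rewards = states[:, 0] + rng.normal(scale=0.1, size=500)
feature_diagnostics(states, actions, rewards, next_states)
```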
arXiv Detail & Related papers (2021-12-10T13:28:08Z)
- Proximal Reinforcement Learning: Efficient Off-Policy Evaluation in Partially Observed Markov Decision Processes [65.91730154730905]
In applications of offline reinforcement learning to observational data, such as in healthcare or education, a general concern is that observed actions might be affected by unobserved factors.
Here we tackle this by considering off-policy evaluation in a partially observed Markov decision process (POMDP).
We extend the framework of proximal causal inference to our POMDP setting, providing a variety of settings where identification is made possible.
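The proximal identification machinery does not fit in a short snippet, but for orientation the sketch below shows plain per-trajectory importance-sampling off-policy evaluation, the fully observed baseline whose assumptions unobserved confounding breaks and which the proximal POMDP framework is built to go beyond. All names here are illustrative.

```python
# Ordinary per-trajectory importance-sampling OPE. This baseline assumes
# the behavior policy's action probabilities are known and unconfounded,
# which is precisely what fails in the POMDP setting the paper targets.
import numpy as np

def is_ope(trajectories, pi_eval, pi_behavior, gamma=0.99):
    """trajectories: list of [(state, action, reward), ...] episodes;
    pi_eval / pi_behavior: callables returning P(action | state)."""
    estimates = []
    for episode in trajectories:
        weight, ret = 1.0, 0.0
        for t, (s, a, r) in enumerate(episode):
            weight *= pi_eval(a, s) / pi_behavior(a, s)  # likelihood ratio
            ret += (gamma ** t) * r
        estimates.append(weight * ret)
    return float(np.mean(estimates))

# sanity check: identical policies reduce to a Monte Carlo return estimate
episodes = [[(0, 0, 1.0), (1, 1, 0.5)]]
print(is_ope(episodes, lambda a, s: 0.5, lambda a, s: 0.5))  # 1 + 0.99*0.5
```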
arXiv Detail & Related papers (2021-10-28T17:46:14Z)
- Policy Information Capacity: Information-Theoretic Measure for Task Complexity in Deep Reinforcement Learning [83.66080019570461]
We propose two environment-agnostic, algorithm-agnostic quantitative metrics for task difficulty.
We show that these metrics have higher correlations with normalized task solvability scores than a variety of alternatives.
These metrics can also be used for fast and compute-efficient optimizations of key design parameters.
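One metric in this spirit is the mutual information between randomly sampled policy parameters and the resulting episodic returns. The sketch below estimates such a quantity with a crude histogram estimator and a toy stand-in for environment rollouts; treat both the estimator and the rollout model as assumptions for illustration, not the paper's metric code.

```python
# Estimate I(Theta; R) = H(R) - H(R | Theta) between random policy
# parameters Theta and (discretized) episodic returns R. `rollout_return`
# is a toy stand-in for running one episode in a real environment.
import numpy as np

rng = np.random.default_rng(0)

def rollout_return(theta: float) -> float:
    """Stand-in for one environment rollout under parameters theta."""
    return float(np.tanh(3.0 * theta) + rng.normal(scale=0.3))

def entropy(counts: np.ndarray) -> float:
    p = counts / counts.sum()
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

thetas = rng.uniform(-1.0, 1.0, size=200)              # sampled policies
returns = np.array([[rollout_return(t) for _ in range(20)] for t in thetas])
edges = np.histogram_bin_edges(returns, bins=10)
binned = np.digitize(returns, edges)                   # discretize returns

h_marginal = entropy(np.bincount(binned.ravel()))      # H(R)
h_conditional = np.mean([entropy(np.bincount(row)) for row in binned])
print("estimated I(Theta; R) in nats:", h_marginal - h_conditional)
```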
arXiv Detail & Related papers (2021-03-23T17:49:50Z)
- DeepAveragers: Offline Reinforcement Learning by Solving Derived Non-Parametric MDPs [47.73837217824527]
We study an approach to offline reinforcement learning (RL) based on optimally solving finitely-represented MDPs derived from a static dataset of experience.
Our main contribution is to introduce the Deep Averagers with Costs MDP (DAC-MDP) and to investigate its solutions for offline RL.
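The flavor of the approach can be shown in a heavily simplified sketch: dataset states become the states of a derived finite MDP, logged transitions define a (here deterministic) model, unsupported state-action pairs receive a pessimism cost, and the result is solved by value iteration. This is a hypothetical rendering of the idea, not the DAC-MDP algorithm itself.

```python
# Derive a tabular MDP from a static dataset and solve it offline:
# observed transitions define the model, and any state-action pair never
# seen in the data is charged a large cost so the policy avoids it.
import numpy as np

def solve_derived_mdp(dataset, n_states, n_actions,
                      gamma=0.99, unseen_cost=10.0, iters=200):
    """dataset: iterable of (s, a, r, s_next) with integer states/actions."""
    reward = np.full((n_states, n_actions), -unseen_cost)  # pessimism
    next_state = np.zeros((n_states, n_actions), dtype=int)
    for s, a, r, s_next in dataset:                        # observed model
        reward[s, a] = r
        next_state[s, a] = s_next
    q = np.zeros((n_states, n_actions))
    for _ in range(iters):                                 # value iteration
        v = q.max(axis=1)
        q = reward + gamma * v[next_state]
    return q.argmax(axis=1)                                # greedy policy

data = [(0, 1, 1.0, 1), (1, 0, 0.0, 0), (1, 1, 5.0, 1)]
print(solve_derived_mdp(data, n_states=2, n_actions=2))    # -> [1 1]
```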
arXiv Detail & Related papers (2020-10-18T00:11:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.