HOPE: Human-Centric Off-Policy Evaluation for E-Learning and Healthcare
- URL: http://arxiv.org/abs/2302.09212v1
- Date: Sat, 18 Feb 2023 02:33:30 GMT
- Title: HOPE: Human-Centric Off-Policy Evaluation for E-Learning and Healthcare
- Authors: Ge Gao, Song Ju, Markel Sanz Ausin, Min Chi
- Abstract summary: Off-policy evaluation is crucial for inducing effective policies in human-centric environments.
We propose a human-centric OPE to handle partial observability and aggregated rewards.
Our approach reliably predicts the returns of different policies and outperforms state-of-the-art benchmarks.
- Score: 15.57203496240758
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement learning (RL) has been extensively researched for enhancing
human-environment interactions in various human-centric tasks, including
e-learning and healthcare. Since deploying and evaluating policies online is
high-stakes in such tasks, off-policy evaluation (OPE) is crucial for inducing
effective policies. In human-centric environments, however, OPE is challenging
because the underlying state is often unobservable and only aggregated rewards
can be observed (e.g., students' final test scores, or whether a patient is
eventually discharged from the hospital). In this work, we propose a
human-centric OPE (HOPE) to handle partial observability and aggregated
rewards in such environments. Specifically, we reconstruct immediate rewards
from the aggregated rewards, accounting for partial observability, to estimate
expected total returns. We provide a theoretical bound for the proposed method, and we have
conducted extensive experiments in real-world human-centric tasks, including
sepsis treatments and an intelligent tutoring system. Our approach reliably
predicts the returns of different policies and outperforms state-of-the-art
benchmarks using both standard validation methods and human-centric
significance tests.
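To make the reward-reconstruction idea concrete, here is a minimal sketch. It assumes per-step state-action features and a linear reward model, fit by least squares so that the reconstructed per-step rewards sum to each trajectory's aggregated return. This illustrates the general idea only; it is not the authors' estimator, which additionally accounts for partial observability.

```python
import numpy as np

def reconstruct_rewards(trajectories, agg_returns):
    """Fit per-step rewards r_t = phi_t @ w so that their sum matches each
    trajectory's single aggregated return (ordinary least squares).

    trajectories: list of (T_i, d) arrays of state-action features phi_t
    agg_returns:  one aggregated reward per trajectory (e.g., a test score)
    """
    # Each trajectory contributes one equation: (sum_t phi_t) @ w = R_i
    X = np.stack([traj.sum(axis=0) for traj in trajectories])
    y = np.asarray(agg_returns, dtype=float)
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    # Reconstructed immediate rewards, one per step of every trajectory
    return [traj @ w for traj in trajectories]

# Toy check: 3-step trajectories, 4-dim features, known linear reward
rng = np.random.default_rng(0)
trajs = [rng.normal(size=(3, 4)) for _ in range(50)]
true_w = np.array([1.0, -0.5, 0.2, 0.0])
agg = [float((t @ true_w).sum()) for t in trajs]
per_step = reconstruct_rewards(trajs, agg)  # ~= [t @ true_w for t in trajs]
```

The reconstructed per-step rewards can then be fed to any standard return estimator, which is what makes delayed, aggregated feedback tractable for OPE.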
Related papers
- MEReQ: Max-Ent Residual-Q Inverse RL for Sample-Efficient Alignment from Intervention [81.56607128684723]
We introduce MEReQ (Maximum-Entropy Residual-Q Inverse Reinforcement Learning), designed for sample-efficient alignment from human intervention.
MEReQ infers a residual reward function that captures the discrepancy between the human expert's and the prior policy's underlying reward functions.
It then employs Residual Q-Learning (RQL) to align the policy with human preferences using this residual reward function.
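As a rough illustration of the residual idea (an assumption-laden sketch, not MEReQ's actual algorithm), one can hold the prior policy's Q-function fixed and train only a residual Q-function from the inferred residual reward, with the aligned policy acting greedily on their sum. The tabular update and names below are illustrative:

```python
import numpy as np

def residual_q_update(q_prior, q_res, s, a, r_res, s_next,
                      alpha=0.1, gamma=0.99):
    """One tabular residual-Q step (illustrative only).

    q_prior: fixed (S, A) Q-table of the prior policy
    q_res:   learned (S, A) residual Q-table, updated in place
    r_res:   residual reward inferred from human interventions
    """
    q_total = q_prior + q_res                    # the policy acts on the sum
    td_target = r_res + gamma * q_total[s_next].max()
    td_error = td_target - q_total[s, a]
    q_res[s, a] += alpha * td_error              # only the residual is trained
    return td_error
```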
arXiv Detail & Related papers (2024-06-24T01:51:09Z)
- ConSiDERS-The-Human Evaluation Framework: Rethinking Human Evaluation for Generative Large Language Models [53.00812898384698]
We argue that human evaluation of generative large language models (LLMs) should be a multidisciplinary undertaking.
We highlight how cognitive biases can lead raters to conflate fluency with truthfulness, and how cognitive uncertainty affects the reliability of rating scores such as Likert ratings.
We propose the ConSiDERS-The-Human evaluation framework consisting of 6 pillars -- Consistency, Scoring Criteria, Differentiating, User Experience, Responsible, and Scalability.
arXiv Detail & Related papers (2024-05-28T22:45:28Z)
- A Toolbox for Surfacing Health Equity Harms and Biases in Large Language Models [20.11590976578911]
Large language models (LLMs) hold promise to serve complex health information needs but also have the potential to introduce harm and exacerbate health disparities.
Reliably evaluating equity-related model failures is a critical step toward developing systems that promote health equity.
We present resources and methodologies for surfacing biases with potential to precipitate equity-related harms in long-form, LLM-generated answers to medical questions.
arXiv Detail & Related papers (2024-03-18T17:56:37Z)
- Sample Complexity of Preference-Based Nonparametric Off-Policy Evaluation with Deep Networks [58.469818546042696]
We study the sample efficiency of OPE with human preference and establish a statistical guarantee for it.
By appropriately selecting the size of a ReLU network, we show that one can leverage any low-dimensional manifold structure in the Markov decision process.
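A concrete instance of this setup is fitted-Q evaluation (FQE) with a ReLU network; the sketch below, built on scikit-learn's MLPRegressor, is my own illustration and not the paper's estimator or analysis.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def fitted_q_evaluation(data, pi, gamma=0.99, iters=30, width=64):
    """Estimate Q^pi with a ReLU network by iterating the regression
    Q_{k+1}(s, a) ~= r + gamma * Q_k(s', pi(s')).

    data: list of (s, a, r, s_next, done); s is a feature array, a a scalar
    pi:   target policy as a function s -> a
    """
    X = np.stack([np.append(s, a) for s, a, *_ in data])
    r = np.array([step[2] for step in data], dtype=float)
    done = np.array([step[4] for step in data], dtype=float)
    X_next = np.stack([np.append(sn, pi(sn)) for _, _, _, sn, _ in data])

    q = MLPRegressor(hidden_layer_sizes=(width, width), activation="relu",
                     max_iter=500)
    target = r                                   # first iterate: Q_1 = r
    for _ in range(iters):
        q.fit(X, target)                         # Bellman regression step
        target = r + gamma * (1.0 - done) * q.predict(X_next)
    return q                                     # query via q.predict
```

Choosing `width` (the size of the ReLU network) is exactly the knob the paper's sample-complexity analysis concerns.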
arXiv Detail & Related papers (2023-10-16T16:27:06Z)
- Off-Policy Evaluation for Human Feedback [46.82894469763776]
Off-policy evaluation (OPE) is important for closing the gap between offline training and evaluation of reinforcement learning (RL).
Existing OPE methods fall short in estimating human feedback (HF) signals.
We introduce an OPE for HF (OPEHF) framework that revives existing OPE methods in order to accurately evaluate the HF signals.
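As a hedged illustration of how an existing OPE estimator can consume a reconstructed HF reward (this is not the OPEHF framework itself), the sketch below applies per-decision importance sampling to logged trajectories:

```python
import numpy as np

def per_decision_is(trajectories, pi_e, pi_b, gamma=0.99):
    """Per-decision importance sampling estimate of a target policy's return.

    trajectories: list of [(s, a, r), ...]; r carries the (reconstructed)
                  human-feedback signal
    pi_e, pi_b:   functions (s, a) -> action probability under the target
                  (evaluation) and behavior policies, respectively
    """
    estimates = []
    for traj in trajectories:
        rho, value = 1.0, 0.0
        for t, (s, a, r) in enumerate(traj):
            rho *= pi_e(s, a) / pi_b(s, a)       # cumulative importance ratio
            value += (gamma ** t) * rho * r
        estimates.append(value)
    return float(np.mean(estimates))
```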
arXiv Detail & Related papers (2023-10-11T01:52:42Z) - Revisiting the Gold Standard: Grounding Summarization Evaluation with
Robust Human Evaluation [136.16507050034755]
Existing human evaluation studies for summarization either exhibit low inter-annotator agreement or are of insufficient scale.
We propose a modified summarization salience protocol, Atomic Content Units (ACUs), which is based on fine-grained semantic units.
We curate the Robust Summarization Evaluation (RoSE) benchmark, a large human evaluation dataset consisting of 22,000 summary-level annotations over 28 top-performing systems.
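A minimal scoring helper conveys the flavor of ACU-based evaluation; the normalization below (matched reference units over all reference units) is my reading of the protocol's description, not RoSE's official scorer.

```python
def acu_score(matched_units, reference_units):
    """Fraction of the reference's atomic content units that annotators
    judged to be present in the system summary."""
    if not reference_units:
        return 0.0
    return len(set(matched_units) & set(reference_units)) / len(reference_units)

# e.g., 2 of 4 reference ACUs were found in the summary -> 0.5
print(acu_score({"u1", "u3"}, {"u1", "u2", "u3", "u4"}))
```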
arXiv Detail & Related papers (2022-12-15T17:26:05Z) - Towards Automatic Evaluation of Dialog Systems: A Model-Free Off-Policy
Evaluation Approach [84.02388020258141]
We propose a new framework named ENIGMA for estimating human evaluation scores based on off-policy evaluation in reinforcement learning.
ENIGMA only requires a handful of pre-collected experience data, and therefore does not involve human interaction with the target policy during the evaluation.
Our experiments show that ENIGMA significantly outperforms existing methods in terms of correlation with human evaluation scores.
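ENIGMA's key property, evaluating a dialog policy without any human interacting with it, can be illustrated with a generic direct-method estimator: learn a score model from logged feedback and average its predictions over the target policy's responses. This sketch is my illustration under that assumption, not ENIGMA's actual RL-based estimator.

```python
import numpy as np
from sklearn.linear_model import Ridge

def direct_method_score(logged, target_response_feats):
    """Estimate the mean human score a target dialog policy would receive,
    using only pre-collected (features, human_score) pairs.

    logged:                list of (feature_vector, human_score)
    target_response_feats: feature vectors of the target policy's responses
    """
    X = np.stack([f for f, _ in logged])
    y = np.array([s for _, s in logged], dtype=float)
    model = Ridge().fit(X, y)            # score model learned from old logs
    # No human talks to the target policy at evaluation time
    return float(model.predict(np.stack(target_response_feats)).mean())
```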
arXiv Detail & Related papers (2020-03-13T20:31:47Z)
- Optimizing Medical Treatment for Sepsis in Intensive Care: from Reinforcement Learning to Pre-Trial Evaluation [2.908482270923597]
Our aim is to establish a framework in which reinforcement learning (RL) retrospectively optimizes interventions, providing a regulatory-compliant pathway to prospective clinical testing of the learned policies.
We focus on infections in intensive care units, which are among the major causes of death and are difficult to treat because of complex and opaque patient dynamics.
arXiv Detail & Related papers (2020-02-10T00:26:43Z)
- Interpretable Off-Policy Evaluation in Reinforcement Learning by Highlighting Influential Transitions [48.91284724066349]
Off-policy evaluation in reinforcement learning offers the chance of using observational data to improve future outcomes in domains such as healthcare and education.
Traditional measures such as confidence intervals may be insufficient due to noise, limited data and confounding.
We develop a method that could serve as a hybrid human-AI system, enabling human experts to analyze the validity of policy evaluation estimates.
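One simple way to surface influential transitions for expert review is a leave-one-out analysis: recompute the OPE estimate with each transition held out and rank transitions by how far the estimate moves. This sketch captures that general idea under my own assumptions; it is not the paper's influence method.

```python
import numpy as np

def rank_influential(transitions, ope_estimator, top_k=5):
    """Rank transitions by leave-one-out influence on an OPE estimate.

    transitions:   list of logged transitions consumed by the estimator
    ope_estimator: function mapping a transition list to a scalar estimate
    """
    base = ope_estimator(transitions)
    influence = []
    for i in range(len(transitions)):
        loo = transitions[:i] + transitions[i + 1:]   # drop transition i
        influence.append(abs(ope_estimator(loo) - base))
    # The most influential transitions are candidates for human inspection
    order = np.argsort(influence)[::-1][:top_k]
    return [(int(i), float(influence[i])) for i in order]
```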
arXiv Detail & Related papers (2020-02-10T00:26:43Z)