Improving Human Sequential Decision-Making with Reinforcement Learning
- URL: http://arxiv.org/abs/2108.08454v5
- Date: Tue, 19 Mar 2024 08:12:28 GMT
- Title: Improving Human Sequential Decision-Making with Reinforcement Learning
- Authors: Hamsa Bastani, Osbert Bastani, Wichinpong Park Sinchaisri
- Abstract summary: We design a novel machine learning algorithm that is capable of extracting "best practices" from trace data.
Our algorithm selects the tip that best bridges the gap between the actions taken by human workers and those taken by the optimal policy.
Experiments show that the tips generated by our algorithm can significantly improve human performance.
- Score: 29.334511328067777
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Workers spend a significant amount of time learning how to make good decisions. Evaluating the efficacy of a given decision, however, can be complicated -- e.g., decision outcomes are often long-term and relate to the original decision in complex ways. Surprisingly, even though learning good decision-making strategies is difficult, they can often be expressed in simple and concise forms. Focusing on sequential decision-making, we design a novel machine learning algorithm that is capable of extracting "best practices" from trace data and conveying its insights to humans in the form of interpretable "tips". Our algorithm selects the tip that best bridges the gap between the actions taken by human workers and those taken by the optimal policy in a way that accounts for which actions are consequential for achieving higher performance. We evaluate our approach through a series of randomized controlled experiments where participants manage a virtual kitchen. Our experiments show that the tips generated by our algorithm can significantly improve human performance relative to intuitive baselines. In addition, we discuss a number of empirical insights that can help inform the design of algorithms intended for human-AI interfaces. For instance, we find evidence that participants do not simply blindly follow our tips; instead, they combine them with their own experience to discover additional strategies for improving performance.
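The tip-selection idea in the abstract can be illustrated with a small sketch. This is not the authors' algorithm: it assumes a tabular setting where value estimates for the optimal policy are already available, and it scores each candidate tip by the total value gained if workers had followed the tip in the states where it applies. The function name `select_tip`, the tip format, and all numbers below are hypothetical.

```python
def select_tip(human_trace, q_values, candidate_tips):
    """Pick the candidate tip with the largest estimated total gain.

    human_trace: list of (state, action) pairs observed from human workers.
    q_values: dict mapping (state, action) to an estimated long-term value
        under the optimal policy (assumed to be given).
    candidate_tips: list of (name, applies, action) tuples, where
        applies(state) says whether the tip is relevant in that state.
    """
    best_name, best_gain = None, float("-inf")
    for name, applies, tip_action in candidate_tips:
        gain = 0.0
        for state, human_action in human_trace:
            if applies(state):
                # Value gap between the tip's action and what the human did.
                # Weighting by value focuses on consequential deviations.
                gain += q_values[(state, tip_action)] - q_values[(state, human_action)]
        if gain > best_gain:
            best_name, best_gain = name, gain
    return best_name, best_gain


# Toy kitchen example with made-up values.
q_values = {
    ("cook", "chef"): 10.0,
    ("cook", "server"): 4.0,
    ("plate", "chef"): 5.0,
    ("plate", "server"): 6.0,
}
human_trace = [
    ("cook", "server"),
    ("cook", "server"),
    ("plate", "chef"),
    ("cook", "chef"),
]
candidate_tips = [
    ("chef should cook", lambda s: s == "cook", "chef"),
    ("server should plate", lambda s: s == "plate", "server"),
]

best_name, best_gain = select_tip(human_trace, q_values, candidate_tips)
print(best_name, best_gain)
```

In this toy trace, the cooking tip wins because workers deviate from the high-value action more often and at a larger cost, even though both tips match the optimal policy.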
Related papers
- Designing Algorithmic Recommendations to Achieve Human-AI Complementarity [2.4247752614854203]
We formalize the design of recommendation algorithms that assist human decision-makers.
We use a potential-outcomes framework to model the effect of recommendations on a human decision-maker's binary treatment choice.
We derive minimax optimal recommendation algorithms that can be implemented with machine learning.
arXiv Detail & Related papers (2024-05-02T17:15:30Z)
- RLIF: Interactive Imitation Learning as Reinforcement Learning [56.997263135104504]
We show how off-policy reinforcement learning can enable improved performance under assumptions that are similar but potentially even more practical than those of interactive imitation learning.
Our proposed method uses reinforcement learning with user intervention signals themselves as rewards.
This relaxes the assumption that intervening experts in interactive imitation learning should be near-optimal and enables the algorithm to learn behaviors that improve over the potentially suboptimal human expert.
arXiv Detail & Related papers (2023-11-21T21:05:21Z)
- Optimising Human-AI Collaboration by Learning Convincing Explanations [62.81395661556852]
We propose a method for a collaborative system that remains safe by having a human make decisions.
Ardent enables efficient and effective decision-making by adapting to individual preferences for explanations.
arXiv Detail & Related papers (2023-11-13T16:00:16Z)
- Inverse Online Learning: Understanding Non-Stationary and Reactionary Policies [79.60322329952453]
We show how to develop interpretable representations of how agents make decisions.
By understanding the decision-making processes underlying a set of observed trajectories, we cast the policy inference problem as the inverse to this online learning problem.
We introduce a practical algorithm for retrospectively estimating such perceived effects, alongside the process through which agents update them.
Through application to the analysis of UNOS organ donation acceptance decisions, we demonstrate that our approach can bring valuable insights into the factors that govern decision processes and how they change over time.
arXiv Detail & Related papers (2022-03-14T17:40:42Z)
- Leveraging Expert Consistency to Improve Algorithmic Decision Support [62.61153549123407]
We explore the use of historical expert decisions as a rich source of information that can be combined with observed outcomes to narrow the construct gap.
We propose an influence function-based methodology to estimate expert consistency indirectly when each case in the data is assessed by a single expert.
Our empirical evaluation, using simulations in a clinical setting and real-world data from the child welfare domain, indicates that the proposed approach successfully narrows the construct gap.
arXiv Detail & Related papers (2021-01-24T05:40:29Z)
- Consistent Estimators for Learning to Defer to an Expert [5.076419064097734]
We show how to learn predictors that can either predict or choose to defer the decision to a downstream expert.
We show the effectiveness of our approach on a variety of experimental tasks.
arXiv Detail & Related papers (2020-06-02T18:21:38Z)
- Automatic Discovery of Interpretable Planning Strategies [9.410583483182657]
We introduce AI-Interpret, a method for transforming idiosyncratic policies into simple and interpretable descriptions.
We show that providing the decision rules generated by AI-Interpret as flowcharts significantly improved people's planning strategies and decisions.
arXiv Detail & Related papers (2020-05-24T12:24:52Z)
- Learning with Differentiable Perturbed Optimizers [54.351317101356614]
We propose a systematic method to transform optimizers into operations that are differentiable and never locally constant.
Our approach relies on stochastically perturbed optimizers, and can be used readily together with existing solvers.
We show how this framework can be connected to a family of losses developed in structured prediction, and give theoretical guarantees for their use in learning tasks.
arXiv Detail & Related papers (2020-02-20T11:11:32Z)
- Reward-Conditioned Policies [100.64167842905069]
Imitation learning requires near-optimal expert data.
Can we learn effective policies via supervised learning without demonstrations?
We show how such an approach can be derived as a principled method for policy search.
arXiv Detail & Related papers (2019-12-31T18:07:43Z)