Reward Design for Justifiable Sequential Decision-Making
- URL: http://arxiv.org/abs/2402.15826v1
- Date: Sat, 24 Feb 2024 14:29:30 GMT
- Title: Reward Design for Justifiable Sequential Decision-Making
- Authors: Aleksa Sukovic, Goran Radanovic
- Abstract summary: We propose the use of a debate-based reward model for reinforcement learning agents.
We show that augmenting the reward with the feedback signal generated by the debate-based reward model yields policies highly favored by the judge.
- Score: 12.284934135116515
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Equipping agents with the capacity to justify their decisions using
supporting evidence is a cornerstone of accountable decision-making. Furthermore,
ensuring that justifications are in line with human expectations and societal
norms is vital, especially in high-stakes situations such as healthcare. In
this work, we propose the use of a debate-based reward model for reinforcement
learning agents, where the outcome of a zero-sum debate game quantifies the
justifiability of a decision in a particular state. This reward model is then
used to train a justifiable policy, whose decisions can be more easily
corroborated with supporting evidence. In the debate game, two argumentative
agents take turns providing supporting evidence for two competing decisions.
Given the proposed evidence, a proxy of a human judge evaluates which decision
is better justified. We demonstrate the potential of our approach in learning
policies for prescribing and justifying treatment decisions of septic patients.
We show that augmenting the reward with the feedback signal generated by the
debate-based reward model yields policies highly favored by the judge when
compared to the policy obtained solely from the environment rewards, while
hardly sacrificing any performance. Moreover, in terms of the overall
performance and justifiability of trained policies, the debate-based feedback
is comparable to the feedback obtained from an ideal judge proxy that evaluates
decisions using the full information encoded in the state. This suggests that
the debate game outputs key information contained in states that is most
relevant for evaluating decisions, which in turn substantiates the practicality
of combining our approach with human-in-the-loop evaluations. Lastly, we
showcase that agents trained via multi-agent debate learn to propose evidence
that is resilient to refutations and closely aligns with human preferences.
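To make the mechanics concrete, below is a minimal Python sketch of the debate
game and the reward augmentation it feeds. The interfaces (Debater, run_debate,
the judge callable) and the additive mixing weight lam are illustrative
assumptions, not the authors' implementation:

```python
# Minimal, illustrative sketch of a debate-based reward model.
# All names and the additive reward-mixing scheme are assumptions.
from dataclasses import dataclass
from typing import Callable, List

State = List[float]   # feature vector describing the environment state
Action = int          # a candidate decision (e.g., a treatment choice)
Evidence = int        # index of a state feature revealed as an argument

@dataclass
class Debater:
    """Argues for a fixed decision by revealing state features as evidence."""
    decision: Action
    pick_evidence: Callable[[State, List[Evidence]], Evidence]

def run_debate(state: State, d1: Debater, d2: Debater,
               judge: Callable[[List[Evidence], List[float], Action, Action], float],
               n_rounds: int = 3) -> float:
    """Zero-sum debate game: the two agents take turns revealing evidence
    for their competing decisions; a judge proxy, seeing only the revealed
    evidence rather than the full state, scores how well d1's decision is
    justified. Returns a justifiability score in [0, 1]."""
    revealed: List[Evidence] = []
    for _ in range(n_rounds):
        revealed.append(d1.pick_evidence(state, revealed))
        revealed.append(d2.pick_evidence(state, revealed))
    values = [state[e] for e in revealed]
    return judge(revealed, values, d1.decision, d2.decision)

def augmented_reward(env_reward: float, justifiability: float,
                     lam: float = 0.5) -> float:
    """Mix the environment reward with the debate-based feedback signal;
    lam is a hypothetical trade-off hyperparameter."""
    return (1.0 - lam) * env_reward + lam * justifiability
```

A policy would then be trained against augmented_reward in place of the raw
environment reward, pulling decisions toward those the judge can corroborate
with supporting evidence.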
Related papers
- Training Language Models to Win Debates with Self-Play Improves Judge Accuracy [8.13173791334223]
We test the robustness of debate as a method of scalable oversight by training models to debate with data generated via self-play.
We find that language-model-based evaluators answer questions more accurately when judging models optimized to win debates.
arXiv Detail & Related papers (2024-09-25T05:28:33Z)
- Aligning Large Language Models by On-Policy Self-Judgment [49.31895979525054]
Existing approaches for aligning large language models with human preferences face a trade-off that requires a separate reward model (RM) for on-policy learning.
We present a novel alignment framework, SELF-JUDGE, that performs on-policy learning and is parameter-efficient.
We show that rejection sampling by itself can further improve performance without an additional evaluator, as sketched below.
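A rough illustration of that rejection-sampling (best-of-N) step, with the
model judging its own candidates; generate() and judge_score() are
hypothetical stand-ins for the model's sampling and scoring calls, not the
SELF-JUDGE code:

```python
# Best-of-N rejection sampling with a self-judge (hypothetical interfaces).
from typing import Callable, List

def best_of_n(prompt: str,
              generate: Callable[[str], str],
              judge_score: Callable[[str, str], float],
              n: int = 8) -> str:
    """Sample n candidate responses and keep the one the self-judge prefers."""
    candidates: List[str] = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda resp: judge_score(prompt, resp))
```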
arXiv Detail & Related papers (2024-02-17T11:25:26Z)
- Decision Theoretic Foundations for Experiments Evaluating Human Decisions [18.27590643693167]
We argue that to attribute loss in human performance to forms of bias, an experiment must provide participants with the information that a rational agent would need to identify the utility-maximizing decision.
As a demonstration, we evaluate the extent to which recent evaluations of decision-making from the literature on AI-assisted decisions achieve these criteria.
arXiv Detail & Related papers (2024-01-25T16:21:37Z)
- Online Decision Mediation [72.80902932543474]
Consider learning a decision support assistant to serve as an intermediary between (oracle) expert behavior and (imperfect) human behavior.
In clinical diagnosis, fully-autonomous machine behavior is often beyond ethical affordances.
arXiv Detail & Related papers (2023-10-28T05:59:43Z)
- Provable Benefits of Policy Learning from Human Preferences in Contextual Bandit Problems [82.92678837778358]
Preference-based methods have demonstrated substantial success in empirical applications such as InstructGPT.
We show how human bias and uncertainty in feedback modeling can affect the theoretical guarantees of these approaches.
arXiv Detail & Related papers (2023-07-24T17:50:24Z)
- Causal Fairness for Outcome Control [68.12191782657437]
We study a specific decision-making task called outcome control in which an automated system aims to optimize an outcome variable $Y$ while being fair and equitable.
In this paper, we first analyze through causal lenses the notion of benefit, which captures how much a specific individual would benefit from a positive decision.
We then note that the benefit itself may be influenced by the protected attribute, and propose causal tools which can be used to analyze this.
arXiv Detail & Related papers (2023-06-08T09:31:18Z)
- Inverse Online Learning: Understanding Non-Stationary and Reactionary Policies [79.60322329952453]
We show how to develop interpretable representations of how agents make decisions.
By understanding the decision-making processes underlying a set of observed trajectories, we cast the policy inference problem as the inverse of an online learning problem.
We introduce a practical algorithm for retrospectively estimating such perceived effects, alongside the process through which agents update them.
Through application to the analysis of UNOS organ donation acceptance decisions, we demonstrate that our approach can bring valuable insights into the factors that govern decision processes and how they change over time.
arXiv Detail & Related papers (2022-03-14T17:40:42Z)
- Causal policy ranking [3.7819322027528113]
Given a trained policy, we propose a black-box method based on counterfactual reasoning that estimates the causal effect that these decisions have on reward attainment.
In this work, we compare our measure against an alternative, non-causal ranking procedure, and discuss potential future work integrating causal algorithms into the interpretation of RL agent policies.
arXiv Detail & Related papers (2021-11-16T12:33:36Z)
- A Large Scale Randomized Controlled Trial on Herding in Peer-Review Discussions [33.261698377782075]
We aim to understand whether reviewers and more senior decision makers are disproportionately influenced by the first argument presented in a discussion.
Specifically, we design and execute a randomized controlled trial with the goal of testing for the conditional causal effect of the discussion initiator's opinion on the outcome of a paper.
arXiv Detail & Related papers (2020-11-30T18:23:07Z)
- Inverse Active Sensing: Modeling and Understanding Timely Decision-Making [111.07204912245841]
We develop a framework for the general setting of evidence-based decision-making under endogenous, context-dependent time pressure.
We demonstrate how it enables modeling intuitive notions of surprise, suspense, and optimality in decision strategies.
arXiv Detail & Related papers (2020-06-25T02:30:45Z)
- Explaining reputation assessments [6.87724532311602]
We propose an approach to explain the rationale behind assessments from quantitative reputation models.
Our approach adapts, extends and combines existing approaches for explaining decisions made using multi-attribute decision models.
arXiv Detail & Related papers (2020-06-15T23:19:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.