First do no harm: counterfactual objective functions for safe & ethical AI
- URL: http://arxiv.org/abs/2204.12993v1
- Date: Wed, 27 Apr 2022 15:03:43 GMT
- Title: First do no harm: counterfactual objective functions for safe & ethical AI
- Authors: Jonathan G. Richens, Rory Beard, Daniel H. Thompson
- Abstract summary: We develop the first statistical definition of harm and a framework for factoring harm into algorithmic decisions.
Our results show that counterfactual reasoning is a key ingredient for safe and ethical AI.
- Score: 0.03683202928838612
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: To act safely and ethically in the real world, agents must be able to reason
about harm and avoid harmful actions. In this paper we develop the first
statistical definition of harm and a framework for factoring harm into
algorithmic decisions. We argue that harm is fundamentally a counterfactual
quantity, and show that standard machine learning algorithms are guaranteed to
pursue harmful policies in certain environments. To resolve this, we derive a
family of counterfactual objective functions that robustly mitigate for harm.
We demonstrate our approach with a statistical model for identifying optimal
drug doses. While identifying optimal doses using the causal treatment effect
results in harmful treatment decisions, our counterfactual algorithm identifies
doses that are far less harmful without sacrificing efficacy. Our results show
that counterfactual reasoning is a key ingredient for safe and ethical AI.
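As a rough illustration of the idea in the abstract (not the paper's actual model or objective), the sketch below contrasts a dose chosen by maximising the average treatment effect with one chosen by an objective that also penalises a crude proxy for counterfactual harm: the chance that a patient who would have recovered untreated is nonetheless made worse off by the chosen dose. The dose-response curve, side-effect model, harm-aversion weight `lam`, and independence assumptions are all illustrative.

```python
import numpy as np

# Toy illustration only: the dose-response model, side-effect model, and the
# form of the harm penalty are assumptions for this sketch, not the paper's
# actual counterfactual objective functions.
rng = np.random.default_rng(0)

def p_recover(dose, u):
    # Recovery is most likely near a patient-specific optimal dose u.
    return np.exp(-((dose - u) ** 2) / 0.5)

def p_side_effect(dose):
    # Higher doses carry a higher chance of an adverse side effect.
    return np.clip(0.1 * dose, 0.0, 1.0)

doses = np.linspace(0.0, 3.0, 61)
u = rng.uniform(0.5, 2.0, size=10_000)  # simulated patient population

# 1) Standard approach: maximise the average treatment effect
#    (expected recovery gain over no treatment), ignoring who is harmed.
ate = np.array([np.mean(p_recover(d, u) - p_recover(0.0, u)) for d in doses])
ate_dose = doses[np.argmax(ate)]

# 2) Harm-penalised objective (illustrative): subtract the probability that a
#    patient who would have recovered untreated nonetheless suffers a side
#    effect at dose d (independence between the two events is assumed here).
lam = 2.0  # harm-aversion weight (assumed)
def harm_proxy(d):
    would_recover_untreated = p_recover(0.0, u)
    return np.mean(would_recover_untreated * p_side_effect(d))

obj = np.array([np.mean(p_recover(d, u)) - lam * harm_proxy(d) for d in doses])
harm_averse_dose = doses[np.argmax(obj)]

print(f"treatment-effect dose: {ate_dose:.2f}")
print(f"harm-penalised dose:   {harm_averse_dose:.2f}")
```

In this toy setup, increasing `lam` can only move the selected dose downward, mirroring the qualitative behaviour the abstract describes: trading a little expected benefit for a lower chance of making a patient worse off than they would have been without treatment.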
Related papers
- Criticality and Safety Margins for Reinforcement Learning [53.10194953873209]
We seek to define a criticality framework with both a quantifiable ground truth and a clear significance to users.
We introduce true criticality as the expected drop in reward when an agent deviates from its policy for n consecutive random actions.
We also introduce the concept of proxy criticality, a low-overhead metric that has a statistically monotonic relationship to true criticality.
arXiv Detail & Related papers (2024-09-26T21:00:45Z)
- Speculations on Uncertainty and Humane Algorithms [0.0]
Provenance enables algorithms to know what they know, preventing possible harms.
It is essential to compute with what we know rather than make assumptions that may be unjustified or untenable.
arXiv Detail & Related papers (2024-08-13T08:54:34Z)
- Can a Bayesian Oracle Prevent Harm from an Agent? [48.12936383352277]
We consider estimating a context-dependent bound on the probability of violating a given safety specification.
Noting that different plausible hypotheses about the world could produce very different outcomes, we derive bounds on the safety violation probability predicted under the true but unknown hypothesis.
We consider two forms of this result, in the iid case and in the non-iid case, and conclude with open problems towards turning such results into practical AI guardrails.
arXiv Detail & Related papers (2024-08-09T18:10:42Z)
- Inception: Efficiently Computable Misinformation Attacks on Markov Games [14.491458698581038]
We study security threats to Markov games due to information asymmetry and misinformation.
We derive the victim's policy under worst-case rationality and present polynomial-time algorithms to compute the attacker's optimal worst-case policy.
Our work exposes a security vulnerability from standard game assumptions under misinformation.
arXiv Detail & Related papers (2024-06-24T20:01:43Z)
- Control Risk for Potential Misuse of Artificial Intelligence in Science [85.91232985405554]
We aim to raise awareness of the dangers of AI misuse in science.
We highlight real-world examples of misuse in chemical science.
We propose a system called SciGuard to control misuse risks for AI models in science.
arXiv Detail & Related papers (2023-12-11T18:50:57Z)
- Protecting Society from AI Misuse: When are Restrictions on Capabilities Warranted? [0.0]
We argue that targeted interventions on certain capabilities will be warranted to prevent some misuses of AI.
These restrictions may include controlling who can access certain types of AI models, what they can be used for, and whether outputs are filtered or can be traced back to their users.
We apply this reasoning to three examples: predicting novel toxins, creating harmful images, and automating spear phishing campaigns.
arXiv Detail & Related papers (2023-03-16T15:05:59Z)
- A Quantitative Account of Harm [18.7822411439221]
We first present a quantitative definition of harm in a deterministic context involving a single individual.
We then consider the issues involved in dealing with uncertainty regarding the context.
We show that the "obvious" way of doing this can lead to counterintuitive or inappropriate answers.
arXiv Detail & Related papers (2022-09-29T21:48:38Z)
- The Hammer and the Nut: Is Bilevel Optimization Really Needed to Poison Linear Classifiers? [27.701693158702753]
Data poisoning is a particularly worrisome subset of poisoning attacks.
We propose a counter-intuitive but efficient framework to combat data poisoning.
Our framework achieves comparable, or even better, performances in terms of the attacker's objective.
arXiv Detail & Related papers (2021-03-23T09:08:10Z)
- Offline Contextual Bandits with Overparameterized Models [52.788628474552276]
We ask whether the benign generalization observed for overparameterized models in supervised learning also occurs for offline contextual bandits.
We show that this discrepancy is due to the action-stability of their objectives.
In experiments with large neural networks, this gap between action-stable value-based objectives and unstable policy-based objectives leads to significant performance differences.
arXiv Detail & Related papers (2020-06-27T13:52:07Z)
- A Deep Q-learning/genetic Algorithms Based Novel Methodology For Optimizing Covid-19 Pandemic Government Actions [63.669642197519934]
We use the SEIR epidemiological model to represent the evolution of the virus COVID-19 over time in the population.
The sequences of actions (confinement, self-isolation, two-meter distancing, or no restrictions) are evaluated according to a reward system.
We prove that our methodology is a valid tool to discover actions governments can take to reduce the negative effects of a pandemic in both health and economic terms.
arXiv Detail & Related papers (2020-05-15T17:17:45Z)
- A Case for Humans-in-the-Loop: Decisions in the Presence of Erroneous Algorithmic Scores [85.12096045419686]
We study the adoption of an algorithmic tool used to assist child maltreatment hotline screening decisions.
We first show that humans do alter their behavior when the tool is deployed.
We show that humans are less likely to adhere to the machine's recommendation when the score displayed is an incorrect estimate of risk.
arXiv Detail & Related papers (2020-02-19T07:27:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.