Avoiding Negative Side Effects due to Incomplete Knowledge of AI Systems
- URL: http://arxiv.org/abs/2008.12146v3
- Date: Mon, 18 Oct 2021 18:40:37 GMT
- Title: Avoiding Negative Side Effects due to Incomplete Knowledge of AI Systems
- Authors: Sandhya Saisubramanian and Shlomo Zilberstein and Ece Kamar
- Abstract summary: Learning to recognize and avoid negative side effects of an agent's actions is critical to improving the safety and reliability of autonomous systems.
Mitigating negative side effects is an emerging research topic that is attracting increased attention due to the rapid growth in the deployment of AI systems.
This article provides a comprehensive overview of different forms of negative side effects and the recent research efforts to address them.
- Score: 35.763408055286355
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Autonomous agents acting in the real world often operate based on models that
ignore certain aspects of the environment. The incompleteness of any given
model -- handcrafted or machine acquired -- is inevitable due to practical
limitations of any modeling technique for complex real-world settings. Due to
the limited fidelity of its model, an agent's actions may have unexpected,
undesirable consequences during execution. Learning to recognize and avoid such
negative side effects of an agent's actions is critical to improving the safety
and reliability of autonomous systems. Mitigating negative side effects is an
emerging research topic that is attracting increased attention due to the rapid
growth in the deployment of AI systems and their broad societal impacts. This
article provides a comprehensive overview of different forms of negative side
effects and the recent research efforts to address them. We identify key
characteristics of negative side effects, highlight the challenges in avoiding
negative side effects, and discuss recently developed approaches, contrasting
their benefits and limitations. The article concludes with a discussion of open
questions and suggestions for future research directions.
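To make the failure mode concrete, here is a minimal sketch (illustrative only, not an example from the paper) of an agent whose planning model omits a feature of the true environment, so its model-optimal plan causes a side effect the model cannot represent. The corridor setup and all names are invented for illustration.

```python
# Minimal sketch: an agent plans with a model that omits one feature of the
# environment ("rug" cells), so its optimal plan incurs a side effect the
# model cannot see. All names and the setup are illustrative assumptions.

# 1-D corridor: agent starts at 0, goal at 4; cell 2 is covered by a rug.
GOAL = 4
RUG_CELLS = {2}          # present in the true world, absent from the model

def plan_shortest_path(start, goal):
    """Planner that only knows positions, not rugs (incomplete model)."""
    step = 1 if goal > start else -1
    return list(range(start + step, goal + step, step))

def execute(path):
    """True environment: crossing a rug cell damages the rug."""
    return [cell for cell in path if cell in RUG_CELLS]

path = plan_shortest_path(0, GOAL)
print("plan:", path)                         # [1, 2, 3, 4]
print("unmodeled side effects:", execute(path))  # [2] -- rug damaged
```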
Related papers
- Generative Intervention Models for Causal Perturbation Modeling [80.72074987374141]
In many applications, it is a priori unknown which mechanisms of a system are modified by an external perturbation.
We propose a generative intervention model (GIM) that learns to map perturbation features to distributions over atomic interventions.
arXiv Detail & Related papers (2024-11-21T10:37:57Z)
- Synthetic Image Generation in Cyber Influence Operations: An Emergent Threat? [0.0]
This report explores the potential and limitations of generative deep learning models, such as diffusion models, in fabricating convincing synthetic images.
We critically assess the accessibility, practicality, and output quality of these tools and their implications in threat scenarios of deception, influence, and subversion.
We generate content for several hypothetical cyber influence operations to demonstrate the current capabilities and limitations of these AI-driven methods for threat actors.
arXiv Detail & Related papers (2024-03-18T19:44:30Z)
- The Social Impact of Generative AI: An Analysis on ChatGPT [0.7401425472034117]
The rapid development of Generative AI models has sparked heated discussions regarding their benefits, limitations, and associated risks.
Generative models hold immense promise across domains such as healthcare, finance, and education.
This paper adopts a methodology to delve into the societal implications of Generative AI tools, focusing primarily on the case of ChatGPT.
arXiv Detail & Related papers (2024-03-07T17:14:22Z)
- Predictability and Surprise in Large Generative Models [8.055204456718576]
Large-scale pre-training has emerged as a technique for creating capable, general-purpose generative models.
In this paper, we highlight a counterintuitive property of such models and discuss the policy implications of this property.
arXiv Detail & Related papers (2022-02-15T23:21:23Z)
- Counterfactual Explanations as Interventions in Latent Space [62.997667081978825]
Counterfactual explanations aim to provide end users with a set of features that need to be changed in order to achieve a desired outcome.
Current approaches rarely take into account the feasibility of actions needed to achieve the proposed explanations.
We present Counterfactual Explanations as Interventions in Latent Space (CEILS), a methodology to generate counterfactual explanations.
arXiv Detail & Related papers (2021-06-14T20:48:48Z)
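As a rough illustration of the general counterfactual-explanation idea (not the CEILS algorithm itself, which intervenes in a causally structured latent space), the sketch below brute-forces the smallest feature change that flips a toy classifier's decision. The classifier, feature names, and search grid are invented for illustration.

```python
# Illustrative counterfactual search: find a small feature change that
# flips a model's decision. A stand-in for the broader technique only.
import itertools

def predict(x):                      # stand-in for any trained classifier
    income, debt = x
    return 1 if income - debt > 20 else 0   # 1 = "loan approved"

def counterfactual(x, deltas=range(-30, 31, 5)):
    """Search for the cheapest perturbation that changes the prediction."""
    base = predict(x)
    best = None
    for d_income, d_debt in itertools.product(deltas, repeat=2):
        cand = (x[0] + d_income, x[1] + d_debt)
        if predict(cand) != base:
            cost = abs(d_income) + abs(d_debt)   # proxy for actionability
            if best is None or cost < best[0]:
                best = (cost, cand)
    return best

print(counterfactual((30, 25)))  # cheapest change that flips the decision
```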
- Challenges for Using Impact Regularizers to Avoid Negative Side Effects [74.67972013102462]
We discuss the main current challenges of impact regularizers and relate them to fundamental design decisions.
We explore promising directions to overcome the unsolved challenges in preventing negative side effects with impact regularizers.
arXiv Detail & Related papers (2021-01-29T10:32:51Z)
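A hedged sketch of the basic impact-regularizer recipe this line of work studies: augment the task reward with a penalty on deviation from a baseline (for example, inaction) state, R'(s, a) = R(s, a) - lambda * d(s, s_baseline). The deviation measure, state encoding, and lambda below are illustrative assumptions, not the paper's formulation.

```python
# Illustrative impact-regularized reward: penalize deviation from a
# baseline state. All states, names, and lambda are assumptions.

def task_reward(state, action):
    return 1.0 if action == "push_box" else 0.0

def deviation(state, baseline_state):
    """Toy impact measure: count features that differ from the baseline."""
    return sum(1 for a, b in zip(state, baseline_state) if a != b)

def regularized_reward(state, action, baseline_state, lam=0.5):
    # R'(s, a) = R(s, a) - lambda * d(s, s_baseline)
    return task_reward(state, action) - lam * deviation(state, baseline_state)

state = ("box_moved", "vase_broken")
baseline = ("box_start", "vase_intact")      # what inaction would preserve
print(regularized_reward(state, "push_box", baseline))  # 1.0 - 0.5*2 = 0.0
```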
- Overcoming Failures of Imagination in AI Infused System Development and Deployment [71.9309995623067]
NeurIPS 2020 requested that research paper submissions include impact statements on "potential nefarious uses and the consequences of failure".
We argue that frameworks of harms must be context-aware and consider a wider range of potential stakeholders, system affordances, and viable proxies for assessing harms in the widest sense.
arXiv Detail & Related papers (2020-11-26T18:09:52Z)
- Adversarial vs behavioural-based defensive AI with joint, continual and active learning: automated evaluation of robustness to deception, poisoning and concept drift [62.997667081978825]
Recent advancements in Artificial Intelligence (AI) have brought new capabilities to user and entity behaviour analytics (UEBA) for cyber-security.
In this paper, we present a solution to mitigate such attacks by improving the detection process and efficiently leveraging human expertise.
arXiv Detail & Related papers (2020-01-13T13:54:36Z)