Modeling Transformative AI Risks (MTAIR) Project -- Summary Report
- URL: http://arxiv.org/abs/2206.09360v1
- Date: Sun, 19 Jun 2022 09:11:23 GMT
- Title: Modeling Transformative AI Risks (MTAIR) Project -- Summary Report
- Authors: Sam Clarke, Ben Cottier, Aryeh Englander, Daniel Eth, David Manheim,
Samuel Dylan Martin, Issa Rice
- Abstract summary: This report builds on an earlier diagram by Cottier and Shah which laid out some of the crucial disagreements ("cruxes") visually, with some explanation.
The model starts with a discussion of reasoning via analogies and general prior beliefs about artificial intelligence.
It lays out a model of different paths and enabling technologies for high-level machine intelligence, and a model of how advances in the capabilities of these systems might proceed.
The model also looks specifically at the question of learned optimization, and whether machine learning systems will create mesa-optimizers.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: This report outlines work by the Modeling Transformative AI Risk (MTAIR)
project, an attempt to map out the key hypotheses, uncertainties, and
disagreements in debates about catastrophic risks from advanced AI, and the
relationships between them. This builds on an earlier diagram by Ben Cottier
and Rohin Shah which laid out some of the crucial disagreements ("cruxes")
visually, with some explanation. Based on an extensive literature review and
engagement with experts, the report explains a model of the issues involved,
and the initial software-based implementation that can incorporate probability
estimates or other quantitative factors to enable exploration, planning, and/or
decision support. By gathering information from various debates and discussions
into a single more coherent presentation, we hope to enable better discussions
and debates about the issues involved.
The model starts with a discussion of reasoning via analogies and general
prior beliefs about artificial intelligence. Following this, it lays out a
model of different paths and enabling technologies for high-level machine
intelligence, and a model of how advances in the capabilities of these systems
might proceed, including debates about self-improvement, discontinuous
improvements, and the possibility of distributed, non-agentic high-level
intelligence or slower improvements. The model also looks specifically at the
question of learned optimization, and whether machine learning systems will
create mesa-optimizers. The impact of different safety research on the previous
sets of questions is then examined, to understand whether and how research
could be useful in enabling safer systems. Finally, we discuss a model of
different failure modes and loss of control or takeover scenarios.
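The abstract describes the software-based implementation only at a high level. As a minimal sketch of the kind of quantitative structure such a tool could support, the toy below chains conditional probability estimates along a single path of cruxes; every crux name and number here is invented for illustration (echoing topics from the abstract) and is not taken from the MTAIR model itself:

```python
# Illustrative only: chain conditional probability estimates for a few
# hypothetical cruxes into an overall estimate for one path through a
# debate. All names and numbers below are invented, not MTAIR's.
cruxes = [
    # (crux, P(crux holds | all previous cruxes hold))
    ("High-level machine intelligence is developed", 0.7),
    ("Capabilities advance discontinuously", 0.4),
    ("Learned systems develop mesa-optimizers", 0.5),
    ("Misalignment leads to loss of control", 0.6),
]

def chained_probability(cruxes):
    """Multiply conditional estimates along one path through the model."""
    p = 1.0
    for name, p_cond in cruxes:
        p *= p_cond
        print(f"{name}: conditional={p_cond:.2f}, cumulative={p:.3f}")
    return p

print(f"Overall path probability: {chained_probability(cruxes):.3f}")
```

A real implementation would represent the hypotheses as a graph rather than a single chain, so that alternative paths, dependencies between cruxes, and the sensitivity of conclusions to individual estimates can be explored.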
Related papers
- Unified Explanations in Machine Learning Models: A Perturbation Approach [0.0]
Inconsistencies between XAI and modeling techniques can have the undesirable effect of casting doubt upon the efficacy of these explainability approaches.
We propose a systematic, perturbation-based analysis against a popular, model-agnostic method in XAI, SHapley Additive exPlanations (SHAP).
We devise algorithms to generate relative feature importance in settings of dynamic inference among a suite of popular machine learning and deep learning methods, along with metrics that quantify how well explanations generated under the static case hold.
arXiv Detail & Related papers (2024-05-30T16:04:35Z)
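A perturbation-based stability check on SHAP explanations can look roughly like the sketch below. The shap, scikit-learn, and scipy calls are real, but the model, data, noise scales, and rank-correlation metric are our own illustrative choices, not the paper's actual protocol:

```python
# Sketch of a perturbation-based stability check for SHAP explanations.
# Model, data, and noise scales are illustrative, not the paper's setup.
import numpy as np
import shap
from scipy.stats import spearmanr
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=500, n_features=8, random_state=0)
model = RandomForestRegressor(random_state=0).fit(X, y)
explainer = shap.TreeExplainer(model)

# Global feature importance on the clean inputs.
base_rank = np.abs(explainer.shap_values(X)).mean(axis=0)

# Re-explain under small input perturbations and measure rank agreement.
rng = np.random.default_rng(0)
for scale in (0.01, 0.1, 0.5):
    X_pert = X + rng.normal(0.0, scale * X.std(axis=0), size=X.shape)
    pert_rank = np.abs(explainer.shap_values(X_pert)).mean(axis=0)
    rho, _ = spearmanr(base_rank, pert_rank)
    print(f"noise scale {scale}: Spearman rank correlation = {rho:.3f}")
```

High rank correlation under small perturbations would indicate that the static explanations hold up; a sharp drop would cast doubt on them, which is the kind of inconsistency the paper targets.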
- The Transformation Risk-Benefit Model of Artificial Intelligence: Balancing Risks and Benefits Through Practical Solutions and Use Cases [0.0]
The authors propose a new framework called "The Transformation Risk-Benefit Model of Artificial Intelligence".
Using the model's characteristics, the article emphasizes practical and innovative solutions where benefits outweigh risks.
arXiv Detail & Related papers (2024-04-11T19:19:57Z)
- ACQUIRED: A Dataset for Answering Counterfactual Questions In Real-Life Videos [53.92440577914417]
ACQUIRED consists of 3.9K annotated videos, encompassing a wide range of event types and incorporating both first and third-person viewpoints.
Each video is annotated with questions that span three distinct dimensions of reasoning, including physical, social, and temporal.
We benchmark several state-of-the-art language-only and multimodal models on our dataset, and experimental results demonstrate a significant performance gap.
arXiv Detail & Related papers (2023-11-02T22:17:03Z)
- Predictable Artificial Intelligence [77.1127726638209]
This paper introduces the ideas and challenges of Predictable AI.
It explores the ways in which we can anticipate key validity indicators of present and future AI ecosystems.
We argue that achieving predictability is crucial for fostering trust, liability, control, alignment and safety of AI ecosystems.
arXiv Detail & Related papers (2023-10-09T21:36:21Z)
- Multi-Agent Verification and Control with Probabilistic Model Checking [4.56877715768796]
Probabilistic model checking is a technique for formal automated reasoning about software or hardware systems.
It builds upon ideas and techniques from a diverse range of fields, from logic, automata and graph theory, to optimisation, numerical methods and control.
In recent years, probabilistic model checking has also been extended to integrate ideas from game theory.
arXiv Detail & Related papers (2023-08-05T09:31:32Z)
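A core computation in probabilistic model checking is evaluating quantitative properties such as the maximum probability of reaching a target state in a Markov decision process. The toy value iteration below is a hand-coded illustration of that step, not an example drawn from the paper:

```python
# Toy probabilistic model checking: compute the maximum probability of
# reaching a target state in a small hand-coded MDP via value iteration.
# All states, actions, and transition probabilities are invented.
# mdp[state][action] = list of (next_state, probability) pairs.
mdp = {
    0: {"safe": [(0, 0.5), (1, 0.5)], "risky": [(2, 0.6), (3, 0.4)]},
    1: {"go": [(2, 0.9), (3, 0.1)]},
    2: {},  # target (absorbing)
    3: {},  # failure (absorbing)
}
TARGET = 2

p = {s: 1.0 if s == TARGET else 0.0 for s in mdp}
for _ in range(1000):
    p_new = {}
    for s, actions in mdp.items():
        if not actions:  # absorbing state: probability is fixed
            p_new[s] = p[s]
        else:  # Bellman backup: best action's expected reach probability
            p_new[s] = max(
                sum(prob * p[t] for t, prob in dist)
                for dist in actions.values()
            )
    if max(abs(p_new[s] - p[s]) for s in mdp) < 1e-12:
        p = p_new
        break
    p = p_new

for s in sorted(mdp):
    print(f"max P(reach state {TARGET} from {s}) = {p[s]:.4f}")
# From state 0 the 'safe' action is optimal: 0.9 versus 0.6 for 'risky'.
```

Tools like PRISM or Storm perform this kind of computation at scale, with temporal-logic specifications and, as the paper discusses, game-theoretic extensions for multiple agents.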
- Designing explainable artificial intelligence with active inference: A framework for transparent introspection and decision-making [0.0]
We discuss how active inference can be leveraged to design explainable AI systems.
We propose an architecture for explainable AI systems using active inference.
arXiv Detail & Related papers (2023-06-06T21:38:09Z)
- Arguments about Highly Reliable Agent Designs as a Useful Path to Artificial Intelligence Safety [0.0]
Highly Reliable Agent Designs (HRAD) is one of the most controversial and ambitious approaches.
We have titled the arguments (1) incidental utility, (2) deconfusion, (3) precise specification, and (4) prediction.
We have explained the assumptions and claims based on a review of published and informal literature, along with consultation with experts who have stated positions on the topic.
arXiv Detail & Related papers (2022-01-09T07:42:37Z)
- KAT: A Knowledge Augmented Transformer for Vision-and-Language [56.716531169609915]
We propose a novel model - Knowledge Augmented Transformer (KAT) - which achieves a strong state-of-the-art result on the open-domain multimodal task of OK-VQA.
Our approach integrates implicit and explicit knowledge in an end-to-end encoder-decoder architecture, while still jointly reasoning over both knowledge sources during answer generation.
An additional benefit of explicit knowledge integration is seen in improved interpretability of model predictions in our analysis.
arXiv Detail & Related papers (2021-12-16T04:37:10Z)
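The fusion idea described above, decoding while jointly attending over implicit and explicit knowledge, can be sketched schematically as follows. The module choices, dimensions, and names are illustrative PyTorch stand-ins, not KAT's actual architecture:

```python
# Schematic of decoding over implicit and explicit knowledge jointly, in
# the spirit of KAT. Dimensions and modules are illustrative stand-ins.
import torch
import torch.nn as nn

d_model = 256

class KnowledgeFusedDecoder(nn.Module):
    def __init__(self):
        super().__init__()
        layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=2)
        self.vocab_head = nn.Linear(d_model, 1000)  # toy vocabulary size

    def forward(self, tgt, implicit_mem, explicit_mem):
        # Concatenate both knowledge sources along the sequence axis so
        # cross-attention reasons over them jointly at every decode step.
        memory = torch.cat([implicit_mem, explicit_mem], dim=1)
        return self.vocab_head(self.decoder(tgt, memory))

# Toy inputs: implicit knowledge (e.g. image-derived embeddings) and
# explicit knowledge (e.g. embeddings of retrieved KB entries).
implicit_mem = torch.randn(2, 16, d_model)
explicit_mem = torch.randn(2, 8, d_model)
tgt = torch.randn(2, 5, d_model)  # embedded partial answer tokens
logits = KnowledgeFusedDecoder()(tgt, implicit_mem, explicit_mem)
print(logits.shape)  # torch.Size([2, 5, 1000])
```

Because the decoder attends over both memories at once, attention weights over the explicit entries can be inspected, which is one way the interpretability benefit noted in the abstract could arise.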
- CausalCity: Complex Simulations with Agency for Causal Discovery and Reasoning [68.74447489372037]
We present a high-fidelity simulation environment that is designed for developing algorithms for causal discovery and counterfactual reasoning.
A core component of our work is to introduce "agency", such that it is simple to define and create complex scenarios.
We perform experiments with three state-of-the-art methods to create baselines and highlight the affordances of this environment.
arXiv Detail & Related papers (2021-06-25T00:21:41Z)
- Counterfactual Explanations as Interventions in Latent Space [62.997667081978825]
Counterfactual explanations aim to provide end users with a set of features that need to be changed in order to achieve a desired outcome.
Current approaches rarely take into account the feasibility of actions needed to achieve the proposed explanations.
We present Counterfactual Explanations as Interventions in Latent Space (CEILS), a methodology to generate counterfactual explanations.
arXiv Detail & Related papers (2021-06-14T20:48:48Z)
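The general mechanics of generating a counterfactual by intervening in a latent space can be sketched as a gradient search. This generic toy omits CEILS's key ingredient, the causal structure among features, and both models are untrained stand-ins:

```python
# Toy latent-space counterfactual: move the latent code by gradient
# descent until a (stand-in) classifier's prediction flips, then decode.
# CEILS itself additionally encodes causal relations between features.
import torch

torch.manual_seed(0)
decoder = torch.nn.Linear(4, 8)     # stand-in for a trained decoder
classifier = torch.nn.Linear(8, 1)  # stand-in for a trained classifier

z = torch.randn(1, 4)               # latent code of the factual instance
z_cf = z.clone().requires_grad_(True)
opt = torch.optim.Adam([z_cf], lr=0.05)

for step in range(500):
    logit = classifier(decoder(z_cf)).squeeze()
    if logit.item() > 0:            # desired (positive) outcome reached
        break
    # Push toward the desired class while staying close to the original.
    loss = torch.nn.functional.softplus(-logit) + 0.1 * (z_cf - z).pow(2).sum()
    opt.zero_grad()
    loss.backward()
    opt.step()

print("counterfactual features:", decoder(z_cf).detach())
```

Performing the search in latent (or causally structured) space, rather than directly on raw features, is what keeps the resulting feature changes plausible and actionable, which is the feasibility concern the abstract raises.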
- Forethought and Hindsight in Credit Assignment [62.05690959741223]
We work to understand the gains and peculiarities of planning employed as forethought via forward models or as hindsight operating with backward models.
We investigate the best use of models in planning, primarily focusing on the selection of states in which predictions should be (re)-evaluated.
arXiv Detail & Related papers (2020-10-26T16:00:47Z)
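The forward-versus-backward distinction is easy to see in tabular form: a forward model simulates where a state leads, while a backward model proposes predecessors to refresh once new value information arrives. A toy contrast on a deterministic chain (our own example, not the paper's setup):

```python
# Toy contrast between forethought (forward model: simulate successors)
# and hindsight (backward model: refresh predecessors) on a deterministic
# 5-state chain with a single reward of 1 at the final state.
N, GAMMA = 5, 0.9
forward_model = {s: s + 1 for s in range(N - 1)}             # s -> s'
backward_model = {s1: s for s, s1 in forward_model.items()}  # s' -> s
reward = {s: 1.0 if s == N - 1 else 0.0 for s in range(N)}

def plan_forward(V, s):
    """Forethought: look ahead from s using the forward model."""
    if s in forward_model:
        V[s] = reward[s] + GAMMA * V[forward_model[s]]

def plan_backward(V, s1):
    """Hindsight: after learning about s1, refresh its predecessor."""
    if s1 in backward_model:
        s = backward_model[s1]
        V[s] = reward[s] + GAMMA * V[s1]

# Hindsight: one backward sweep from the rewarding state suffices.
V_back = {s: reward[s] for s in range(N)}
for s1 in range(N - 1, 0, -1):
    plan_backward(V_back, s1)

# Forethought: naive forward sweeps need several passes before the
# reward "news" propagates back to state 0.
V_fwd = {s: reward[s] for s in range(N)}
sweeps = 0
while abs(V_fwd[0] - V_back[0]) > 1e-9:
    for s in range(N - 1):
        plan_forward(V_fwd, s)
    sweeps += 1

print(V_back)                            # {0: 0.6561, ..., 4: 1.0}
print("forward sweeps needed:", sweeps)  # 4
```

This illustrates why the choice of which states to (re)-evaluate matters: hindsight naturally concentrates updates where value estimates have just changed.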