Position Paper: An Inner Interpretability Framework for AI Inspired by Lessons from Cognitive Neuroscience
- URL: http://arxiv.org/abs/2406.01352v1
- Date: Mon, 3 Jun 2024 14:16:56 GMT
- Title: Position Paper: An Inner Interpretability Framework for AI Inspired by Lessons from Cognitive Neuroscience
- Authors: Martina G. Vilas, Federico Adolfi, David Poeppel, Gemma Roig
- Abstract summary: Inner Interpretability is a promising field tasked with uncovering the inner mechanisms of AI systems.
Recent critiques raise issues that question its usefulness to advance the broader goals of AI.
Here we draw the relevant connections and highlight lessons that can be transferred productively between fields.
- Score: 4.524832437237367
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Inner Interpretability is a promising emerging field tasked with uncovering the inner mechanisms of AI systems, though how to develop these mechanistic theories is still much debated. Moreover, recent critiques raise issues that question its usefulness to advance the broader goals of AI. However, it has been overlooked that these issues resemble those that have been grappled with in another field: Cognitive Neuroscience. Here we draw the relevant connections and highlight lessons that can be transferred productively between fields. Based on these, we propose a general conceptual framework and give concrete methodological strategies for building mechanistic explanations in AI inner interpretability research. With this conceptual framework, Inner Interpretability can fend off critiques and position itself on a productive path to explain AI systems.
Related papers
- Metacognitive AI: Framework and the Case for a Neurosymbolic Approach [5.5441283041944]
We introduce a framework for understanding metacognitive artificial intelligence (AI) that we call TRAP: transparency, reasoning, adaptation, and perception.
We discuss each of these aspects in turn and explore how neurosymbolic AI (NSAI) can be leveraged to address challenges of metacognition.
arXiv Detail & Related papers (2024-06-17T23:30:46Z)
- Mechanistic Interpretability for AI Safety -- A Review [28.427951836334188]
This review explores mechanistic interpretability.
Mechanistic interpretability could help prevent catastrophic outcomes as AI systems become more powerful and inscrutable.
arXiv Detail & Related papers (2024-04-22T11:01:51Z)
- Position Paper: Agent AI Towards a Holistic Intelligence [53.35971598180146]
We emphasize developing Agent AI -- an embodied system that integrates large foundation models into agent actions.
In this paper, we propose a novel large action model to achieve embodied intelligent behavior, the Agent Foundation Model.
arXiv Detail & Related papers (2024-02-28T16:09:56Z)
- Opening the Black-Box: A Systematic Review on Explainable AI in Remote Sensing [52.110707276938]
Black-box machine learning approaches have become a dominant modeling paradigm for knowledge extraction in Remote Sensing.
We perform a systematic review to identify the key trends of how explainable AI is used in Remote Sensing.
We shed light on novel explainable AI approaches and emerging directions that tackle specific Remote Sensing challenges.
arXiv Detail & Related papers (2024-02-21T13:19:58Z)
- Emergent Explainability: Adding a causal chain to neural network inference [0.0]
This position paper presents a theoretical framework for enhancing explainable artificial intelligence (xAI) through emergent communication (EmCom).
We explore the novel integration of EmCom into AI systems, offering a paradigm shift from conventional associative relationships between inputs and outputs to a more nuanced, causal interpretation.
The paper discusses the theoretical underpinnings of this approach, its potential broad applications, and its alignment with the growing need for responsible and transparent AI systems.
arXiv Detail & Related papers (2024-01-29T02:28:39Z)
- Building Bridges: Generative Artworks to Explore AI Ethics [56.058588908294446]
In recent years, there has been an increased emphasis on understanding and mitigating adverse impacts of artificial intelligence (AI) technologies on society.
A significant challenge in the design of ethical AI systems is that there are multiple stakeholders in the AI pipeline, each with their own set of constraints and interests.
This position paper outlines some potential ways in which generative artworks can help bridge these stakeholder perspectives by serving as accessible and powerful educational tools.
arXiv Detail & Related papers (2021-06-25T22:31:55Z)
- An interdisciplinary conceptual study of Artificial Intelligence (AI) for helping benefit-risk assessment practices: Towards a comprehensive qualification matrix of AI programs and devices (pre-print 2020) [55.41644538483948]
This paper proposes a comprehensive analysis of existing concepts of intelligence from different disciplines.
The aim is to identify shared notions or discrepancies to consider for qualifying AI systems.
arXiv Detail & Related papers (2021-05-07T12:01:31Z)
- Thinking Fast and Slow in AI [38.8581204791644]
This paper proposes a research direction to advance AI which draws inspiration from cognitive theories of human decision making.
The premise is that if we gain insights about the causes of some human capabilities that are still lacking in AI, we may obtain similar capabilities in an AI system.
arXiv Detail & Related papers (2020-10-12T20:10:05Z)
- Neuro-symbolic Architectures for Context Understanding [59.899606495602406]
We propose the use of hybrid AI methodology as a framework for combining the strengths of data-driven and knowledge-driven approaches.
Specifically, we inherit the concept of neuro-symbolism as a way of using knowledge bases to guide the learning process of deep neural networks.
arXiv Detail & Related papers (2020-03-09T15:04:07Z)
- A general framework for scientifically inspired explanations in AI [76.48625630211943]
We instantiate the concept of the structure of scientific explanation as the theoretical underpinning for a general framework in which explanations for AI systems can be implemented.
This framework aims to provide the tools to build a "mental-model" of any AI system so that the interaction with the user can provide information on demand and be closer to the nature of human-made explanations.
arXiv Detail & Related papers (2020-03-02T10:32:21Z)
This list is automatically generated from the titles and abstracts of the papers on this site.