Position Paper: An Inner Interpretability Framework for AI Inspired by Lessons from Cognitive Neuroscience
- URL: http://arxiv.org/abs/2406.01352v1
- Date: Mon, 3 Jun 2024 14:16:56 GMT
- Title: Position Paper: An Inner Interpretability Framework for AI Inspired by Lessons from Cognitive Neuroscience
- Authors: Martina G. Vilas, Federico Adolfi, David Poeppel, Gemma Roig
- Abstract summary: Inner Interpretability is a promising field tasked with uncovering the inner mechanisms of AI systems.
Recent critiques raise issues that question its usefulness to advance the broader goals of AI.
Here we draw the relevant connections and highlight lessons that can be transferred productively between fields.
- Score: 4.524832437237367
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Inner Interpretability is a promising emerging field tasked with uncovering the inner mechanisms of AI systems, though how to develop these mechanistic theories is still much debated. Moreover, recent critiques raise issues that question its usefulness to advance the broader goals of AI. However, it has been overlooked that these issues resemble those that have been grappled with in another field: Cognitive Neuroscience. Here we draw the relevant connections and highlight lessons that can be transferred productively between fields. Based on these, we propose a general conceptual framework and give concrete methodological strategies for building mechanistic explanations in AI inner interpretability research. With this conceptual framework, Inner Interpretability can fend off critiques and position itself on a productive path to explain AI systems.
Related papers
- Metacognitive AI: Framework and the Case for a Neurosymbolic Approach [5.5441283041944]
We introduce a framework for understanding metacognitive artificial intelligence (AI) that we call TRAP: transparency, reasoning, adaptation, and perception.
We discuss each of these aspects in turn and explore how neurosymbolic AI (NSAI) can be leveraged to address challenges of metacognition.
arXiv Detail & Related papers (2024-06-17T23:30:46Z)
- Mechanistic Interpretability for AI Safety -- A Review [28.427951836334188]
This review explores mechanistic interpretability.
Mechanistic interpretability could help prevent catastrophic outcomes as AI systems become more powerful and inscrutable.
arXiv Detail & Related papers (2024-04-22T11:01:51Z)
- Position Paper: Agent AI Towards a Holistic Intelligence [53.35971598180146]
We emphasize developing Agent AI -- an embodied system that integrates large foundation models into agent actions.
In this paper, we propose a novel large action model to achieve embodied intelligent behavior, the Agent Foundation Model.
arXiv Detail & Related papers (2024-02-28T16:09:56Z)
- Opening the Black-Box: A Systematic Review on Explainable AI in Remote Sensing [52.110707276938]
Black-box machine learning approaches have become a dominant modeling paradigm for knowledge extraction in Remote Sensing.
We perform a systematic review to identify the key trends of how explainable AI is used in Remote Sensing.
We shed light on novel explainable AI approaches and emerging directions that tackle specific Remote Sensing challenges.
arXiv Detail & Related papers (2024-02-21T13:19:58Z)
- Emergent Explainability: Adding a causal chain to neural network inference [0.0]
This position paper presents a theoretical framework for enhancing explainable artificial intelligence (xAI) through emergent communication (EmCom).
We explore the novel integration of EmCom into AI systems, offering a paradigm shift from conventional associative relationships between inputs and outputs to a more nuanced, causal interpretation.
The paper discusses the theoretical underpinnings of this approach, its potential broad applications, and its alignment with the growing need for responsible and transparent AI systems.
arXiv Detail & Related papers (2024-01-29T02:28:39Z)
- Building Bridges: Generative Artworks to Explore AI Ethics [56.058588908294446]
In recent years, there has been an increased emphasis on understanding and mitigating adverse impacts of artificial intelligence (AI) technologies on society.
A significant challenge in the design of ethical AI systems is that there are multiple stakeholders in the AI pipeline, each with their own set of constraints and interests.
This position paper outlines some potential ways in which generative artworks can help bridge these stakeholder perspectives by serving as accessible and powerful educational tools.
arXiv Detail & Related papers (2021-06-25T22:31:55Z)
- An interdisciplinary conceptual study of Artificial Intelligence (AI) for helping benefit-risk assessment practices: Towards a comprehensive qualification matrix of AI programs and devices (pre-print 2020) [55.41644538483948]
This paper proposes a comprehensive analysis of existing concepts of intelligence from different disciplines.
The aim is to identify shared notions or discrepancies to consider for qualifying AI systems.
arXiv Detail & Related papers (2021-05-07T12:01:31Z)
- Thinking Fast and Slow in AI [38.8581204791644]
This paper proposes a research direction to advance AI which draws inspiration from cognitive theories of human decision making.
The premise is that if we gain insights about the causes of some human capabilities that are still lacking in AI, we may obtain similar capabilities in an AI system.
arXiv Detail & Related papers (2020-10-12T20:10:05Z)
- Neuro-symbolic Architectures for Context Understanding [59.899606495602406]
We propose the use of hybrid AI methodology as a framework for combining the strengths of data-driven and knowledge-driven approaches.
Specifically, we inherit the concept of neuro-symbolism as a way of using knowledge bases to guide the learning process of deep neural networks.
arXiv Detail & Related papers (2020-03-09T15:04:07Z)
- A general framework for scientifically inspired explanations in AI [76.48625630211943]
We instantiate the concept of the structure of scientific explanation as the theoretical underpinning for a general framework in which explanations for AI systems can be implemented.
This framework aims to provide the tools to build a "mental-model" of any AI system so that the interaction with the user can provide information on demand and be closer to the nature of human-made explanations.
arXiv Detail & Related papers (2020-03-02T10:32:21Z)
This list is automatically generated from the titles and abstracts of the papers on this site.