Propositional Interpretability in Artificial Intelligence
- URL: http://arxiv.org/abs/2501.15740v1
- Date: Mon, 27 Jan 2025 03:06:06 GMT
- Title: Propositional Interpretability in Artificial Intelligence
- Authors: David J. Chalmers
- Abstract summary: I argue for the importance of propositional interpretability, which involves interpreting a system's mechanisms and behavior in terms of propositional attitudes.
A central challenge is what I call thought logging: creating systems that log all of the relevant propositional attitudes in an AI system over time.
- Abstract: Mechanistic interpretability is the program of explaining what AI systems are doing in terms of their internal mechanisms. I analyze some aspects of the program, along with setting out some concrete challenges and assessing progress to date. I argue for the importance of propositional interpretability, which involves interpreting a system's mechanisms and behavior in terms of propositional attitudes: attitudes (such as belief, desire, or subjective probability) to propositions (e.g. the proposition that it is hot outside). Propositional attitudes are the central way that we interpret and explain human beings and they are likely to be central in AI too. A central challenge is what I call thought logging: creating systems that log all of the relevant propositional attitudes in an AI system over time. I examine currently popular methods of interpretability (such as probing, sparse auto-encoders, and chain of thought methods) as well as philosophical methods of interpretation (including those grounded in psychosemantics) to assess their strengths and weaknesses as methods of propositional interpretability.
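The abstract names probing as one of the currently popular interpretability methods it assesses. As a rough illustration only (not from the paper), the sketch below trains a linear probe on synthetic hidden-state activations to estimate whether a system "believes" a target proposition, and appends a timestamped record to a log file, a toy version of what thought logging might involve. All names, data, and design choices here are hypothetical assumptions.

```python
# A minimal, hypothetical sketch of "thought logging" via linear probing.
# Nothing here comes from the paper; the activations, labels, and probe
# setup are invented purely for illustration.
import json
import time
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic hidden-state activations (e.g., from one transformer layer),
# labeled by whether the system is taken to believe a target proposition.
X = rng.normal(size=(200, 64))        # 200 examples, 64-dim activations
w_true = rng.normal(size=64)          # hidden direction encoding the "belief"
y = (X @ w_true > 0).astype(int)      # hypothetical ground-truth labels

# Train a linear probe: a simple classifier from activations to the attitude.
probe = LogisticRegression(max_iter=1000).fit(X, y)

def log_attitude(activation, proposition, log_path="thought_log.jsonl"):
    """Append one timestamped propositional-attitude record to a log file."""
    p = probe.predict_proba(activation.reshape(1, -1))[0, 1]
    record = {
        "time": time.time(),
        "proposition": proposition,
        "attitude": "belief",
        "credence": float(p),  # probe output read as a subjective probability
    }
    with open(log_path, "a") as f:
        f.write(json.dumps(record) + "\n")
    return record

# Example: log the probe's verdict for a new activation vector.
print(log_attitude(rng.normal(size=64), "it is hot outside"))
```

The probe's output is read here as a credence attached to a proposition, which is one way a mechanistic tool could feed a running log of propositional attitudes; the paper itself weighs the strengths and weaknesses of such methods rather than prescribing this recipe.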
Related papers
- Imagining and building wise machines: The centrality of AI metacognition [78.76893632793497]
We argue that these shortcomings stem from one overarching failure: AI systems lack wisdom.
While AI research has focused on task-level strategies, metacognition is underdeveloped in AI systems.
We propose that integrating metacognitive capabilities into AI systems is crucial for enhancing their robustness, explainability, cooperation, and safety.
arXiv Detail & Related papers (2024-11-04T18:10:10Z) - Position: An Inner Interpretability Framework for AI Inspired by Lessons from Cognitive Neuroscience [4.524832437237367]
Inner Interpretability is a promising field tasked with uncovering the inner mechanisms of AI systems.
Recent critiques raise issues that call into question its usefulness for advancing the broader goals of AI.
Here we draw the relevant connections and highlight lessons that can be transferred productively between fields.
arXiv Detail & Related papers (2024-06-03T14:16:56Z) - The Language Labyrinth: Constructive Critique on the Terminology Used in the AI Discourse [0.0]
This paper claims that AI debates are still characterised by a lack of critical distance to metaphors like 'training', 'learning', or 'deciding'.
As a consequence, reflections regarding responsibility or potential use-cases are greatly distorted.
It is a conceptual work at the intersection of critical computer science and philosophy of language.
arXiv Detail & Related papers (2023-07-18T14:32:21Z) - Circumventing interpretability: How to defeat mind-readers [0.0]
Misaligned artificial intelligence will have a convergent instrumental incentive to make its thoughts difficult for us to interpret.
I discuss many ways that a capable AI might circumvent scalable interpretability methods and suggest a framework for thinking about these potential future risks.
arXiv Detail & Related papers (2022-12-21T23:52:42Z) - Metaethical Perspectives on 'Benchmarking' AI Ethics [81.65697003067841]
Benchmarks are seen as the cornerstone for measuring technical progress in Artificial Intelligence (AI) research.
An increasingly prominent research area in AI is ethics, which currently has no set of benchmarks nor commonly accepted way for measuring the 'ethicality' of an AI system.
We argue that it makes more sense to talk about 'values' rather than 'ethics' when considering the possible actions of present and future AI systems.
arXiv Detail & Related papers (2022-04-11T14:36:39Z) - Cybertrust: From Explainable to Actionable and Interpretable AI (AI2) [58.981120701284816]
Actionable and Interpretable AI (AI2) will incorporate explicit quantifications and visualizations of user confidence in AI recommendations.
It will allow examination and testing of AI system predictions to establish a basis for trust in the systems' decision making.
arXiv Detail & Related papers (2022-01-26T18:53:09Z) - An interdisciplinary conceptual study of Artificial Intelligence (AI) for helping benefit-risk assessment practices: Towards a comprehensive qualification matrix of AI programs and devices (pre-print 2020) [55.41644538483948]
This paper proposes a comprehensive analysis of existing concepts coming from different disciplines tackling the notion of intelligence.
The aim is to identify shared notions or discrepancies to consider for qualifying AI systems.
arXiv Detail & Related papers (2021-05-07T12:01:31Z) - LioNets: A Neural-Specific Local Interpretation Technique Exploiting Penultimate Layer Information [6.570220157893279]
Interpretable machine learning (IML) is an urgent topic of research.
This paper focuses on a local-based, neural-specific interpretation process applied to textual and time-series data.
arXiv Detail & Related papers (2021-04-13T09:39:33Z) - Argument Schemes and Dialogue for Explainable Planning [3.2741749231824904]
We propose an argument scheme-based approach to provide explanations in the domain of AI planning.
We present novel argument schemes to create arguments that explain a plan and its key elements.
We also present a novel dialogue system using the argument schemes and critical questions for providing interactive dialectical explanations.
arXiv Detail & Related papers (2021-01-07T17:43:12Z) - Machine Common Sense [77.34726150561087]
Machine common sense remains a broad, potentially unbounded problem in artificial intelligence (AI).
This article deals with aspects of modeling commonsense reasoning, focusing on the domain of interpersonal interactions.
arXiv Detail & Related papers (2020-06-15T13:59:47Z) - A general framework for scientifically inspired explanations in AI [76.48625630211943]
We instantiate the concept of structure of scientific explanation as the theoretical underpinning for a general framework in which explanations for AI systems can be implemented.
This framework aims to provide the tools to build a "mental-model" of any AI system so that the interaction with the user can provide information on demand and be closer to the nature of human-made explanations.
arXiv Detail & Related papers (2020-03-02T10:32:21Z)