Contrastive Sparse Autoencoders for Interpreting Planning of Chess-Playing Agents
- URL: http://arxiv.org/abs/2406.04028v1
- Date: Thu, 6 Jun 2024 12:57:31 GMT
- Title: Contrastive Sparse Autoencoders for Interpreting Planning of Chess-Playing Agents
- Authors: Yoann Poupart
- Abstract summary: We propose contrastive sparse autoencoders (CSAE) for studying pairs of game trajectories.
Using CSAE, we are able to extract and interpret concepts that are meaningful to the chess agent's plans.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: AI has brought chess systems to a superhuman level, yet these systems heavily rely on black-box algorithms. This is unsustainable for ensuring transparency to the end-user, particularly when these systems are responsible for sensitive decision-making. Recent interpretability work has shown that the inner representations of Deep Neural Networks (DNNs) are fathomable and contain human-understandable concepts. Yet, these methods are seldom contextualised and are often based on a single hidden state, which makes them unable to interpret multi-step reasoning, e.g. planning. In this respect, we propose contrastive sparse autoencoders (CSAE), a novel framework for studying pairs of game trajectories. Using CSAE, we are able to extract and interpret concepts that are meaningful to the chess agent's plans. We primarily focus on a qualitative analysis of the CSAE features before proposing an automated feature taxonomy. Furthermore, to evaluate the quality of our trained CSAE, we devise sanity checks to rule out spurious correlations in our results.
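The abstract describes a sparse autoencoder trained over pairs of trajectories. The paper's exact objective is not reproduced here; the following is a minimal hypothetical sketch of what such a contrastive sparse-autoencoder loss could look like, assuming a ReLU encoder, an L1 sparsity penalty, and a contrastive term that rewards features which separate the two trajectories of a pair. All sizes, coefficients, and the form of the contrastive term are illustrative assumptions, not the paper's definitions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: model activation size and (larger) feature dictionary size.
d_model, d_feat = 16, 64
W_enc = rng.normal(0, 0.1, (d_model, d_feat))
b_enc = np.zeros(d_feat)
W_dec = rng.normal(0, 0.1, (d_feat, d_model))
b_dec = np.zeros(d_model)

def encode(x):
    # ReLU encoder: non-negative, (ideally) sparse feature activations.
    return np.maximum(x @ W_enc + b_enc, 0.0)

def decode(f):
    # Linear decoder back to the activation space.
    return f @ W_dec + b_dec

def csae_loss(x_a, x_b, l1=1e-3, contrast=1e-3):
    """Sketch of a contrastive SAE objective (hypothetical form):
    reconstruction + L1 sparsity, minus a term that grows when
    features differ across the paired trajectories."""
    f_a, f_b = encode(x_a), encode(x_b)
    recon = np.mean((decode(f_a) - x_a) ** 2) + np.mean((decode(f_b) - x_b) ** 2)
    sparsity = l1 * (np.abs(f_a).mean() + np.abs(f_b).mean())
    # Negated, so minimizing the loss encourages pair-separating features.
    contrastive = -contrast * np.abs(f_a - f_b).mean()
    return recon + sparsity + contrastive

# Stand-in activations for two game trajectories of a pair (8 positions each).
x_a = rng.normal(size=(8, d_model))
x_b = rng.normal(size=(8, d_model))
loss = csae_loss(x_a, x_b)
```

After training, individual feature directions (rows of the decoder) would be the candidate "concepts" to interpret and organize into a taxonomy, as the abstract describes.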
Related papers
- Feature CAM: Interpretable AI in Image Classification [2.4409988934338767]
There is a lack of trust in using Artificial Intelligence in critical and high-precision fields such as security, finance, health, and manufacturing industries.
We introduce a novel technique Feature CAM, which falls in the perturbation-activation combination, to create fine-grained, class-discriminative visualizations.
The resulting saliency maps proved to be 3-4 times more human-interpretable than the state of the art in ABM.
arXiv Detail & Related papers (2024-03-08T20:16:00Z) - Hierarchical Invariance for Robust and Interpretable Vision Tasks at Larger Scales [54.78115855552886]
We show how to construct over-complete invariants with a Convolutional Neural Networks (CNN)-like hierarchical architecture.
With the over-completeness, discriminative features w.r.t. the task can be adaptively formed in a Neural Architecture Search (NAS)-like manner.
For robust and interpretable vision tasks at larger scales, hierarchical invariant representation can be considered as an effective alternative to traditional CNN and invariants.
arXiv Detail & Related papers (2024-02-23T16:50:07Z) - Mathematical Algorithm Design for Deep Learning under Societal and
Judicial Constraints: The Algorithmic Transparency Requirement [65.26723285209853]
We derive a framework to analyze whether a transparent implementation in a computing model is feasible.
Based on previous results, we find that Blum-Shub-Smale Machines have the potential to establish trustworthy solvers for inverse problems.
arXiv Detail & Related papers (2024-01-18T15:32:38Z) - AS-XAI: Self-supervised Automatic Semantic Interpretation for CNN [5.42467030980398]
We propose a self-supervised automatic semantic interpretable artificial intelligence (AS-XAI) framework.
It utilizes transparent embedding semantic extraction spaces and row-centered principal component analysis (PCA) for global semantic interpretation of model decisions.
The proposed approach offers broad fine-grained practical applications, including shared semantic interpretation under out-of-distribution categories.
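The AS-XAI entry above mentions row-centered principal component analysis. The paper's precise procedure is not given here; one plausible reading, sketched below, is PCA applied after centering each sample (row) to zero mean rather than centering each feature column. The function name and the interpretation of "row-centered" are assumptions for illustration.

```python
import numpy as np

def row_centered_pca(X, k=2):
    """Hypothetical row-centered PCA: subtract each row's mean,
    then project onto the top-k right singular vectors."""
    Xc = X - X.mean(axis=1, keepdims=True)  # center rows, not columns
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T  # (n_samples, k) projection

# Toy semantic-embedding matrix: 3 samples, 4 dimensions.
X = np.arange(12.0).reshape(3, 4)
Z = row_centered_pca(X, k=2)
```

Each row of the projection would then be inspected for a shared semantic direction across samples, in the spirit of the entry's "global semantic interpretation".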
arXiv Detail & Related papers (2023-12-02T10:06:54Z) - Representation Engineering: A Top-Down Approach to AI Transparency [132.0398250233924]
We identify and characterize the emerging area of representation engineering (RepE).
RepE places population-level representations, rather than neurons or circuits, at the center of analysis.
We showcase how these methods can provide traction on a wide range of safety-relevant problems.
arXiv Detail & Related papers (2023-10-02T17:59:07Z) - ShadowNet for Data-Centric Quantum System Learning [188.683909185536]
We propose a data-centric learning paradigm combining the strength of neural-network protocols and classical shadows.
Capitalizing on the generalization power of neural networks, this paradigm can be trained offline and excel at predicting previously unseen systems.
We present the instantiation of our paradigm in quantum state tomography and direct fidelity estimation tasks and conduct numerical analysis up to 60 qubits.
arXiv Detail & Related papers (2023-08-22T09:11:53Z) - The ConceptARC Benchmark: Evaluating Understanding and Generalization in the ARC Domain [0.0]
We describe an in-depth evaluation benchmark for the Abstraction and Reasoning Corpus (ARC).
In particular, we describe ConceptARC, a new, publicly available benchmark in the ARC domain.
We report results on testing humans on this benchmark as well as three machine solvers.
arXiv Detail & Related papers (2023-05-11T21:06:39Z) - Interpretable Self-Aware Neural Networks for Robust Trajectory Prediction [50.79827516897913]
We introduce an interpretable paradigm for trajectory prediction that distributes the uncertainty among semantic concepts.
We validate our approach on real-world autonomous driving data, demonstrating superior performance over state-of-the-art baselines.
arXiv Detail & Related papers (2022-11-16T06:28:20Z) - Interpretable part-whole hierarchies and conceptual-semantic relationships in neural networks [4.153804257347222]
We present Agglomerator, a framework capable of providing a representation of part-whole hierarchies from visual cues.
We evaluate our method on common datasets, such as SmallNORB, MNIST, FashionMNIST, CIFAR-10, and CIFAR-100.
arXiv Detail & Related papers (2022-03-07T10:56:13Z) - Evaluating Explainable Artificial Intelligence Methods for Multi-label Deep Learning Classification Tasks in Remote Sensing [0.0]
We develop deep learning models with state-of-the-art performance in benchmark datasets.
Ten XAI methods were employed to understand and interpret the models' predictions.
Occlusion, Grad-CAM and Lime were the most interpretable and reliable XAI methods.
arXiv Detail & Related papers (2021-04-03T11:13:14Z) - Bias in Multimodal AI: Testbed for Fair Automatic Recruitment [73.85525896663371]
We study how current multimodal algorithms based on heterogeneous sources of information are affected by sensitive elements and inner biases in the data.
We train automatic recruitment algorithms using a set of multimodal synthetic profiles consciously scored with gender and racial biases.
Our methodology and results show how to generate fairer AI-based tools in general, and in particular fairer automated recruitment systems.
arXiv Detail & Related papers (2020-04-15T15:58:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.