Mechanistic understanding and validation of large AI models with SemanticLens
- URL: http://arxiv.org/abs/2501.05398v1
- Date: Thu, 09 Jan 2025 17:47:34 GMT
- Title: Mechanistic understanding and validation of large AI models with SemanticLens
- Authors: Maximilian Dreyer, Jim Berend, Tobias Labarta, Johanna Vielhaben, Thomas Wiegand, Sebastian Lapuschkin, Wojciech Samek
- Abstract summary: Unlike human-engineered systems such as aeroplanes, the inner workings of AI models remain largely opaque.
This paper introduces SemanticLens, a universal explanation method for neural networks that maps hidden knowledge encoded by components.
- Score: 13.712668314238082
- Abstract: Unlike human-engineered systems such as aeroplanes, where each component's role and dependencies are well understood, the inner workings of AI models remain largely opaque, hindering verifiability and undermining trust. This paper introduces SemanticLens, a universal explanation method for neural networks that maps hidden knowledge encoded by components (e.g., individual neurons) into the semantically structured, multimodal space of a foundation model such as CLIP. In this space, unique operations become possible, including (i) textual search to identify neurons encoding specific concepts, (ii) systematic analysis and comparison of model representations, (iii) automated labelling of neurons and explanation of their functional roles, and (iv) audits to validate decision-making against requirements. Fully scalable and operating without human input, SemanticLens is shown to be effective for debugging and validation, summarizing model knowledge, aligning reasoning with expectations (e.g., adherence to the ABCDE-rule in melanoma classification), and detecting components tied to spurious correlations and their associated training data. By enabling component-level understanding and validation, the proposed approach helps bridge the "trust gap" between AI models and traditional engineered systems. We provide code for SemanticLens on https://github.com/jim-berend/semanticlens and a demo on https://semanticlens.hhi-research-insights.eu.
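The textual search described in the abstract, item (i), can be illustrated with a minimal sketch. It does not use the SemanticLens code base or CLIP: random vectors stand in for foundation-model embeddings, and the function names `neuron_embeddings` and `search_neurons` are hypothetical. It also assumes that a neuron is represented by the mean embedding of its most strongly activating examples and that search ranks neurons by cosine similarity to the query embedding.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length along `axis`."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)


def neuron_embeddings(example_embeddings):
    """Assumption: one vector per neuron, the normalized mean of its examples.

    example_embeddings has shape (n_neurons, k_examples, dim).
    """
    return l2_normalize(l2_normalize(example_embeddings).mean(axis=1))


def search_neurons(text_embedding, neuron_emb, top_k=3):
    """Rank neurons by cosine similarity to a text query embedding."""
    scores = neuron_emb @ l2_normalize(text_embedding)
    order = np.argsort(-scores)[:top_k]
    return [(int(i), float(scores[i])) for i in order]


# Toy data: random unit vectors stand in for real foundation-model embeddings.
rng = np.random.default_rng(0)
n_neurons, k_examples, dim = 4, 5, 16
directions = l2_normalize(rng.normal(size=(n_neurons, dim)))
examples = directions[:, None, :] + 0.05 * rng.normal(size=(n_neurons, k_examples, dim))

neurons = neuron_embeddings(examples)
# Hypothetical query that lies close to the concept of neuron 2.
query = directions[2] + 0.05 * rng.normal(size=dim)
hits = search_neurons(query, neurons, top_k=2)
print(hits)
```

In a real setting, the example and query embeddings would come from the image and text encoders of a model such as CLIP, so that both live in the same space.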
Related papers
- Interpretable Concept-based Deep Learning Framework for Multimodal Human Behavior Modeling [5.954573238057435]
The EU General Data Protection Regulation requires high-risk AI systems to be sufficiently interpretable.
Existing explainability methods often trade off interpretability against performance.
We propose a novel and generalizable framework, namely the Attention-Guided Concept Model (AGCM).
AGCM provides learnable conceptual explanations by identifying which concepts lead to the predictions and where they are observed.
arXiv Detail & Related papers (2025-02-14T13:15:21Z)
- Discovering Chunks in Neural Embeddings for Interpretability [53.80157905839065]
We propose leveraging the principle of chunking to interpret artificial neural population activities.
We first demonstrate this concept in recurrent neural networks (RNNs) trained on artificial sequences with imposed regularities.
We identify similar recurring embedding states corresponding to concepts in the input, with perturbations to these states activating or inhibiting the associated concepts.
arXiv Detail & Related papers (2025-02-03T20:30:46Z)
- QIXAI: A Quantum-Inspired Framework for Enhancing Classical and Quantum Model Transparency and Understanding [0.0]
Deep learning models are often hindered by their lack of interpretability, rendering them "black boxes".
This paper introduces the QIXAI Framework, a novel approach for enhancing neural network interpretability through quantum-inspired techniques.
The framework applies to both quantum and classical systems, demonstrating its potential to improve interpretability and transparency across a range of models.
arXiv Detail & Related papers (2024-10-21T21:55:09Z)
- Neurosymbolic AI approach to Attribution in Large Language Models [5.3454230926797734]
Neurosymbolic AI (NesyAI) combines the strengths of neural networks with structured symbolic reasoning.
This paper explores how NesyAI frameworks can enhance existing attribution models, offering more reliable, interpretable, and adaptable systems.
arXiv Detail & Related papers (2024-09-30T02:20:36Z)
- Converging Paradigms: The Synergy of Symbolic and Connectionist AI in LLM-Empowered Autonomous Agents [55.63497537202751]
This article explores the convergence of connectionist and symbolic artificial intelligence (AI).
Traditionally, connectionist AI focuses on neural networks, while symbolic AI emphasizes symbolic representation and logic.
Recent advancements in large language models (LLMs) highlight the potential of connectionist architectures in handling human language as a form of symbols.
arXiv Detail & Related papers (2024-07-11T14:00:53Z)
- Neuro-Symbolic Artificial Intelligence (AI) for Intent based Semantic Communication [85.06664206117088]
6G networks must consider the semantics and effectiveness (at the end user) of data transmission.
NeSy AI is proposed as a pillar for learning the causal structure behind the observed data.
GFlowNet is leveraged for the first time in a wireless system to learn the probabilistic structure which generates the data.
arXiv Detail & Related papers (2022-05-22T07:11:57Z)
- Interpretable part-whole hierarchies and conceptual-semantic relationships in neural networks [4.153804257347222]
We present Agglomerator, a framework capable of providing a representation of part-whole hierarchies from visual cues.
We evaluate our method on common datasets, such as SmallNORB, MNIST, FashionMNIST, CIFAR-10, and CIFAR-100.
arXiv Detail & Related papers (2022-03-07T10:56:13Z)
- LAP: An Attention-Based Module for Concept Based Self-Interpretation and Knowledge Injection in Convolutional Neural Networks [2.8948274245812327]
We propose a new attention-based pooling layer, called Local Attention Pooling (LAP), that accomplishes self-interpretability.
LAP is easily pluggable into any convolutional neural network, even the already trained ones.
LAP offers more valid human-understandable and faithful-to-the-model interpretations than the commonly used white-box explainer methods.
arXiv Detail & Related papers (2022-01-27T21:10:20Z)
- A Minimalist Dataset for Systematic Generalization of Perception, Syntax, and Semantics [131.93113552146195]
We present a new dataset, Handwritten arithmetic with INTegers (HINT), to examine machines' capability of learning generalizable concepts.
In HINT, machines are tasked with learning how concepts are perceived from raw signals such as images.
We undertake extensive experiments with various sequence-to-sequence models, including RNNs, Transformers, and GPT-3.
arXiv Detail & Related papers (2021-03-02T01:32:54Z)
- Concept Learners for Few-Shot Learning [76.08585517480807]
We propose COMET, a meta-learning method that improves generalization ability by learning to learn along human-interpretable concept dimensions.
We evaluate our model on few-shot tasks from diverse domains, including fine-grained image classification, document categorization and cell type annotation.
arXiv Detail & Related papers (2020-07-14T22:04:17Z)
- A Trainable Optimal Transport Embedding for Feature Aggregation and its Relationship to Attention [96.77554122595578]
We introduce a parametrized representation of fixed size, which embeds and then aggregates elements from a given input set according to the optimal transport plan between the set and a trainable reference.
Our approach scales to large datasets and allows end-to-end training of the reference, while also providing a simple unsupervised learning mechanism with small computational cost.
arXiv Detail & Related papers (2020-06-22T08:35:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.