Related papers: A Comprehensive Survey on Self-Interpretable Neural Networks

A Comprehensive Survey on Self-Interpretable Neural Networks

URL: http://arxiv.org/abs/2501.15638v1
Date: Sun, 26 Jan 2025 18:50:16 GMT
Title: A Comprehensive Survey on Self-Interpretable Neural Networks
Authors: Yang Ji, Ying Sun, Yuting Zhang, Zhigaoyuan Wang, Yuanxin Zhuang, Zheng Gong, Dazhong Shen, Chuan Qin, Hengshu Zhu, Hui Xiong,
Abstract summary: Self-interpretable neural networks inherently reveal the prediction rationale through the model structures.<n>We first collect and review existing works on self-interpretable neural networks and provide a structured summary of their methodologies.<n>We also present concrete, visualized examples of model explanations and discuss their applicability across diverse scenarios.
Score: 36.0575431131253
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Neural networks have achieved remarkable success across various fields. However, the lack of interpretability limits their practical use, particularly in critical decision-making scenarios. Post-hoc interpretability, which provides explanations for pre-trained models, is often at risk of robustness and fidelity. This has inspired a rising interest in self-interpretable neural networks, which inherently reveal the prediction rationale through the model structures. Although there exist surveys on post-hoc interpretability, a comprehensive and systematic survey of self-interpretable neural networks is still missing. To address this gap, we first collect and review existing works on self-interpretable neural networks and provide a structured summary of their methodologies from five key perspectives: attribution-based, function-based, concept-based, prototype-based, and rule-based self-interpretation. We also present concrete, visualized examples of model explanations and discuss their applicability across diverse scenarios, including image, text, graph data, and deep reinforcement learning. Additionally, we summarize existing evaluation metrics for self-interpretability and identify open challenges in this field, offering insights for future research. To support ongoing developments, we present a publicly accessible resource to track advancements in this domain: https://github.com/yangji721/Awesome-Self-Interpretable-Neural-Network.

Related papers

Concept-Based Mechanistic Interpretability Using Structured Knowledge Graphs [3.429783703166407]
Our framework enables a global dissection of model behavior by analyzing how high-level semantic attributes emerge, interact, and propagate through internal model components.<n>A key innovation is our visualization platform that we named BAGEL, which presents these insights in a structured knowledge graph.<n>Our framework is model-agnostic, scalable, and contributes to a deeper understanding of how deep learning models generalize (or fail to) in the presence of dataset biases.
arXiv Detail & Related papers (2025-07-08T09:30:20Z)
Concept-Guided Interpretability via Neural Chunking [54.73787666584143]
We show that neural networks exhibit patterns in their raw population activity that mirror regularities in the training data.<n>We propose three methods to extract these emerging entities, complementing each other based on label availability and dimensionality.<n>Our work points to a new direction for interpretability, one that harnesses both cognitive principles and the structure of naturalistic data.
arXiv Detail & Related papers (2025-05-16T13:49:43Z)
Explaining Deep Neural Networks by Leveraging Intrinsic Methods [0.9790236766474201]
This thesis contributes to the field of eXplainable AI, focusing on enhancing the interpretability of deep neural networks. The core contributions lie in introducing novel techniques aimed at making these networks more interpretable by leveraging an analysis of their inner workings. Secondly, this research delves into novel investigations on neurons within trained deep neural networks, shedding light on overlooked phenomena related to their activation values.
arXiv Detail & Related papers (2024-07-17T01:20:17Z)
InterpretCC: Intrinsic User-Centric Interpretability through Global Mixture of Experts [31.738009841932374]
Interpretability for neural networks is a trade-off between three key requirements. We present InterpretCC, a family of interpretable-by-design neural networks that guarantee human-centric interpretability.
arXiv Detail & Related papers (2024-02-05T11:55:50Z)
What Do Deep Saliency Models Learn about Visual Attention? [28.023464783469738]
We present a novel analytic framework that sheds light on the implicit features learned by saliency models. Our approach decomposes these implicit features into interpretable bases that are explicitly aligned with semantic attributes.
arXiv Detail & Related papers (2023-10-14T23:15:57Z)
NxPlain: Web-based Tool for Discovery of Latent Concepts [16.446370662629555]
We present NxPlain, a web application that provides an explanation of a model's prediction using latent concepts. NxPlain discovers latent concepts learned in a deep NLP model, provides an interpretation of the knowledge learned in the model, and explains its predictions based on the used concepts.
arXiv Detail & Related papers (2023-03-06T10:45:24Z)
Mapping Knowledge Representations to Concepts: A Review and New Perspectives [0.6875312133832078]
This review focuses on research that aims to associate internal representations with human understandable concepts. We find this taxonomy and theories of causality, useful for understanding what can be expected, and not expected, from neural network explanations. The analysis additionally uncovers an ambiguity in the reviewed literature related to the goal of model explainability.
arXiv Detail & Related papers (2022-12-31T12:56:12Z)
Interpretable Self-Aware Neural Networks for Robust Trajectory Prediction [50.79827516897913]
We introduce an interpretable paradigm for trajectory prediction that distributes the uncertainty among semantic concepts. We validate our approach on real-world autonomous driving data, demonstrating superior performance over state-of-the-art baselines.
arXiv Detail & Related papers (2022-11-16T06:28:20Z)
Interpretable part-whole hierarchies and conceptual-semantic relationships in neural networks [4.153804257347222]
We present Agglomerator, a framework capable of providing a representation of part-whole hierarchies from visual cues. We evaluate our method on common datasets, such as SmallNORB, MNIST, FashionMNIST, CIFAR-10, and CIFAR-100.
arXiv Detail & Related papers (2022-03-07T10:56:13Z)
Interpretable Social Anchors for Human Trajectory Forecasting in Crowds [84.20437268671733]
We propose a neural network-based system to predict human trajectory in crowds. We learn interpretable rule-based intents, and then utilise the expressibility of neural networks to model scene-specific residual. Our architecture is tested on the interaction-centric benchmark TrajNet++.
arXiv Detail & Related papers (2021-05-07T09:22:34Z)
Interpretable Deep Learning: Interpretations, Interpretability, Trustworthiness, and Beyond [49.93153180169685]
We introduce and clarify two basic concepts-interpretations and interpretability-that people usually get confused. We elaborate the design of several recent interpretation algorithms, from different perspectives, through proposing a new taxonomy. We summarize the existing work in evaluating models' interpretability using "trustworthy" interpretation algorithms.
arXiv Detail & Related papers (2021-03-19T08:40:30Z)
Explainable Adversarial Attacks in Deep Neural Networks Using Activation Profiles [69.9674326582747]
This paper presents a visual framework to investigate neural network models subjected to adversarial examples. We show how observing these elements can quickly pinpoint exploited areas in a model.
arXiv Detail & Related papers (2021-03-18T13:04:21Z)
Generative Counterfactuals for Neural Networks via Attribute-Informed Perturbation [51.29486247405601]
We design a framework to generate counterfactuals for raw data instances with the proposed Attribute-Informed Perturbation (AIP) By utilizing generative models conditioned with different attributes, counterfactuals with desired labels can be obtained effectively and efficiently. Experimental results on real-world texts and images demonstrate the effectiveness, sample quality as well as efficiency of our designed framework.
arXiv Detail & Related papers (2021-01-18T08:37:13Z)
Developing Constrained Neural Units Over Time [81.19349325749037]
This paper focuses on an alternative way of defining Neural Networks, that is different from the majority of existing approaches. The structure of the neural architecture is defined by means of a special class of constraints that are extended also to the interaction with data. The proposed theory is cast into the time domain, in which data are presented to the network in an ordered manner.
arXiv Detail & Related papers (2020-09-01T09:07:25Z)

This list is automatically generated from the titles and abstracts of the papers in this site.