Distilling neural networks into skipgram-level decision lists
- URL: http://arxiv.org/abs/2005.07111v2
- Date: Mon, 18 May 2020 08:43:42 GMT
- Title: Distilling neural networks into skipgram-level decision lists
- Authors: Madhumita Sushil and Simon Šuster and Walter Daelemans
- Abstract summary: We propose a pipeline to explain RNNs by means of decision lists (also called rules) over skipgrams.
We find that our technique persistently achieves high explanation fidelity and qualitatively interpretable rules.
- Score: 4.109840601429086
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Several previous studies on explanation for recurrent neural networks focus
on approaches that find the most important input segments for a network as its
explanations. In that case, the manner in which these input segments combine
with each other to form an explanatory pattern remains unknown. To overcome
this, some previous work tries to find patterns (called rules) in the data that
explain neural outputs. However, their explanations are often insensitive to
model parameters, which limits the scalability of text explanations. To
overcome these limitations, we propose a pipeline to explain RNNs by means of
decision lists (also called rules) over skipgrams. For evaluation of
explanations, we create a synthetic sepsis-identification dataset, as well as
apply our technique on additional clinical and sentiment analysis datasets. We
find that our technique persistently achieves high explanation fidelity and
qualitatively interpretable rules.
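The abstract describes the distillation pipeline only at a high level. The sketch below is a minimal illustration of that idea, not the authors' implementation: it assumes a trained black-box classifier exposed through a hypothetical `rnn_predict` helper, represents documents as binary skip-bigram features, and substitutes a shallow scikit-learn decision tree for the paper's rule learner so that the learned branches can be read off as an if-then rule list.

```python
# Hypothetical sketch (not the authors' released code): featurize documents as
# binary skipgram indicators, label them with the black-box RNN's predictions,
# and fit an interpretable surrogate whose branches can be read off as rules.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.tree import DecisionTreeClassifier, export_text


def skipgram_analyzer(text, max_skip=2):
    """Yield skip-bigrams: token pairs with at most `max_skip` tokens in between."""
    tokens = text.lower().split()
    for i, left in enumerate(tokens):
        for j in range(i + 1, min(i + 2 + max_skip, len(tokens))):
            yield f"{left}_{tokens[j]}"


def distill_to_rules(texts, rnn_predict, max_depth=3):
    """Fit a shallow surrogate on skipgram features to mimic `rnn_predict`.

    `rnn_predict` is an assumed helper mapping a list of texts to hard labels.
    """
    vectorizer = CountVectorizer(analyzer=skipgram_analyzer, binary=True)
    X = vectorizer.fit_transform(texts)
    y = rnn_predict(texts)  # labels come from the RNN, not from gold annotations
    surrogate = DecisionTreeClassifier(max_depth=max_depth).fit(X, y)
    fidelity = surrogate.score(X, y)  # how often the rules reproduce the RNN's decisions
    rules = export_text(
        surrogate, feature_names=vectorizer.get_feature_names_out().tolist()
    )
    return rules, fidelity
```

Note that fidelity here is measured against the RNN's own predictions rather than against gold labels, matching the notion of explanation fidelity used in the abstract.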
Related papers
- Recurrent Neural Networks Learn to Store and Generate Sequences using Non-Linear Representations [54.17275171325324]
We present a counterexample to the Linear Representation Hypothesis (LRH).
When trained to repeat an input token sequence, neural networks learn to represent the token at each position with a particular order of magnitude, rather than a direction.
These findings strongly indicate that interpretability research should not be confined to the LRH.
arXiv Detail & Related papers (2024-08-20T15:04:37Z)
- Relational Composition in Neural Networks: A Survey and Call to Action [54.47858085003077]
Many neural nets appear to represent data as linear combinations of "feature vectors".
We argue that this success is incomplete without an understanding of relational composition.
arXiv Detail & Related papers (2024-07-19T20:50:57Z)
- Explaining Explainability: Towards Deeper Actionable Insights into Deep Learning through Second-order Explainability [70.60433013657693]
Second-order explainable AI (SOXAI) was recently proposed to extend explainable AI (XAI) from the instance level to the dataset level.
We demonstrate for the first time, via example classification and segmentation cases, that eliminating irrelevant concepts from the training set based on actionable insights from SOXAI can enhance a model's performance.
arXiv Detail & Related papers (2023-06-14T23:24:01Z)
- NeuroExplainer: Fine-Grained Attention Decoding to Uncover Cortical Development Patterns of Preterm Infants [73.85768093666582]
We propose an explainable geometric deep network dubbed NeuroExplainer.
NeuroExplainer is used to uncover altered infant cortical development patterns associated with preterm birth.
arXiv Detail & Related papers (2023-01-01T12:48:12Z)
- Robust Explanation Constraints for Neural Networks [33.14373978947437]
Post-hoc explanation methods are used with the intent of providing insights about neural networks and are sometimes said to help engender trust in their outputs.
Our training method is the only one able to learn neural networks with insights about robustness across all six networks tested.
arXiv Detail & Related papers (2022-12-16T14:40:25Z)
- Interpretable Fault Diagnosis of Rolling Element Bearings with Temporal Logic Neural Network [11.830457329372283]
This paper proposes a novel neural network structure, called temporal logic neural network (TLNN).
TLNN keeps the nice properties of traditional neural networks but also provides a formal interpretation of itself with formal language.
arXiv Detail & Related papers (2022-04-15T11:54:30Z)
- Data-driven emergence of convolutional structure in neural networks [83.4920717252233]
We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs.
By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
arXiv Detail & Related papers (2022-02-01T17:11:13Z)
- Combining Sub-Symbolic and Symbolic Methods for Explainability [1.3777144060953146]
A number of sub-symbolic approaches have been developed to provide insights into the GNN decision making process.
These are first important steps on the way to explainability, but the generated explanations are often hard to understand for users that are not AI experts.
We introduce a conceptual approach combining sub-symbolic and symbolic methods for human-centric explanations.
arXiv Detail & Related papers (2021-12-03T10:57:00Z)
- A Peek Into the Reasoning of Neural Networks: Interpreting with Structural Visual Concepts [38.215184251799194]
We propose a framework (VRX) to interpret classification NNs with intuitive structural visual concepts.
By means of knowledge distillation, we show VRX can take a step towards mimicking the reasoning process of NNs.
arXiv Detail & Related papers (2021-05-01T15:47:42Z)
- Developing Constrained Neural Units Over Time [81.19349325749037]
This paper focuses on an alternative way of defining neural networks, one that differs from the majority of existing approaches.
The structure of the neural architecture is defined by means of a special class of constraints that are extended also to the interaction with data.
The proposed theory is cast into the time domain, in which data are presented to the network in an ordered manner.
arXiv Detail & Related papers (2020-09-01T09:07:25Z)
- Explaining Away Attacks Against Neural Networks [3.658164271285286]
We investigate the problem of identifying adversarial attacks on image-based neural networks.
We present intriguing experimental results showing significant discrepancies between the explanations generated for the predictions of a model on clean and adversarial data.
We propose a framework which can identify whether a given input is adversarial based on the explanations given by the model.
arXiv Detail & Related papers (2020-03-06T15:32:30Z)