MEME: Generating RNN Model Explanations via Model Extraction
- URL: http://arxiv.org/abs/2012.06954v1
- Date: Sun, 13 Dec 2020 04:00:08 GMT
- Title: MEME: Generating RNN Model Explanations via Model Extraction
- Authors: Dmitry Kazhdan, Botty Dimanov, Mateja Jamnik, Pietro Liò
- Abstract summary: MEME is a model extraction approach capable of approximating RNNs with interpretable models represented by human-understandable concepts and their interactions.
We show how MEME can be used to interpret RNNs both locally and globally, by approximating RNN decision-making via interpretable concept interactions.
- Score: 6.55705721360334
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recurrent Neural Networks (RNNs) have achieved remarkable performance on a
range of tasks. A key step to further empowering RNN-based approaches is
improving their explainability and interpretability. In this work we present
MEME: a model extraction approach capable of approximating RNNs with
interpretable models represented by human-understandable concepts and their
interactions. We demonstrate how MEME can be applied to two multivariate,
continuous data case studies: Room Occupation Prediction, and In-Hospital
Mortality Prediction. Using these case studies, we show how our extracted
models can be used to interpret RNNs both locally and globally, by
approximating RNN decision-making via interpretable concept interactions.
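This listing does not describe MEME's actual pipeline. The sketch below only illustrates the general idea of model extraction named in the abstract: query a black-box recurrent model, describe each input sequence with human-understandable concepts, fit an interpretable surrogate over those concepts, and measure how often the surrogate agrees with the model (its fidelity). The one-unit `toy_rnn`, the concepts `last_high` and `mostly_high`, and the majority-vote rule table are all hypothetical stand-ins chosen for illustration, not MEME's method.

```python
import math
import random
from collections import Counter, defaultdict


def toy_rnn(seq, w_in=1.5, w_rec=0.8, bias=-1.0):
    """Hypothetical black-box RNN: one tanh hidden unit, thresholded at 0."""
    h = 0.0
    for x in seq:
        h = math.tanh(w_in * x + w_rec * h + bias)
    return int(h > 0.0)


# Hypothetical human-understandable concepts computed from the raw sequence.
CONCEPTS = {
    "last_high": lambda s: s[-1] > 0.7,
    "mostly_high": lambda s: sum(s) / len(s) > 0.5,
}


def to_concepts(seq):
    """Map a sequence to a tuple of concept truth values."""
    return tuple(fn(seq) for fn in CONCEPTS.values())


def extract_rule_table(model, sequences):
    """Surrogate: the model's majority label for each concept combination."""
    votes = defaultdict(Counter)
    for seq in sequences:
        votes[to_concepts(seq)][model(seq)] += 1
    return {key: counts.most_common(1)[0][0] for key, counts in votes.items()}


def fidelity(model, table, sequences, default=0):
    """Fraction of sequences on which surrogate and model agree."""
    agree = sum(table.get(to_concepts(s), default) == model(s) for s in sequences)
    return agree / len(sequences)


if __name__ == "__main__":
    rng = random.Random(0)
    data = [[rng.random() for _ in range(5)] for _ in range(2000)]
    train, test = data[:1500], data[1500:]
    table = extract_rule_table(toy_rnn, train)
    for key, label in sorted(table.items()):
        print(dict(zip(CONCEPTS, key)), "->", label)
    print("fidelity on held-out sequences:", fidelity(toy_rnn, table, test))
```

The rule table is used here only because it is trivially readable; a decision tree or another interpretable model could play the same surrogate role, and fidelity on held-out inputs is the usual way to check how faithfully any such surrogate approximates the original model.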
Related papers
- DeepCover: Advancing RNN Test Coverage and Online Error Prediction using State Machine Extraction [0.0]
Recurrent neural networks (RNNs) have emerged as powerful tools for processing sequential data in various fields, including natural language processing and speech recognition.
The lack of explainability in RNN models has limited their interpretability, posing challenges in understanding their internal workings.
This paper proposes a methodology for extracting a state machine (SM) from an RNN-based model to provide insights into its internal function.
(2024-02-10T14:45:23Z)
- Manipulating Feature Visualizations with Gradient Slingshots [54.31109240020007]
We introduce a novel method for manipulating Feature Visualization (FV) without significantly impacting the model's decision-making process.
We evaluate the effectiveness of our method on several neural network models and demonstrate its capabilities to hide the functionality of arbitrarily chosen neurons.
(2024-01-11T18:57:17Z)
- Episodic Memory Theory for the Mechanistic Interpretation of Recurrent Neural Networks [3.683202928838613]
We propose the Episodic Memory Theory (EMT), illustrating that RNNs can be conceptualized as discrete-time analogs of the recently proposed General Sequential Episodic Memory Model.
We introduce a novel set of algorithmic tasks tailored to probe the variable binding behavior in RNNs.
Our empirical investigations reveal that trained RNNs consistently converge to the variable binding circuit, thus indicating universality in the dynamics of RNNs.
(2023-10-03T20:52:37Z)
- On Neural Networks as Infinite Tree-Structured Probabilistic Graphical Models [44.676210493587256]
We propose an innovative solution by constructing infinite tree-structured PGMs that correspond exactly to neural networks.
Our research reveals that DNNs, during forward propagation, indeed perform approximations of PGM inference that are precise in this alternative PGM structure.
(2023-05-27T21:32:28Z)
- Transferability of coVariance Neural Networks and Application to Interpretable Brain Age Prediction using Anatomical Features [119.45320143101381]
Graph convolutional networks (GCN) leverage topology-driven graph convolutional operations to combine information across the graph for inference tasks.
We have studied GCNs with covariance matrices as graphs in the form of coVariance neural networks (VNNs).
VNNs inherit the scale-free data processing architecture from GCNs and here, we show that VNNs exhibit transferability of performance over datasets whose covariance matrices converge to a limit object.
(2023-05-02T22:15:54Z)
- Neural Additive Models for Location Scale and Shape: A Framework for Interpretable Neural Regression Beyond the Mean [1.0923877073891446]
Deep neural networks (DNNs) have proven to be highly effective in a variety of tasks.
Despite this success, the inner workings of DNNs are often not transparent.
This lack of interpretability has led to increased research on inherently interpretable neural networks.
(2023-01-27T17:06:13Z)
- Batch-Ensemble Stochastic Neural Networks for Out-of-Distribution Detection [55.028065567756066]
Out-of-distribution (OOD) detection has recently received much attention from the machine learning community due to its importance in deploying machine learning models in real-world applications.
In this paper we propose an uncertainty quantification approach by modelling the distribution of features.
We incorporate an efficient ensemble mechanism, namely batch-ensemble, to construct the batch-ensemble neural networks (BE-SNNs) and overcome the feature collapse problem.
We show that BE-SNNs yield superior performance on several OOD benchmarks, such as the Two-Moons dataset, the FashionMNIST vs MNIST dataset, FashionM
(2022-06-26T16:00:22Z)
- EINNs: Epidemiologically-Informed Neural Networks [75.34199997857341]
We introduce a new class of physics-informed neural networks, EINNs, crafted for epidemic forecasting.
We investigate how to leverage both the theoretical flexibility provided by mechanistic models and the data-driven expressibility afforded by AI models.
(2022-02-21T18:59:03Z)
- Now You See Me (CME): Concept-based Model Extraction [24.320487188704146]
Deep Neural Networks (DNNs) have achieved remarkable performance on a range of tasks.
A key step to further empowering DNN-based approaches is improving their explainability.
We present CME: a concept-based model extraction framework.
(2020-10-25T22:03:45Z)
- Explaining and Improving Model Behavior with k Nearest Neighbor Representations [107.24850861390196]
We propose using k nearest neighbor representations to identify training examples responsible for a model's predictions.
We show that kNN representations are effective at uncovering learned spurious associations.
Our results indicate that the kNN approach makes the finetuned model more robust to adversarial inputs.
(2020-10-18T16:55:25Z)
- Interpreting Graph Neural Networks for NLP With Differentiable Edge Masking [63.49779304362376]
Graph neural networks (GNNs) have become a popular approach to integrating structural inductive biases into NLP models.
We introduce a post-hoc method for interpreting the predictions of GNNs which identifies unnecessary edges.
We show that we can drop a large proportion of edges without deteriorating the performance of the model.
(2020-10-01T17:51:19Z)