Interpretability of Blackbox Machine Learning Models through Dataview Extraction and Shadow Model creation
- URL: http://arxiv.org/abs/2002.00372v1
- Date: Sun, 2 Feb 2020 11:47:15 GMT
- Title: Interpretability of Blackbox Machine Learning Models through Dataview Extraction and Shadow Model creation
- Authors: Rupam Patir, Shubham Singhal, C. Anantaram, Vikram Goyal
- Abstract summary: Different deep learning models built on the same training data may capture different views of the data based on the underlying techniques used.
To explain the decisions arrived at by blackbox deep learning models, we argue that it is essential to faithfully reproduce that model's view of the training data.
- Score: 4.456941846147708
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning models trained using massive amounts of data tend to capture
one view of the data and its associated mapping. Different deep learning models
built on the same training data may capture different views of the data based
on the underlying techniques used. To explain the decisions arrived at by
blackbox deep learning models, we argue that it is essential to reproduce that
model's view of the training data faithfully. This faithful reproduction can
then be used for explanation generation. We investigate two methods for data
view extraction: a hill-climbing approach and a GAN-driven approach. We then use
this synthesized data to create shadow models for explanation generation: a
Decision-Tree model and a Formal Concept Analysis based model. We evaluate these
approaches on a blackbox model trained on public datasets and show their
usefulness in explanation generation.
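As a rough, hedged illustration of this pipeline, the Python sketch below hill-climbs random inputs toward high black-box confidence for each class and fits a decision-tree shadow model on the synthesized, black-box-labeled points. The sklearn MLP black box, the confidence objective, and the step sizes are illustrative assumptions, not the paper's exact setup (the GAN-driven extraction and the FCA shadow model are omitted).

```python
# Illustrative only: an sklearn MLP stands in for the blackbox model.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=500, n_features=4, random_state=0)
blackbox = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500,
                         random_state=0).fit(X, y)

def hill_climb(target_class, steps=200, step_size=0.1):
    """Perturb a random point, keeping changes that raise the black box's
    confidence in target_class -- one simple notion of its 'data view'."""
    x = rng.normal(size=X.shape[1])
    best = blackbox.predict_proba(x[None])[0, target_class]
    for _ in range(steps):
        cand = x + rng.normal(scale=step_size, size=x.shape)
        p = blackbox.predict_proba(cand[None])[0, target_class]
        if p > best:
            x, best = cand, p
    return x

# Synthesize a dataset from the black box's view, label it with the black box,
# and fit a decision-tree shadow model on it.
Xs = np.stack([hill_climb(c) for c in (0, 1) for _ in range(100)])
ys = blackbox.predict(Xs)
shadow = DecisionTreeClassifier(max_depth=3).fit(Xs, ys)
print(export_text(shadow))  # human-readable rules approximating the black box
```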
Related papers
- The Journey, Not the Destination: How Data Guides Diffusion Models [75.19694584942623]
Diffusion models trained on large datasets can synthesize photo-realistic images of remarkable quality and diversity.
We propose a framework that: (i) provides a formal notion of data attribution in the context of diffusion models, and (ii) allows us to counterfactually validate such attributions.
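A minimal sketch of counterfactually validating an attribution, with a toy linear classifier standing in for a diffusion model and a placeholder similarity-based attribution score rather than the paper's estimator:

```python
# Toy stand-ins: a logistic model instead of a diffusion model, and a
# similarity-based attribution score instead of the paper's estimator.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=5, random_state=1)
x_test, X_tr, y_tr = X[0], X[1:], y[1:]

base = LogisticRegression().fit(X_tr, y_tr).predict_proba(x_test[None])[0, 1]

scores = X_tr @ x_test                     # placeholder attribution scores
top = np.argsort(-scores)[:30]             # most-attributed training examples
rnd = np.random.default_rng(1).choice(len(X_tr), 30, replace=False)

for name, idx in [("top-attributed", top), ("random", rnd)]:
    mask = np.ones(len(X_tr), dtype=bool)
    mask[idx] = False                      # counterfactual: drop these examples
    p = (LogisticRegression().fit(X_tr[mask], y_tr[mask])
         .predict_proba(x_test[None])[0, 1])
    print(name, "prediction shift:", round(abs(p - base), 4))
# A faithful attribution should shift the prediction more than random removal.
```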
arXiv Detail & Related papers (2023-12-11T08:39:43Z)
- Learn to Unlearn for Deep Neural Networks: Minimizing Unlearning Interference with Gradient Projection [56.292071534857946]
Recent data-privacy laws have sparked interest in machine unlearning.
The challenge is to discard information about the "forget" data without altering knowledge about the remaining dataset.
We adopt a projected-gradient based learning method, named Projected-Gradient Unlearning (PGU).
We provide empirical evidence that our unlearning method produces models that behave similarly to models retrained from scratch across various metrics, even when the training dataset is no longer accessible.
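A minimal sketch of the gradient-projection idea, assuming a single retained-gradient direction; the actual PGU subspace construction and training loop are not reproduced here:

```python
# Generic gradient-projection step, not the exact PGU algorithm.
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=10)          # parameters of a toy model
g_forget = rng.normal(size=10)   # gradient of the unlearning objective
g_retain = rng.normal(size=10)   # gradient on the retained data

# Remove the component of the unlearning gradient that would disturb
# the retained data's loss direction.
u = g_retain / np.linalg.norm(g_retain)
g_proj = g_forget - (g_forget @ u) * u

w -= 0.1 * g_proj                # unlearning update
print("overlap with retain direction:", abs(g_proj @ u))  # ~0 by construction
```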
arXiv Detail & Related papers (2023-12-07T07:17:24Z)
- Wrapper Boxes: Faithful Attribution of Model Predictions to Training Data [40.7542543934205]
We propose a "wrapper box'' pipeline: training a neural model as usual and then using its learned feature representation in classic, interpretable models to perform prediction.
Across seven language models of varying sizes, we first show that the predictive performance of wrapper classic models is largely comparable to the original neural models.
Our pipeline thus preserves the predictive performance of neural language models while faithfully attributing classic model decisions to training data.
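A hedged sketch of the wrapper-box idea under stated assumptions: a small sklearn MLP stands in for a language model, its hidden-layer activations serve as the learned representation, and a kNN wrapper attributes each prediction to its nearest training rows:

```python
# Stand-ins: an sklearn MLP for the neural model, kNN as the classic wrapper.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=400, n_features=8, random_state=0)
net = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500,
                    random_state=0).fit(X, y)

def features(A):
    """Hidden-layer (ReLU) activations as the learned representation; a real
    setup would hook a deep model's penultimate layer instead."""
    return np.maximum(A @ net.coefs_[0] + net.intercepts_[0], 0)

knn = KNeighborsClassifier(n_neighbors=5).fit(features(X), y)
pred = knn.predict(features(X[:1]))
dist, idx = knn.kneighbors(features(X[:1]))
# The neighbors are, by construction, the training data the prediction rests on.
print("prediction:", pred[0], "attributed to training rows:", idx[0])
```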
arXiv Detail & Related papers (2023-11-15T01:50:53Z)
- Machine Unlearning Methodology base on Stochastic Teacher Network [33.763901254862766]
"Right to be forgotten" grants data owners the right to actively withdraw data that has been used for model training.
Existing machine unlearning methods have been found to be ineffective in quickly removing knowledge from deep learning models.
This paper proposes using a stochastic network as a teacher to expedite removing the influence of the forgotten data from the model.
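A hedged sketch of the stochastic-teacher idea, assuming a simple KL-distillation loss toward a randomly initialized teacher on the forget set plus a standard loss on retained data (the architectures and loss weighting are placeholders, not the paper's configuration):

```python
# Placeholder models and loss weighting; KL distillation toward a random
# teacher on the forget set, cross-entropy on retained data.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
student = nn.Linear(8, 3)   # stand-in for the trained model being unlearned
teacher = nn.Linear(8, 3)   # untrained random weights: the "stochastic teacher"
x_forget = torch.randn(32, 8)
x_retain, y_retain = torch.randn(64, 8), torch.randint(0, 3, (64,))

opt = torch.optim.SGD(student.parameters(), lr=0.1)
for _ in range(50):
    # Pull forget-set outputs toward the random teacher, erasing structure.
    kd = F.kl_div(F.log_softmax(student(x_forget), dim=-1),
                  F.softmax(teacher(x_forget), dim=-1).detach(),
                  reduction="batchmean")
    # Keep behaviour on retained data with an ordinary supervised loss.
    ce = F.cross_entropy(student(x_retain), y_retain)
    loss = kd + ce           # equal weighting is an assumption
    opt.zero_grad()
    loss.backward()
    opt.step()
print("final combined loss:", float(loss))
```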
arXiv Detail & Related papers (2023-08-28T06:05:23Z)
- LLM2Loss: Leveraging Language Models for Explainable Model Diagnostics [5.33024001730262]
We propose an approach that provides semantic insight into a black-box model's patterns of failure and bias by training lightweight diagnosis models on semantic embeddings of the input.
We show that an ensemble of such lightweight models can be used to generate insights on the performance of the black-box model.
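A sketch under stated assumptions: random vectors stand in for the semantic embeddings (e.g., CLIP-style features), a synthetic per-sample loss stands in for the black box's failures, and a ridge probe plays the role of one lightweight diagnosis model:

```python
# Placeholders throughout: random "embeddings" and a synthetic per-sample loss.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
emb = rng.normal(size=(1000, 64))                 # stand-in semantic embeddings
loss = np.abs(emb[:, 0]) + 0.1 * rng.normal(size=1000)  # stand-in blackbox loss

E_tr, E_te, l_tr, l_te = train_test_split(emb, loss, random_state=0)
probe = Ridge().fit(E_tr, l_tr)                   # one lightweight diagnosis model
pred = probe.predict(E_te)

print("probe R^2:", round(probe.score(E_te, l_te), 3))
# Samples with high predicted loss point at systematic failure modes.
print("predicted-hardest samples:", np.argsort(-pred)[:5])
```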
arXiv Detail & Related papers (2023-05-04T23:54:37Z)
- TRAK: Attributing Model Behavior at Scale [79.56020040993947]
We present TRAK (Tracing with the Randomly-projected After Kernel), a data attribution method that is both effective and computationally tractable for large-scale, differentiable models.
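A greatly simplified sketch of the idea behind TRAK, assuming a toy linear model: represent each example by a random projection of its per-example gradient and score train/test pairs by similarity in that space. The real estimator adds kernel corrections and ensembling over independently trained models:

```python
# Toy linear model; random projection of per-example gradients as features.
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 200, 50, 16                  # examples, parameters, projection dim
w = rng.normal(size=d)                 # "trained" weights of the toy model
X = rng.normal(size=(n, d))
y = rng.integers(0, 2, n)

def grad(x, yi):
    """Per-example gradient of the logistic loss for the toy linear model."""
    p = 1.0 / (1.0 + np.exp(-x @ w))
    return (p - yi) * x

P = rng.normal(size=(d, k)) / np.sqrt(k)          # random projection
phi = np.stack([grad(X[i], y[i]) @ P for i in range(n)])

x_test, y_test = rng.normal(size=d), 1
scores = phi @ (grad(x_test, y_test) @ P)         # attribution per training row
print("most influential training rows:", np.argsort(-scores)[:5])
```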
arXiv Detail & Related papers (2023-03-24T17:56:22Z)
- Dataless Knowledge Fusion by Merging Weights of Language Models [51.8162883997512]
Fine-tuning pre-trained language models has become the prevalent paradigm for building downstream NLP models.
This creates a barrier to fusing knowledge across individual models to yield a better single model.
We propose a dataless knowledge fusion method that merges models in their parameter space.
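A minimal parameter-space sketch using plain weight averaging; the paper's merger is more sophisticated, but this shows fusion happening without any training data:

```python
# Plain parameter averaging of checkpoints that share an architecture.
import torch
import torch.nn as nn

def make_model(seed):
    torch.manual_seed(seed)
    return nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2))

models = [make_model(s) for s in (0, 1, 2)]  # stand-ins for fine-tuned models

merged = make_model(0)
with torch.no_grad():
    for name, p in merged.named_parameters():
        stacked = torch.stack([dict(m.named_parameters())[name] for m in models])
        p.copy_(stacked.mean(0))             # fuse in parameter space

x = torch.randn(4, 8)
print(merged(x).shape)  # a single fused model, built without any training data
```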
arXiv Detail & Related papers (2022-12-19T20:46:43Z)
- Visualising Deep Network's Time-Series Representations [93.73198973454944]
Despite the popularisation of machine learning models, more often than not they still operate as black boxes, offering no insight into what is happening inside the model.
This paper proposes a method that addresses that issue, with a focus on visualising multi-dimensional time-series data.
Experiments on a high-frequency stock market dataset show that the method provides fast and discernible visualisations.
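One common way to expose such internal representations, sketched under assumptions (a GRU on a synthetic random-walk series; not the paper's exact method): project the per-time-step hidden states to 2-D for plotting:

```python
# A GRU on a synthetic random-walk series; PCA of its hidden states.
import numpy as np
import torch
import torch.nn as nn
from sklearn.decomposition import PCA

torch.manual_seed(0)
series = torch.cumsum(torch.randn(1, 300, 1), dim=1)  # price-like random walk
rnn = nn.GRU(input_size=1, hidden_size=32, batch_first=True)

with torch.no_grad():
    hidden, _ = rnn(series)        # (1, 300, 32): one hidden state per step

coords = PCA(n_components=2).fit_transform(hidden[0].numpy())
# Plotting coords traces how the network's internal representation evolves
# along the series; clusters and jumps become visually discernible.
print(coords.shape)                # (300, 2)
```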
arXiv Detail & Related papers (2021-03-12T09:53:34Z)
- Distilling Interpretable Models into Human-Readable Code [71.11328360614479]
Human-readability is an important and desirable standard for machine-learned model interpretability.
We propose to train interpretable models using conventional methods, and then distill them into concise, human-readable code.
We describe a piecewise-linear curve-fitting algorithm that produces high-quality results efficiently and reliably across a broad range of use cases.
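A hedged sketch of the curve-fitting step, assuming fixed, evenly spaced knots and an ordinary least-squares hinge fit (the paper's algorithm handles knot placement and fit quality more carefully), emitting the result as human-readable code:

```python
# Least-squares piecewise-linear fit on fixed knots, emitted as source code.
import numpy as np

x = np.linspace(0, 10, 400)
rng = np.random.default_rng(0)
y = np.sin(x) + 0.05 * rng.normal(size=x.size)   # stand-in teacher outputs

knots = np.linspace(0, 10, 9)
# Hinge basis turns PWL fitting into one linear least-squares problem.
basis = np.column_stack([np.ones_like(x), x] +
                        [np.maximum(x - k, 0) for k in knots[1:-1]])
coef, *_ = np.linalg.lstsq(basis, y, rcond=None)

vals = basis @ coef
yk = np.interp(knots, x, vals)                   # fitted values at the knots
lines = ["def curve(x):"]
for (a, b), (ya, yb) in zip(zip(knots, knots[1:]), zip(yk, yk[1:])):
    slope = (yb - ya) / (b - a)
    lines.append(f"    if x < {b:.2f}: return {ya:.3f} + {slope:.3f} * (x - {a:.2f})")
lines.append(f"    return {yk[-1]:.3f}")
print("\n".join(lines))   # human-readable code approximating the fitted model
```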
arXiv Detail & Related papers (2021-01-21T01:46:36Z)