Interpretable by Design: Wrapper Boxes Combine Neural Performance with Faithful Attribution of Model Decisions to Training Data
- URL: http://arxiv.org/abs/2311.08644v2
- Date: Wed, 17 Jul 2024 19:30:42 GMT
- Title: Interpretable by Design: Wrapper Boxes Combine Neural Performance with Faithful Attribution of Model Decisions to Training Data
- Authors: Yiheng Su, Junyi Jessy Li, Matthew Lease
- Abstract summary: We present wrapper boxes, a general approach to generate faithful, example-based explanations for model predictions.
After training a neural model as usual, its learned feature representation is input to a classic, interpretable model to perform the actual prediction.
- Score: 40.7542543934205
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Can we preserve the accuracy of neural models while also providing faithful explanations? We present wrapper boxes, a general approach to generate faithful, example-based explanations for model predictions while maintaining predictive performance. After training a neural model as usual, its learned feature representation is input to a classic, interpretable model to perform the actual prediction. This simple strategy is surprisingly effective, with results largely comparable to those of the original neural model, as shown across three large pre-trained language models, two datasets of varying scale, four classic models, and four evaluation metrics. Moreover, because these classic models are interpretable by design, the subset of training examples that determine classic model predictions can be shown directly to users.
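To make the recipe concrete, here is a minimal sketch using scikit-learn stand-ins: a small MLP plays the role of the trained neural model, and a kNN "wrapper box" makes the final prediction from its hidden-layer features, so each prediction can be attributed to the retrieved training neighbors. The dataset, model sizes, and choice of k are illustrative, not the paper's actual setup.

```python
# Minimal wrapper-box sketch: neural feature extractor + classic model.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, y_train, X_test = X[:400], y[:400], X[400:]

# 1) Train the neural model as usual.
mlp = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
mlp.fit(X_train, y_train)

# 2) Reuse its learned representation: forward inputs through the hidden layer.
def hidden_features(model, X):
    h = X @ model.coefs_[0] + model.intercepts_[0]
    return np.maximum(h, 0.0)  # ReLU, MLPClassifier's default activation

# 3) Fit an interpretable classic model (here kNN) on those features.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(hidden_features(mlp, X_train), y_train)

# 4) Predict, and attribute the prediction to concrete training examples.
feats = hidden_features(mlp, X_test[:1])
pred = knn.predict(feats)
_, idx = knn.kneighbors(feats)
print("prediction:", pred[0], "supported by training examples:", idx[0])
```

The same pattern applies with any frozen feature extractor and any classic, interpretable model (e.g., a decision tree) in place of kNN.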
Related papers
- Approximating Language Model Training Data from Weights [70.08614275061689]
We formalize the problem of data approximation from model weights and propose several baselines and metrics.
We develop a gradient-based approach that selects the highest-matching data from a large public text corpus.
Even when none of the true training data is known, our method is able to locate a small subset of public Web documents.
arXiv Detail & Related papers (2025-06-18T15:26:43Z)
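A hedged sketch of one plausible instantiation of the gradient-based selection step: score each candidate document by how well its loss gradient on the base model aligns with the observed weight change (fine-tuned minus base). The toy linear model and random "documents" below are placeholders for real LMs and a public corpus.

```python
# Gradient-matching data selection sketch (toy stand-in for real LMs).
import torch

torch.manual_seed(0)
base = torch.nn.Linear(16, 4)
tuned = torch.nn.Linear(16, 4)  # stands in for the fine-tuned weights
delta = torch.cat([(p2 - p1).flatten()
                   for p1, p2 in zip(base.parameters(), tuned.parameters())])

def grad_vector(model, x, y):
    model.zero_grad()
    loss = torch.nn.functional.cross_entropy(model(x), y)
    loss.backward()
    return torch.cat([p.grad.flatten() for p in model.parameters()])

# Candidate "documents": random feature/label pairs here.
candidates = [(torch.randn(1, 16), torch.randint(4, (1,))) for _ in range(100)]
# Descent moves weights along the negative gradient, so align -grad with delta.
scores = [torch.nn.functional.cosine_similarity(
              -grad_vector(base, x, y), delta, dim=0).item()
          for x, y in candidates]
topk = sorted(range(len(scores)), key=lambda i: -scores[i])[:5]
print("best-matching candidate indices:", topk)
```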
- A Two-Phase Recall-and-Select Framework for Fast Model Selection [13.385915962994806]
We propose a two-phase (coarse-recall and fine-selection) model selection framework.
It aims to enhance the efficiency of selecting a robust model by leveraging the models' training performances on benchmark datasets.
The proposed methodology is shown to select a high-performing model about three times faster than conventional baseline methods.
arXiv Detail & Related papers (2024-03-28T14:44:44Z)
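The two-phase structure might look like the following sketch; the benchmark score table, the ranking rule, and the phase-two proxy evaluation are all illustrative placeholders, not the paper's algorithm.

```python
# Coarse-recall / fine-selection sketch with placeholder scores.
import random

random.seed(0)
zoo = {f"model_{i}": {"benchmark_acc": random.random()} for i in range(50)}

def coarse_recall(zoo, k=5):
    # Phase 1: cheap ranking from recorded benchmark/training performance.
    return sorted(zoo, key=lambda m: -zoo[m]["benchmark_acc"])[:k]

def fine_select(shortlist):
    # Phase 2: costlier proxy evaluation on the target data (e.g., brief
    # fine-tuning); a random stand-in score is used here.
    return max(shortlist, key=lambda m: random.random())

shortlist = coarse_recall(zoo)
print("recalled:", shortlist)
print("selected:", fine_select(shortlist))
```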
- Budgeted Online Model Selection and Fine-Tuning via Federated Learning [26.823435733330705]
Online model selection involves selecting a model from a set of candidate models 'on the fly' to perform prediction on a stream of data.
The choice of candidate models hence has a crucial impact on performance.
The present paper proposes an online federated model selection framework where a group of learners (clients) interacts with a server with sufficient memory.
Using the proposed algorithm, clients and the server collaborate to fine-tune models to adapt them to a non-stationary environment.
arXiv Detail & Related papers (2024-01-19T04:02:49Z)
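As a rough intuition for online model selection (setting aside the paper's federated and budgeted machinery), an exponential-weights learner can maintain a distribution over candidate models and update it from observed losses; everything below is a generic sketch, not the paper's algorithm.

```python
# Generic exponential-weights online model selection sketch.
import numpy as np

rng = np.random.default_rng(0)
n_models, eta = 4, 0.5
weights = np.ones(n_models)

for t in range(100):
    probs = weights / weights.sum()
    choice = rng.choice(n_models, p=probs)  # model deployed this round
    losses = rng.random(n_models)           # stand-in per-model losses
    weights *= np.exp(-eta * losses)        # exponential-weights update

print("final model preferences:", weights / weights.sum())
```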
- TRAK: Attributing Model Behavior at Scale [79.56020040993947]
We present TRAK (Tracing with the Randomly-projected After Kernel), a data attribution method that is both effective and computationally tractable for large-scale, differentiable models.
arXiv Detail & Related papers (2023-03-24T17:56:22Z)
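A simplified sketch of the random-projection idea underlying TRAK-style attribution: compress per-example gradients with a shared random projection, then score train/test pairs by inner products in the projected space. The real method includes kernel-style correction terms omitted here, and the "gradients" below are random stand-ins.

```python
# Random-projection gradient attribution sketch (heavily simplified).
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, p, k = 200, 3, 10_000, 256  # projection dim k << param dim p

grads_train = rng.normal(size=(n_train, p))  # stand-ins for real gradients
grads_test = rng.normal(size=(n_test, p))
P = rng.normal(size=(p, k)) / np.sqrt(k)     # shared random projection

phi_train = grads_train @ P
phi_test = grads_test @ P
scores = phi_test @ phi_train.T              # attribution scores (test x train)
print("most influential training example per test point:", scores.argmax(axis=1))
```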
- Dataless Knowledge Fusion by Merging Weights of Language Models [51.8162883997512]
Fine-tuning pre-trained language models has become the prevalent paradigm for building downstream NLP models.
The data used for fine-tuning is often unavailable to share, which creates a barrier to fusing knowledge across individual models to yield a better single model.
We propose a dataless knowledge fusion method that merges models in their parameter space.
arXiv Detail & Related papers (2022-12-19T20:46:43Z)
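As a naive baseline that makes the parameter-space idea concrete, the sketch below merges models by element-wise averaging of their state dicts; the paper's dataless fusion method is more careful than plain averaging.

```python
# Parameter-space merging sketch: element-wise average of state dicts.
import torch

def average_state_dicts(state_dicts):
    keys = state_dicts[0].keys()
    return {k: torch.stack([sd[k].float() for sd in state_dicts]).mean(dim=0)
            for k in keys}

m1, m2 = torch.nn.Linear(8, 2), torch.nn.Linear(8, 2)
merged = torch.nn.Linear(8, 2)
merged.load_state_dict(average_state_dicts([m1.state_dict(), m2.state_dict()]))
print(merged.weight.shape)
```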
- Synthetic Model Combination: An Instance-wise Approach to Unsupervised Ensemble Learning [92.89846887298852]
Consider making a prediction over new test data without any opportunity to learn from a training set of labelled data, given access only to a set of expert models and their predictions, alongside some limited information about the dataset used to train them.
arXiv Detail & Related papers (2022-10-11T10:20:31Z)
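One way to picture instance-wise combination: weight each expert's prediction by how close the test point lies to the limited information available about that expert's training data. The experts, stored statistics, and weighting scheme below are illustrative assumptions.

```python
# Instance-wise ensembling sketch: distance-weighted expert predictions.
import numpy as np

rng = np.random.default_rng(0)
experts = [{"train_mean": rng.normal(size=5),          # limited training info
            "predict": (lambda x, b=b: float(x.sum() > b))}
           for b in range(3)]

def instance_wise_predict(x):
    dists = np.array([np.linalg.norm(x - e["train_mean"]) for e in experts])
    w = np.exp(-dists)
    w /= w.sum()                               # closer experts get more weight
    preds = np.array([e["predict"](x) for e in experts])
    return float(w @ preds)

print(instance_wise_predict(rng.normal(size=5)))
```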
- Revealing Secrets From Pre-trained Models [2.0249686991196123]
Transfer learning has been widely adopted in many emerging deep learning algorithms.
We show that pre-trained models and fine-tuned models have significantly high similarities in weight values.
We propose a new model extraction attack that reveals the model architecture and the pre-trained model used by the black-box victim model.
arXiv Detail & Related papers (2022-07-19T20:19:03Z)
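The weight-similarity observation can be illustrated directly: a lightly fine-tuned model stays close to its pre-trained parent, so cosine similarity over flattened parameters can point to which zoo model a victim derives from. The tiny linear "models" below are stand-ins.

```python
# Weight-similarity sketch: identify a fine-tuned model's parent.
import torch

torch.manual_seed(0)
zoo = {name: torch.nn.Linear(32, 8) for name in ["parent_a", "parent_b"]}
victim = torch.nn.Linear(32, 8)
victim.load_state_dict(zoo["parent_a"].state_dict())
with torch.no_grad():                          # simulate light fine-tuning
    victim.weight.add_(0.01 * torch.randn_like(victim.weight))

def flat(m):
    return torch.cat([p.detach().flatten() for p in m.parameters()])

sims = {n: torch.cosine_similarity(flat(victim), flat(m), dim=0).item()
        for n, m in zoo.items()}
print(sims)  # parent_a should score near 1.0
```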
- Model Selection, Adaptation, and Combination for Deep Transfer Learning through Neural Networks in Renewable Energies [5.953831950062808]
We conduct the first thorough experiments on model selection and adaptation for transfer learning in renewable power forecasting.
We adapt models trained on data from different seasons and limit the amount of available training data.
We show how combining multiple models through ensembles can significantly improve the model selection and adaptation approach.
arXiv Detail & Related papers (2022-04-28T05:34:50Z)
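A minimal sketch of combining season-specific forecasters into an ensemble; the per-season linear models and the uniform weighting are placeholders, not the paper's configuration.

```python
# Seasonal-model ensemble sketch with placeholder linear forecasters.
import numpy as np

rng = np.random.default_rng(0)

# One linear forecaster per source season (coefficients are stand-ins).
seasonal_models = {s: rng.normal(size=4)
                   for s in ["winter", "spring", "summer", "autumn"]}

def ensemble_forecast(features, weights=None):
    preds = np.array([w @ features for w in seasonal_models.values()])
    if weights is None:                        # uniform average by default
        weights = np.full(len(preds), 1 / len(preds))
    return float(weights @ preds)

print(ensemble_forecast(rng.normal(size=4)))
```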
- Dependency-based Mixture Language Models [53.152011258252315]
We introduce the Dependency-based Mixture Language Models.
In detail, we first train neural language models with a novel dependency modeling objective.
We then formulate the next-token probability by mixing the previous dependency modeling probability distributions with self-attention.
arXiv Detail & Related papers (2022-03-19T06:28:30Z)
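A toy sketch of the mixture formulation: the next-token distribution is a self-attention-weighted mixture of per-position dependency distributions. Both the attention weights and the component distributions below are random stand-ins.

```python
# Next-token probability as an attention-weighted mixture of distributions.
import numpy as np

rng = np.random.default_rng(0)
vocab, ctx_len = 50, 6

# p_dep[j]: distribution over next tokens conditioned on context position j.
p_dep = rng.dirichlet(np.ones(vocab), size=ctx_len)

logits = rng.normal(size=ctx_len)              # attention over context positions
attn = np.exp(logits) / np.exp(logits).sum()

p_next = attn @ p_dep                          # mixture of distributions
assert np.isclose(p_next.sum(), 1.0)
print("most likely next token id:", p_next.argmax())
```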
- ModelDiff: Testing-Based DNN Similarity Comparison for Model Reuse Detection [9.106864924968251]
ModelDiff is a testing-based approach to deep learning model similarity comparison.
A study on mobile deep learning apps has shown the feasibility of ModelDiff on real-world models.
arXiv Detail & Related papers (2021-06-11T15:16:18Z)
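In the spirit of testing-based comparison, the sketch below probes models with the same input pairs, records each model's per-pair output distance (a decision-distance-style vector), and correlates those vectors; the toy models and probe generation are illustrative assumptions.

```python
# Testing-based similarity sketch: compare decision-distance vectors.
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(10, 3))
W_b = W + 0.05 * rng.normal(size=W.shape)      # near-copy of model_a
W_c = rng.normal(size=(10, 3))                 # unrelated model
model_a = lambda x: x @ W
model_b = lambda x: x @ W_b
model_c = lambda x: x @ W_c

pairs = [(rng.normal(size=10), rng.normal(size=10)) for _ in range(50)]

def decision_distance_vector(model):
    return np.array([np.linalg.norm(model(x1) - model(x2)) for x1, x2 in pairs])

va = decision_distance_vector(model_a)
for name, m in [("near-copy", model_b), ("unrelated", model_c)]:
    sim = np.corrcoef(va, decision_distance_vector(m))[0, 1]
    print(name, "similarity:", round(sim, 3))
```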
- A linearized framework and a new benchmark for model selection for fine-tuning [112.20527122513668]
Fine-tuning from a collection of models pre-trained on different domains is emerging as a technique to improve test accuracy in the low-data regime.
We introduce two new baselines for model selection -- Label-Gradient and Label-Feature Correlation.
Our benchmark highlights the accuracy gains from using a model zoo compared to fine-tuning ImageNet models.
arXiv Detail & Related papers (2021-01-29T21:57:15Z)
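A hedged sketch of a Label-Feature Correlation style selection score: rank candidate feature extractors by how strongly their extracted features correlate with the target labels. The random-projection "extractors" and the max-over-features score are illustrative choices, not the paper's exact definition.

```python
# Label-feature correlation sketch for ranking candidate extractors.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 32))
y = (X[:, 0] + 0.1 * rng.normal(size=200) > 0).astype(float)

def lfc_score(features, labels):
    # Standardize, then take the strongest single feature-label correlation.
    f = (features - features.mean(0)) / (features.std(0) + 1e-8)
    l = (labels - labels.mean()) / (labels.std() + 1e-8)
    return float(np.abs(f.T @ l).max() / len(labels))

extractors = {f"model_{i}": rng.normal(size=(32, 16)) for i in range(3)}
scores = {n: lfc_score(X @ P, y) for n, P in extractors.items()}
print(max(scores, key=scores.get), scores)
```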
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.