Can Explanations Be Useful for Calibrating Black Box Models?
- URL: http://arxiv.org/abs/2110.07586v1
- Date: Thu, 14 Oct 2021 17:48:16 GMT
- Title: Can Explanations Be Useful for Calibrating Black Box Models?
- Authors: Xi Ye and Greg Durrett
- Abstract summary: We study how to improve a black box model's performance on a new domain given examples from the new domain.
Our approach first extracts a set of features combining human intuition about the task with model attributions.
We show that the calibration features transfer to some extent between tasks and shed light on how to effectively use them.
- Score: 31.473798197405948
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: One often wants to take an existing, trained NLP model and use it on data
from a new domain. While fine-tuning or few-shot learning can be used to adapt
the base model, there is no single, simple recipe for getting these to work;
moreover, one may not have access to the original model weights if it is
deployed as a black box. To this end, we study how to improve a black box
model's performance on a new domain given examples from the new domain by
leveraging explanations of the model's behavior. Our approach first extracts a
set of features combining human intuition about the task with model
attributions generated by black box interpretation techniques, and then uses a
simple model to calibrate or rerank the model's predictions based on the
features. We experiment with our method on two tasks, extractive question
answering and natural language inference, covering adaptation from several
pairs of domains. The experimental results across all the domain pairs show
that explanations are useful for calibrating these models. We show that the
calibration features transfer to some extent between tasks and shed light on
how to effectively use them.
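As a concrete illustration of that recipe, here is a minimal sketch: featurize each new-domain example with the black box's own confidence plus attribution-derived statistics, then fit a simple calibrator that predicts whether the base model is correct. The feature set, the stand-in data, and the helper names below are illustrative assumptions, not the paper's exact implementation:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def calibration_features(confidence, attributions, question_token_ids):
    """Combine the model's own confidence with attribution-derived
    signals (an illustrative feature set, not the paper's exact one)."""
    attr = np.asarray(attributions)
    return [
        confidence,                      # base model's probability
        attr.max(),                      # strongest single attribution
        attr.sum(),                      # total attribution mass
        attr[question_token_ids].sum(),  # mass on the question tokens
    ]

# Stand-in data: (confidence, per-token attributions, question-token
# indices, whether the black box's prediction was correct).
examples = [(rng.random(), rng.normal(size=20), [0, 1, 2],
             rng.random() > 0.4) for _ in range(200)]

X = [calibration_features(c, a, q) for c, a, q, _ in examples]
y = [correct for _, _, _, correct in examples]

# A simple model calibrates (or reranks) the black box's predictions.
calibrator = LogisticRegression().fit(X, y)
print(calibrator.predict_proba([X[0]])[0, 1])  # P(prediction is correct)
```

In the paper's setting, the attribution scores would come from an off-the-shelf black-box interpretation method such as LIME or SHAP, and the calibrator's probability is used either as a better confidence estimate or to rerank candidate predictions.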
Related papers
- Enabling Small Models for Zero-Shot Classification through Model Label Learning [50.68074833512999]
We introduce a novel paradigm, Model Label Learning (MLL), which bridges the gap between models and their functionalities.
Experiments on seven real-world datasets validate the effectiveness and efficiency of MLL.
arXiv Detail & Related papers (2024-08-21T09:08:26Z)
- DREAM: Domain-free Reverse Engineering Attributes of Black-box Model [51.37041886352823]
We propose a new problem of Domain-agnostic Reverse Engineering the Attributes of a black-box target model.
We learn a domain-agnostic model to infer the attributes of a target black-box model with unknown training data.
arXiv Detail & Related papers (2023-07-20T16:25:58Z)
- Increasing Performance And Sample Efficiency With Model-agnostic Interactive Feature Attributions [3.0655581300025996]
We provide model-agnostic implementations for two popular explanation methods (Occlusion and Shapley values) to enforce entirely different attributions in the complex model.
We show how our proposed approach can significantly improve the model's performance simply by augmenting its training dataset based on corrected explanations (an occlusion sketch follows below).
arXiv Detail & Related papers (2023-06-28T15:23:28Z)
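Occlusion, one of the two methods named above, has a compact model-agnostic form; a minimal sketch assuming only query access to a scalar `predict` function (the toy linear scorer is purely illustrative):

```python
import numpy as np

def occlusion_attributions(predict, x, baseline=0.0):
    """Model-agnostic occlusion: the attribution of feature i is the
    drop in the predicted score when feature i is replaced by a
    baseline value. Only query access to `predict` is required."""
    base_score = predict(x)
    attributions = np.empty_like(x, dtype=float)
    for i in range(len(x)):
        occluded = x.copy()
        occluded[i] = baseline
        attributions[i] = base_score - predict(occluded)
    return attributions

# Toy black box: a fixed linear scorer we can only query.
weights = np.array([2.0, -1.0, 0.5])
black_box = lambda x: float(weights @ x)

print(occlusion_attributions(black_box, np.array([1.0, 1.0, 1.0])))
# For a linear model, attribution i recovers w_i * (x_i - baseline).
```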
- Task-Specific Skill Localization in Fine-tuned Language Models [36.53572616441048]
This paper introduces the term skill localization for this problem.
A simple optimization is used to identify a very small subset of parameters.
Grafting the fine-tuned values for just this tiny subset onto the pre-trained model performs almost as well as the full fine-tuned model (see the grafting sketch below).
arXiv Detail & Related papers (2023-02-13T18:55:52Z)
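A sketch of the grafting step itself, with a simple magnitude heuristic standing in for the paper's optimization-based subset selection (the heuristic and the toy parameters are assumptions for illustration):

```python
import numpy as np

def graft(pretrained, finetuned, mask):
    """Copy fine-tuned values onto the pre-trained model for only the
    parameters selected by `mask`; everything else stays pre-trained."""
    return {name: np.where(mask[name], finetuned[name], pretrained[name])
            for name in pretrained}

# Toy example: graft just the entries with the largest fine-tuning
# deltas (a stand-in for the paper's optimized subset selection).
rng = np.random.default_rng(0)
pre = {"layer.weight": rng.normal(size=(4, 4))}
ft = {"layer.weight": pre["layer.weight"] + rng.normal(scale=0.1, size=(4, 4))}

delta = np.abs(ft["layer.weight"] - pre["layer.weight"])
mask = {"layer.weight": delta >= np.quantile(delta, 0.95)}  # top ~5%

grafted = graft(pre, ft, mask)
print(mask["layer.weight"].sum(), "of", delta.size, "parameters grafted")
```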
- Symbolic Metamodels for Interpreting Black-boxes Using Primitive Functions [15.727276506140878]
One approach for interpreting black-box machine learning models is to find a global approximation of the model using simple interpretable functions.
In this work, we propose a new method for finding interpretable metamodels (a surrogate-fitting sketch follows below).
arXiv Detail & Related papers (2023-02-09T17:30:43Z)
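A deliberately simple stand-in for the global-approximation idea: query the black box on sampled inputs and fit a sparse polynomial surrogate. The paper searches over richer primitive functions, which this sketch does not reproduce:

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import PolynomialFeatures

# Black box we can only query (its internals are unknown to the surrogate).
black_box = lambda X: np.sin(X[:, 0]) + X[:, 1] ** 2

# Sample the input space and record the black box's responses.
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(500, 2))
y = black_box(X)

# Fit a sparse polynomial as the interpretable global approximation.
poly = PolynomialFeatures(degree=3, include_bias=False)
surrogate = Lasso(alpha=0.01, max_iter=10_000).fit(poly.fit_transform(X), y)

# Print the recovered simple functional form, term by term.
for name, coef in zip(poly.get_feature_names_out(["x0", "x1"]),
                      surrogate.coef_):
    if abs(coef) > 1e-3:
        print(f"{coef:+.3f} * {name}")
```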
- Voting from Nearest Tasks: Meta-Vote Pruning of Pre-trained Models for Downstream Tasks [55.431048995662714]
We create a small model for a new task from the pruned models of similar tasks.
We show that a few fine-tuning steps on this model suffice to produce a promising pruned model for the new task.
We develop a simple but effective "Meta-Vote Pruning (MVP)" method that significantly reduces the pruning iterations for a new task (a voting sketch follows below).
arXiv Detail & Related papers (2023-01-27T06:49:47Z)
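One plausible reading of the voting mechanic, sketched minimally; the paper's selection of nearest tasks and its exact vote are not reproduced here:

```python
import numpy as np

def meta_vote_mask(task_masks, keep_ratio):
    """Vote over the binary pruning masks of similar tasks: parameters
    kept by the most neighbors are kept for the new task."""
    votes = np.sum(task_masks, axis=0)
    k = int(keep_ratio * votes.size)
    threshold = np.partition(votes.ravel(), -k)[-k]
    return votes >= threshold

# Toy masks from three similar tasks over a 10-parameter layer.
rng = np.random.default_rng(0)
masks = rng.random((3, 10)) > 0.5

init_mask = meta_vote_mask(masks, keep_ratio=0.4)
print(init_mask.astype(int))
# The voted mask initializes the new task's pruned model; a few
# fine-tuning steps then adapt the surviving weights.
```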
- Synthetic Model Combination: An Instance-wise Approach to Unsupervised Ensemble Learning [92.89846887298852]
Consider making a prediction over new test data without any opportunity to learn from a training set of labelled data.
Suppose we are given access to a set of expert models and their predictions, alongside some limited information about the datasets used to train them (see the instance-wise weighting sketch below).
arXiv Detail & Related papers (2022-10-11T10:20:31Z)
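A simplified sketch of instance-wise combination, assuming the "limited information" is each expert's training feature mean; the paper's method is more involved, and this distance-based weighting is an illustrative assumption:

```python
import numpy as np

def instance_wise_combine(x, experts, train_means, temperature=1.0):
    """Weight each expert's prediction by how close the test instance
    is to (limited info about) that expert's training data -- here,
    just the training feature mean."""
    dists = np.array([np.linalg.norm(x - m) for m in train_means])
    weights = np.exp(-dists / temperature)
    weights /= weights.sum()
    preds = np.array([expert(x) for expert in experts])
    return weights @ preds

# Two toy experts, each reliable near its own training region.
experts = [lambda x: 1.0, lambda x: -1.0]
train_means = [np.array([2.0, 2.0]), np.array([-2.0, -2.0])]

print(instance_wise_combine(np.array([1.5, 2.5]), experts, train_means))
# Close to +1: the first expert dominates for this instance.
```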
- Optimizing Active Learning for Low Annotation Budgets [6.753808772846254]
In deep learning, active learning is usually implemented as an iterative process in which successive deep models are updated via fine-tuning.
We tackle the low-budget setting with an approach inspired by transfer learning.
We introduce a novel acquisition function which exploits the iterative nature of the AL process to select samples in a more robust fashion (an illustrative acquisition sketch follows below).
arXiv Detail & Related papers (2022-01-18T18:53:10Z)
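The summary does not pin down the acquisition function, so the sketch below is only one plausible reading of "exploits the iterative nature": score unlabeled samples by the current model's uncertainty plus its disagreement with the previous iteration's predictions. It is not the paper's exact function:

```python
import numpy as np

def iteration_aware_acquisition(probs_prev, probs_curr):
    """Score unlabeled samples using two successive AL iterations:
    the current model's entropy plus its prediction drift from the
    previous iteration (an illustrative acquisition function)."""
    entropy = -np.sum(probs_curr * np.log(probs_curr + 1e-12), axis=1)
    drift = np.abs(probs_curr - probs_prev).sum(axis=1)
    return entropy + drift

# Toy pool of 4 unlabeled samples, binary classification.
prev = np.array([[0.9, 0.1], [0.5, 0.5], [0.2, 0.8], [0.6, 0.4]])
curr = np.array([[0.9, 0.1], [0.6, 0.4], [0.7, 0.3], [0.5, 0.5]])

scores = iteration_aware_acquisition(prev, curr)
print(np.argsort(scores)[::-1])  # indices to annotate first
```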
- Exploring Strategies for Generalizable Commonsense Reasoning with Pre-trained Models [62.28551903638434]
We measure the impact of three different adaptation methods on the generalization and accuracy of models.
Experiments with two models show that fine-tuning performs best, by learning both the content and the structure of the task, but suffers from overfitting and limited generalization to novel answers.
We observe that alternative adaptation methods like prefix-tuning have comparable accuracy, but generalize better to unseen answers and are more robust to adversarial splits.
arXiv Detail & Related papers (2021-09-07T03:13:06Z)
- Design of Dynamic Experiments for Black-Box Model Discrimination [72.2414939419588]
Consider a dynamic model discrimination setting where we wish to choose: (i) the best mechanistic, time-varying model and (ii) the best model parameter estimates.
For rival mechanistic models where we have access to gradient information, we extend existing methods to incorporate a wider range of problem uncertainty.
We replace these black-box models with Gaussian process surrogate models and thereby extend the model discrimination setting to additionally incorporate rival black-box models (a surrogate sketch follows below).
arXiv Detail & Related papers (2021-02-07T11:34:39Z)
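A minimal sketch of the surrogate step using scikit-learn: query the rival black-box model at design points and fit a Gaussian process that supplies predictions with uncertainty. The experiment-design criterion itself is omitted, and the toy dynamics are an assumption:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# A rival black-box model we can only evaluate, not differentiate.
black_box = lambda t: np.sin(3 * t) * np.exp(-0.3 * t)

# Query the black box at a few design points and fit a GP surrogate;
# the surrogate supplies the predictions and uncertainty estimates
# that the discrimination criteria need.
t_train = np.linspace(0, 5, 12).reshape(-1, 1)
y_train = black_box(t_train).ravel()

gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0),
                              normalize_y=True).fit(t_train, y_train)

t_test = np.array([[2.5]])
mean, std = gp.predict(t_test, return_std=True)
print(f"surrogate prediction {mean[0]:.3f} +/- {std[0]:.3f}")
```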
- REST: Performance Improvement of a Black Box Model via RL-based Spatial Transformation [15.691668909002892]
We study robustness to geometric transformations in the specific setting where the image classifier is given as a black box.
We propose an additional learner, REinforcement Spatial Transform (REST), that transforms the warped input data into samples regarded as in-distribution by the black-box models (a minimal RL loop sketch follows below).
arXiv Detail & Related papers (2020-02-16T16:15:59Z)
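REST trains a neural spatial-transform learner with RL; the sketch below shrinks that to a discrete-rotation policy trained by REINFORCE, with the frozen black box's output as the reward signal. Everything here (the 9x9 toy images, the action space, the reward) is an illustrative assumption:

```python
import numpy as np
from scipy.ndimage import rotate

rng = np.random.default_rng(0)

# Frozen black box we can only query: returns a high score when the
# input looks like the in-distribution pattern (a horizontal bar).
target = np.zeros((9, 9)); target[4, :] = 1.0
black_box_score = lambda img: float((img * target).sum())

warped = np.zeros((9, 9)); warped[:, 4] = 1.0   # warped (vertical) input
angles = np.arange(-90, 91, 15)                 # discrete action space
logits = np.zeros(len(angles))                  # policy parameters

for _ in range(300):                            # REINFORCE updates
    probs = np.exp(logits - logits.max()); probs /= probs.sum()
    a = rng.choice(len(angles), p=probs)
    restored = rotate(warped, angles[a], reshape=False)
    reward = black_box_score(restored)          # black box as reward
    grad = -probs; grad[a] += 1.0               # grad of log pi(a)
    logits += 0.05 * reward * grad

print("learned transform:", angles[np.argmax(logits)], "degrees")
```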
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.