Gaussian Process Probes (GPP) for Uncertainty-Aware Probing
- URL: http://arxiv.org/abs/2305.18213v2
- Date: Mon, 6 Nov 2023 13:08:28 GMT
- Title: Gaussian Process Probes (GPP) for Uncertainty-Aware Probing
- Authors: Zi Wang, Alexander Ku, Jason Baldridge, Thomas L. Griffiths, and Been Kim
- Abstract summary: We introduce a unified and simple framework for probing and measuring uncertainty about concepts represented by models.
Our experiments show it can (1) probe a model's representations of concepts even with a very small number of examples, (2) accurately measure both epistemic uncertainty (how confident the probe is) and aleatory uncertainty (how fuzzy the concepts are to the model), and (3) detect out of distribution data using those uncertainty measures as well as classic methods do.
- Score: 61.91898698128994
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Understanding which concepts models can and cannot represent has been
fundamental to many tasks: from effective and responsible use of models to
detecting out of distribution data. We introduce Gaussian process probes (GPP),
a unified and simple framework for probing and measuring uncertainty about
concepts represented by models. As a Bayesian extension of linear probing
methods, GPP asks what kind of distribution over classifiers (of concepts) is
induced by the model. This distribution can be used to measure both what the
model represents and how confident the probe is about what the model
represents. GPP can be applied to any pre-trained model with vector
representations of inputs (e.g., activations). It does not require access to
training data, gradients, or the architecture. We validate GPP on datasets
containing both synthetic and real images. Our experiments show it can (1)
probe a model's representations of concepts even with a very small number of
examples, (2) accurately measure both epistemic uncertainty (how confident the
probe is) and aleatory uncertainty (how fuzzy the concepts are to the model),
and (3) detect out of distribution data using those uncertainty measures as
well as classic methods do. By using Gaussian processes to expand what probing
can offer, GPP provides a data-efficient, versatile and uncertainty-aware tool
for understanding and evaluating the capabilities of machine learning models.
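As a rough, hedged illustration of the probing setup (not the authors' implementation), the sketch below places a Gaussian process with a linear kernel over activation vectors, fits it to a handful of binary concept labels by plain GP regression, and reads off a concept score (posterior mean) and an epistemic-uncertainty estimate (posterior variance). The activations, labels, kernel, and noise level are hypothetical placeholders.

```python
# Hedged sketch: a GP probe over frozen activations using GP regression on
# +/-1 concept labels with a linear kernel. This is NOT the paper's exact
# method; activations, labels, and hyperparameters are hypothetical.
import numpy as np

def linear_kernel(A, B):
    """Linear kernel on activation vectors: k(a, b) = <a, b>."""
    return A @ B.T

def gp_probe(train_acts, train_labels, test_acts, noise=0.1):
    """Return posterior mean (concept score) and variance (epistemic uncertainty)."""
    K = linear_kernel(train_acts, train_acts)
    K_star = linear_kernel(test_acts, train_acts)
    K_ss = linear_kernel(test_acts, test_acts)
    L = np.linalg.cholesky(K + noise * np.eye(len(train_acts)))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, train_labels))
    mean = K_star @ alpha                          # posterior mean of the latent concept function
    v = np.linalg.solve(L, K_star.T)
    var = np.diag(K_ss) - np.sum(v ** 2, axis=0)   # posterior variance at the test points
    return mean, var

# Hypothetical usage: activations extracted from any pre-trained model.
rng = np.random.default_rng(0)
acts = rng.normal(size=(20, 64))       # 20 probe examples, 64-dim activations
labels = np.sign(acts[:, 0] + 1e-9)    # toy binary concept encoded as +/-1
test_acts = rng.normal(size=(5, 64))
mean, var = gp_probe(acts, labels, test_acts)
print("concept score:", np.round(mean, 2))
print("epistemic uncertainty:", np.round(var, 2))
```

In this simplified sketch, a mean near zero with low variance suggests the concept looks fuzzy to the model, while a large posterior variance means the probe itself is still uncertain given so few examples.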
Related papers
- Explainable Learning with Gaussian Processes [23.796560256071473]
We take a principled approach to defining attributions under model uncertainty, extending the existing literature.
We show that although GPR is a highly flexible and non-parametric approach, we can derive interpretable, closed-form expressions for the feature attributions.
We also show that, when applicable, the exact expressions for GPR attributions are both more accurate and less computationally expensive than the approximations currently used in practice.
arXiv Detail & Related papers (2024-03-11T18:03:02Z)
- Diffusion models for probabilistic programming [56.47577824219207]
Diffusion Model Variational Inference (DMVI) is a novel method for automated approximate inference in probabilistic programming languages (PPLs).
DMVI is easy to implement, allows hassle-free inference in PPLs without the drawbacks of, e.g., variational inference using normalizing flows, and does not impose any constraints on the underlying neural network model.
arXiv Detail & Related papers (2023-11-01T12:17:05Z)
- Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence from labeled source data and predicts target accuracy as the fraction of unlabeled target examples whose confidence exceeds that threshold.
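A rough sketch of the ATC idea under simplifying assumptions (the paper's exact confidence score and thresholding rule may differ; all numbers below are toy placeholders):

```python
# Hedged sketch of ATC: learn a confidence threshold on labeled source data,
# then predict target accuracy as the fraction of unlabeled target examples
# whose confidence exceeds it.
import numpy as np

def atc_predict_accuracy(src_conf, src_correct, tgt_conf):
    """Pick the threshold so that the fraction of source points above it
    matches the observed source accuracy, then apply it to the target set."""
    src_acc = src_correct.mean()
    threshold = np.quantile(src_conf, 1.0 - src_acc)
    return float((tgt_conf > threshold).mean())

# Hypothetical confidences and correctness flags.
src_conf = np.array([0.9, 0.8, 0.95, 0.6, 0.7])
src_correct = np.array([1, 1, 1, 0, 1])
tgt_conf = np.array([0.85, 0.55, 0.92, 0.4])
print("predicted target accuracy:", atc_predict_accuracy(src_conf, src_correct, tgt_conf))
```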
arXiv Detail & Related papers (2022-01-11T23:01:12Z)
- Sampling from Arbitrary Functions via PSD Models [55.41644538483948]
We take a two-step approach by first modeling the probability distribution and then sampling from that model.
We show that these models can approximate a large class of densities concisely using few evaluations, and present a simple algorithm to effectively sample from these models.
arXiv Detail & Related papers (2021-10-20T12:25:22Z)
- PSD Representations for Effective Probability Models [117.35298398434628]
We show that a recently proposed class of positive semi-definite (PSD) models for non-negative functions is particularly suited to modeling probability distributions.
We characterize both approximation and generalization capabilities of PSD models, showing that they enjoy strong theoretical guarantees.
Our results open the way to applications of PSD models to density estimation, decision theory and inference.
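As a hedged illustration, a model of this kind can be written as f(x) = phi(x)^T A phi(x), where phi(x) stacks kernel evaluations at a set of anchor points and A = B B^T is positive semi-definite, so f is non-negative by construction; the anchors, kernel width, and parameters below are hypothetical placeholders.

```python
# Hedged sketch of a PSD model: f(x) = phi(x)^T (B B^T) phi(x) >= 0, where
# phi(x) stacks kernel evaluations at anchor points.
import numpy as np

def rbf_features(x, anchors, gamma=1.0):
    """phi(x): Gaussian kernel evaluations of x against the anchor points."""
    return np.exp(-gamma * np.sum((anchors - x) ** 2, axis=1))

def psd_model(x, anchors, B, gamma=1.0):
    """Non-negative by construction, since phi^T B B^T phi = ||B^T phi||^2."""
    u = B.T @ rbf_features(x, anchors, gamma)
    return float(u @ u)

rng = np.random.default_rng(0)
anchors = rng.normal(size=(10, 2))  # 10 hypothetical anchor points in 2-D
B = rng.normal(size=(10, 3))        # parameterizes the PSD matrix A = B B^T
print(psd_model(np.zeros(2), anchors, B))  # always >= 0
```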
arXiv Detail & Related papers (2021-06-30T15:13:39Z)
- Incorporating Causal Graphical Prior Knowledge into Predictive Modeling via Simple Data Augmentation [92.96204497841032]
Causal graphs (CGs) are compact representations of knowledge about the data-generating processes behind the data distributions.
We propose a model-agnostic data augmentation method that allows us to exploit the prior knowledge of the conditional independence (CI) relations.
We experimentally show that the proposed method is effective in improving the prediction accuracy, especially in the small-data regime.
arXiv Detail & Related papers (2021-02-27T06:13:59Z)
- Gaussian Function On Response Surface Estimation [12.35564140065216]
We propose a new framework for interpreting black-box machine learning models, at the level of both features and samples, via a metamodeling technique.
The metamodel can be estimated from data generated by a trained complex model, by running computer experiments on samples of data in the region of interest.
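A minimal sketch of that metamodeling step, assuming scikit-learn is available; the black-box model, sampling region, and kernel below are hypothetical placeholders.

```python
# Hedged sketch of metamodeling: query a trained "complex" model on samples
# from a region of interest, then fit a GP surrogate to its outputs.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def complex_model(X):
    """Stand-in for any trained black-box model we want to interpret."""
    return np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2

rng = np.random.default_rng(0)
X_region = rng.uniform(-1.0, 1.0, size=(50, 2))  # samples in the region of interest
y = complex_model(X_region)                      # "computer experiment" on the black box

# Fit a Gaussian-process metamodel (surrogate) to the black box's outputs.
surrogate = GaussianProcessRegressor(kernel=RBF(length_scale=0.5)).fit(X_region, y)
mean, std = surrogate.predict(rng.uniform(-1.0, 1.0, size=(5, 2)), return_std=True)
print(mean, std)
```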
arXiv Detail & Related papers (2021-01-04T04:47:00Z)
- Gaussian Process Regression with Local Explanation [28.90948136731314]
We propose GPR with local explanation, which reveals the feature contributions to the prediction of each sample.
In the proposed model, both the prediction and explanation for each sample are performed using an easy-to-interpret locally linear model.
For a new test sample, the proposed model can predict the values of its target variable and weight vector, as well as their uncertainties.
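As a simplified stand-in, the sketch below uses a Bayesian linear model with a closed-form weight posterior (not the paper's GP-based locally linear formulation) to return, for a new sample, a prediction, a weight vector, and uncertainties for both; all data and hyperparameters are toy placeholders.

```python
# Hedged, simplified stand-in: Bayesian linear regression with a closed-form
# posterior over weights, illustrating the flavor of output described above.
import numpy as np

def bayes_linear_fit(X, y, alpha=1.0, beta=25.0):
    """Posterior N(m, S) over weights, with prior w ~ N(0, I/alpha)."""
    S = np.linalg.inv(alpha * np.eye(X.shape[1]) + beta * X.T @ X)
    m = beta * S @ X.T @ y
    return m, S

def predict(x, m, S, beta=25.0):
    """Predictive mean/variance at x, plus weight means and their std devs."""
    return m @ x, x @ S @ x + 1.0 / beta, m, np.sqrt(np.diag(S))

# Toy placeholder data.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.2 * rng.normal(size=30)
m, S = bayes_linear_fit(X, y)
pred_mean, pred_var, weights, weight_std = predict(rng.normal(size=3), m, S)
print(pred_mean, pred_var, weights, weight_std)
```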
arXiv Detail & Related papers (2020-07-03T13:22:24Z)
- Query Training: Learning a Worse Model to Infer Better Marginals in Undirected Graphical Models with Hidden Variables [11.985433487639403]
Probabilistic graphical models (PGMs) provide a compact representation of knowledge that can be queried in a flexible way.
We introduce query training (QT), a mechanism to learn a PGM that is optimized for the approximate inference algorithm that will be paired with it.
We demonstrate experimentally that QT can be used to learn a challenging 8-connected grid Markov random field with hidden variables.
arXiv Detail & Related papers (2020-06-11T20:34:32Z)
- Characterizing and Avoiding Problematic Global Optima of Variational Autoencoders [28.36260646471421]
Variational Auto-encoders (VAEs) are deep generative latent variable models.
Recent work shows that traditional training methods tend to yield solutions that violate desiderata.
We show that these issues stem from the fact that the global optima of the VAE training objective often correspond to undesirable solutions.
arXiv Detail & Related papers (2020-03-17T15:14:25Z)