Can Generative AI Solve Your In-Context Learning Problem? A Martingale Perspective
- URL: http://arxiv.org/abs/2412.06033v1
- Date: Sun, 08 Dec 2024 19:03:21 GMT
- Title: Can Generative AI Solve Your In-Context Learning Problem? A Martingale Perspective
- Authors: Andrew Jesson, Nicolas Beltran-Velez, David Blei
- Abstract summary: We show when ancestral sampling from the predictive distribution of a CGM is equivalent to sampling datasets from the posterior predictive of the assumed Bayesian model. The generative predictive $p$-value can then be used in a statistical decision procedure to determine when the model is appropriate for an ICL problem.
- Score: 3.759959474986743
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This work is about estimating when a conditional generative model (CGM) can solve an in-context learning (ICL) problem. An ICL problem comprises a CGM, a dataset, and a prediction task. The CGM could be a multi-modal foundation model; the dataset, a collection of patient histories, test results, and recorded diagnoses; and the prediction task, to communicate a diagnosis to a new patient. A Bayesian interpretation of ICL assumes that the CGM computes a posterior predictive distribution over an unknown Bayesian model defining a joint distribution over latent explanations and observable data. From this perspective, Bayesian model criticism is a reasonable approach to assess the suitability of a given CGM for an ICL problem. However, such approaches -- like posterior predictive checks (PPCs) -- often assume that we can sample from the likelihood and posterior defined by the Bayesian model, which are not explicitly given for contemporary CGMs. To address this, we show when ancestral sampling from the predictive distribution of a CGM is equivalent to sampling datasets from the posterior predictive of the assumed Bayesian model. Then we develop the generative predictive $p$-value, which enables PPCs and their cousins for contemporary CGMs. The generative predictive $p$-value can then be used in a statistical decision procedure to determine when the model is appropriate for an ICL problem. Our method only requires generating queries and responses from a CGM and evaluating its response log probability. We empirically evaluate our method on synthetic tabular, imaging, and natural language ICL tasks using large language models.
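To make the check concrete, the sketch below renders a generative-predictive-style check in Python. The `cgm` object, its `sample_query`, `sample_response`, and `log_prob` methods, and the choice of test statistic are hypothetical placeholders for whatever interface a given CGM exposes; the paper's exact estimator may differ.

```python
# A minimal sketch of a posterior-predictive-style check for a CGM. The
# `cgm` interface (sample_query, sample_response, log_prob) and the choice
# of test statistic are illustrative assumptions, not the paper's API.
import numpy as np

def generative_predictive_p_value(cgm, dataset, n_replicates=100):
    """Compare the observed dataset's statistic against statistics of
    datasets ancestrally sampled from the CGM's predictive distribution."""
    def statistic(data):
        # Test statistic: average response log probability under the CGM.
        return np.mean([cgm.log_prob(query=x, response=y) for x, y in data])

    t_obs = statistic(dataset)
    t_rep = []
    for _ in range(n_replicates):
        context, replicate = [], []
        for _ in range(len(dataset)):
            # Ancestral sampling: each new pair conditions on the pairs
            # generated so far, mimicking posterior predictive sampling.
            x = cgm.sample_query(context)
            y = cgm.sample_response(context, x)
            replicate.append((x, y))
            context.append((x, y))
        t_rep.append(statistic(replicate))

    # One-sided p-value: fraction of replicated statistics at least as
    # small as the observed one. A tiny p-value flags model-data mismatch.
    return float(np.mean(np.asarray(t_rep) <= t_obs))
```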
Related papers
- Model-free Methods for Event History Analysis and Efficient Adjustment (PhD Thesis) [55.2480439325792]
This thesis is a series of independent contributions to statistics unified by a model-free perspective.
The first chapter elaborates on how a model-free perspective can be used to formulate flexible methods that leverage prediction techniques from machine learning.
The second chapter studies the concept of local independence, which describes whether the evolution of one process is directly influenced by another.
arXiv Detail & Related papers (2025-02-11T19:24:09Z) - Considerations for Distribution Shift Robustness of Diagnostic Models in Healthcare [10.393967785465536]
In the domain of applied ML for health, it is common to predict $Y$ from $X$ without considering further information about the patient.
In this work, we highlight a data generating mechanism common to healthcare settings and discuss how recent theoretical results from the causality literature can be applied to build robust predictive models.
arXiv Detail & Related papers (2024-10-25T14:13:09Z) - Estimating the Hallucination Rate of Generative AI [44.854771627716225]
We present a method for estimating the hallucination rate for in-context learning with generative AI. In ICL, a conditional generative model (CGM) is prompted with a dataset and a prediction question and asked to generate a response. We develop a new method that takes an ICL problem and estimates the probability that a CGM will generate a hallucination.
arXiv Detail & Related papers (2024-06-11T17:01:52Z) - Calibrating Neural Simulation-Based Inference with Differentiable Coverage Probability [50.44439018155837]
We propose to include a calibration term directly into the training objective of the neural model.
By introducing a relaxation of the classical formulation of calibration error, we enable end-to-end backpropagation.
The method is directly applicable to existing computational pipelines, allowing reliable black-box posterior inference.
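One way to render this idea in code is sketched below: a smoothed calibration penalty added to a neural posterior estimator's loss. The sigmoid relaxation and the HPD-style credibility statistic are illustrative assumptions, not the paper's exact formulation.

```python
# A minimal sketch, in PyTorch, of a differentiable calibration term. The
# relaxation of the coverage indicator to a sigmoid is an illustrative
# rendering of the idea, not the paper's exact objective.
import torch

def relaxed_calibration_penalty(log_q_true, log_q_samples, alpha=0.1, temp=10.0):
    """log_q_true: (batch,) log q(theta_true | x) under the neural posterior.
    log_q_samples: (batch, n_samples) log q(theta_i | x) for posterior samples."""
    # Credibility of theta_true: fraction of posterior samples with higher
    # density; the hard comparison is relaxed to a sigmoid so gradients flow.
    soft_greater = torch.sigmoid(temp * (log_q_samples - log_q_true.unsqueeze(-1)))
    credibility = soft_greater.mean(dim=-1)  # ~Uniform(0, 1) when calibrated
    # Smoothed empirical coverage of the (1 - alpha) credible region.
    coverage = torch.sigmoid(temp * ((1.0 - alpha) - credibility)).mean()
    return (coverage - (1.0 - alpha)) ** 2

# Training objective: the usual negative log likelihood plus the penalty, e.g.
#   loss = -log_q_true.mean() + lam * relaxed_calibration_penalty(log_q_true, log_q_samples)
```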
arXiv Detail & Related papers (2023-10-20T10:20:45Z) - Inductive Conformal Prediction: A Straightforward Introduction with Examples in Python [0.0]
Inductive Conformal Prediction (ICP) is a set of distribution-free, model-agnostic algorithms devised to predict with user-defined confidence and a coverage guarantee.
ICP takes special importance in high-risk settings where we want the true output to belong to the prediction set with high probability.
This paper is a hands-on introduction: it provides examples as it introduces the theory.
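In that spirit, here is a minimal split-conformal sketch for regression, assuming a fitted scikit-learn-style `model`; the variable names and absolute-residual score are illustrative choices.

```python
# A minimal sketch of inductive (split) conformal prediction for regression,
# assuming a fitted model with a `predict` method; names are illustrative.
import numpy as np

def icp_intervals(model, X_cal, y_cal, X_test, alpha=0.1):
    """Prediction intervals with marginal coverage >= 1 - alpha, assuming
    calibration and test points are exchangeable."""
    # Nonconformity scores on the held-out calibration split.
    scores = np.abs(y_cal - model.predict(X_cal))
    n = len(scores)
    # Finite-sample-corrected quantile (needs n large enough that the
    # corrected level stays below 1; we clip defensively).
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    q = np.quantile(scores, level, method="higher")
    preds = model.predict(X_test)
    return preds - q, preds + q
```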
arXiv Detail & Related papers (2022-06-23T16:35:43Z) - Multi-modality fusion using canonical correlation analysis methods: Application in breast cancer survival prediction from histology and genomics [16.537929113715432]
We study the use of canonical correlation analysis (CCA) and penalized variants of CCA for the fusion of two modalities.
We analytically show that, with known model parameters, posterior mean estimators that jointly use both modalities outperform arbitrary linear mixing of single modality posterior estimators in latent variable prediction.
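A minimal sketch of the fusion pattern, using scikit-learn's CCA on synthetic stand-ins for the two modalities; the downstream ridge regressor on concatenated canonical variates is an illustrative choice, not the paper's exact pipeline.

```python
# A minimal sketch of two-modality fusion with CCA. Data are synthetic
# stand-ins; the pipeline is illustrative of the idea, not the paper's setup.
import numpy as np
from sklearn.cross_decomposition import CCA
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X_hist = rng.normal(size=(200, 50))    # e.g., histology features
X_gen = rng.normal(size=(200, 100))    # e.g., genomic features
y = rng.normal(size=200)               # e.g., a survival-related target

cca = CCA(n_components=5).fit(X_hist, X_gen)
U, V = cca.transform(X_hist, X_gen)    # canonical variates of each modality
Z = np.hstack([U, V])                  # joint latent representation

predictor = Ridge().fit(Z, y)          # predict from the fused representation
```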
arXiv Detail & Related papers (2021-11-27T21:18:01Z) - Inverting brain grey matter models with likelihood-free inference: a tool for trustable cytoarchitecture measurements [62.997667081978825]
Characterisation of the brain grey matter cytoarchitecture with quantitative sensitivity to soma density and volume remains an unsolved challenge in dMRI.
We propose a new forward model, specifically a new system of equations, requiring a few relatively sparse b-shells.
We then apply modern tools from Bayesian analysis known as likelihood-free inference (LFI) to invert our proposed model.
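The paper applies modern neural LFI tools; the plain rejection-ABC sampler below is only a sketch of the underlying likelihood-free idea, assuming a black-box `simulate` forward model and a `prior_sample` function, both hypothetical.

```python
# A minimal sketch of rejection-based likelihood-free inference (ABC).
# `simulate` and `prior_sample` are hypothetical black boxes; the tolerance
# and distance are illustrative, and the paper's neural LFI method differs.
import numpy as np

def abc_rejection(simulate, prior_sample, s_obs, n_draws=10_000, epsilon=0.1):
    """Keep prior draws whose simulated summary statistics fall within
    `epsilon` of the observed summaries `s_obs`."""
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample()
        s_sim = simulate(theta)          # forward model, no likelihood needed
        if np.linalg.norm(s_sim - s_obs) < epsilon:
            accepted.append(theta)
    return np.array(accepted)            # approximate posterior samples
```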
arXiv Detail & Related papers (2021-11-15T09:08:27Z) - Approximate Bayesian Computation for an Explicit-Duration Hidden Markov Model of COVID-19 Hospital Trajectories [55.786207368853084]
We address the problem of modeling constrained hospital resources in the midst of the COVID-19 pandemic.
For broad applicability, we focus on the common yet challenging scenario where patient-level data for a region of interest are not available.
We propose an aggregate count explicit-duration hidden Markov model, nicknamed the ACED-HMM, with an interpretable, compact parameterization.
arXiv Detail & Related papers (2021-04-28T15:32:42Z) - Continual Learning with Fully Probabilistic Models [70.3497683558609]
We present an approach for continual learning based on fully probabilistic (or generative) models of machine learning.
We propose a pseudo-rehearsal approach using a Gaussian Mixture Model (GMM) instance for both generator and classifier functionalities.
We show that the resulting Gaussian Mixture Replay (GMR) approach achieves state-of-the-art performance on common class-incremental learning problems at very competitive time and memory complexity.
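A minimal sketch of the pseudo-rehearsal pattern with scikit-learn GMMs; the function and the way generated samples are mixed with new-task data are an illustrative rendering of the replay idea, not the paper's implementation.

```python
# A minimal sketch of GMM-based pseudo-rehearsal for incremental learning.
# The interface is illustrative; the paper's GMR method is more elaborate.
import numpy as np
from sklearn.mixture import GaussianMixture

def incremental_fit(gmm_old, X_new, n_pseudo=1000, n_components=10):
    """Refit a GMM on new-task data plus pseudo-samples replayed from the
    previous model, so old knowledge is rehearsed without stored data."""
    X_pseudo, _ = gmm_old.sample(n_pseudo)   # generate "memories" of old tasks
    X_joint = np.vstack([X_pseudo, X_new])
    return GaussianMixture(n_components=n_components).fit(X_joint)
```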
arXiv Detail & Related papers (2021-04-19T12:26:26Z) - Inference in Stochastic Epidemic Models via Multinomial Approximations [2.28438857884398]
We introduce a new method for inference in epidemic models.
The method is applicable to a class of discrete-time, finite-population compartmental models.
We show how the method can be embedded within a Sequential Monte Carlo approach to estimating the time-varying reproduction number of COVID-19 in Wuhan, China.
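For intuition about the model class being targeted, here is one step of a discrete-time, finite-population SIR-style model with binomial (two-outcome multinomial) transitions; the rates and structure are illustrative, not the paper's model.

```python
# A minimal sketch of one stochastic transition step in a discrete-time,
# finite-population compartmental model; illustrative, not the paper's model.
import numpy as np

def sir_step(S, I, R, beta, gamma, N, rng):
    """Advance the model one day; counts move between compartments via
    binomial draws, keeping the population finite and integer-valued."""
    p_inf = 1.0 - np.exp(-beta * I / N)   # per-susceptible infection prob.
    p_rec = 1.0 - np.exp(-gamma)          # per-infected recovery prob.
    new_inf = rng.binomial(S, p_inf)
    new_rec = rng.binomial(I, p_rec)
    return S - new_inf, I + new_inf - new_rec, R + new_rec

# Usage: S, I, R = sir_step(990, 10, 0, beta=0.3, gamma=0.1, N=1000,
#                           rng=np.random.default_rng(0))
```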
arXiv Detail & Related papers (2020-06-24T13:08:56Z)