Conformal prediction under ambiguous ground truth
- URL: http://arxiv.org/abs/2307.09302v2
- Date: Tue, 24 Oct 2023 10:34:22 GMT
- Title: Conformal prediction under ambiguous ground truth
- Authors: David Stutz, Abhijit Guha Roy, Tatiana Matejovicova, Patricia
Strachan, Ali Taylan Cemgil, Arnaud Doucet
- Abstract summary: Conformal Prediction (CP) allows one to perform rigorous uncertainty quantification.
It is typically assumed that $\mathbb{P}^{Y|X}$ is the "true" posterior label distribution.
We develop Monte Carlo CP procedures which provide guarantees w.r.t. $\mathbb{P}_{agg}=\mathbb{P}^X \otimes \mathbb{P}_{agg}^{Y|X}$.
- Score: 34.92407089333591
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Conformal Prediction (CP) allows one to perform rigorous uncertainty
quantification by constructing a prediction set $C(X)$ satisfying $\mathbb{P}(Y
\in C(X))\geq 1-\alpha$ for a user-chosen $\alpha \in [0,1]$ by relying on
calibration data $(X_1,Y_1),...,(X_n,Y_n)$ from $\mathbb{P}=\mathbb{P}^{X}
\otimes \mathbb{P}^{Y|X}$. It is typically implicitly assumed that
$\mathbb{P}^{Y|X}$ is the "true" posterior label distribution. However, in many
real-world scenarios, the labels $Y_1,...,Y_n$ are obtained by aggregating
expert opinions using a voting procedure, resulting in a one-hot distribution
$\mathbb{P}_{vote}^{Y|X}$. For such "voted" labels, CP guarantees are thus
w.r.t. $\mathbb{P}_{vote}=\mathbb{P}^X \otimes \mathbb{P}_{vote}^{Y|X}$ rather
than the true distribution $\mathbb{P}$. In cases with unambiguous ground truth
labels, the distinction between $\mathbb{P}_{vote}$ and $\mathbb{P}$ is
irrelevant. However, when experts do not agree because of ambiguous labels,
approximating $\mathbb{P}^{Y|X}$ with a one-hot distribution
$\mathbb{P}_{vote}^{Y|X}$ ignores this uncertainty. In this paper, we propose
to leverage expert opinions to approximate $\mathbb{P}^{Y|X}$ using a
non-degenerate distribution $\mathbb{P}_{agg}^{Y|X}$. We develop Monte Carlo CP
procedures which provide guarantees w.r.t. $\mathbb{P}_{agg}=\mathbb{P}^X
\otimes \mathbb{P}_{agg}^{Y|X}$ by sampling multiple synthetic pseudo-labels
from $\mathbb{P}_{agg}^{Y|X}$ for each calibration example $X_1,...,X_n$. In a
case study of skin condition classification with significant disagreement among
expert annotators, we show that applying CP w.r.t. $\mathbb{P}_{vote}$
under-covers expert annotations: calibrated for $72\%$ coverage, it falls short
by $10\%$ on average; our Monte Carlo CP closes this gap both empirically and
theoretically.
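The calibration recipe described in the abstract can be illustrated concretely: draw several pseudo-labels per calibration example from the aggregated expert distribution $\mathbb{P}_{agg}^{Y|X}$, pool the resulting conformity scores, and threshold at the conformal quantile. The sketch below is a minimal Python illustration of this idea, not the paper's exact algorithm; the $1 - $ softmax-probability conformity score, the simple pooling of all $n \cdot m$ scores, and all function names are assumptions made for the example.

```python
import numpy as np

def split_cp_threshold(scores, alpha):
    """Standard split conformal calibration: return the threshold tau such that
    sets {y : score(x, y) <= tau} cover calibration labels at level >= 1 - alpha."""
    n = len(scores)
    q = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)  # finite-sample correction
    return np.quantile(scores, q, method="higher")

def monte_carlo_cp_threshold(probs_model, probs_agg, alpha, m=10, seed=0):
    """Monte Carlo calibration sketch: for each calibration input, draw m
    pseudo-labels from the aggregated expert distribution P_agg(Y|X) and pool
    the resulting conformity scores before taking the conformal quantile.

    probs_model: (n, K) model probabilities on the calibration inputs
    probs_agg:   (n, K) non-degenerate aggregated expert label distributions
    """
    rng = np.random.default_rng(seed)
    n, K = probs_model.shape
    scores = []
    for i in range(n):
        pseudo = rng.choice(K, size=m, p=probs_agg[i])  # synthetic labels Y ~ P_agg(Y|X_i)
        scores.extend(1.0 - probs_model[i, pseudo])     # conformity score: 1 - p_model(y|x)
    return split_cp_threshold(np.asarray(scores), alpha)

def prediction_sets(probs_test, tau):
    """Prediction sets C(x) = {y : 1 - p_model(y|x) <= tau} for each test input."""
    return [np.flatnonzero(1.0 - p <= tau) for p in probs_test]
```

Calibrated this way, the threshold targets coverage under $\mathbb{P}_{agg}$ rather than $\mathbb{P}_{vote}$; in the degenerate case where every $\mathbb{P}_{agg}^{Y|X_i}$ is one-hot, the procedure reduces to standard split CP on the voted labels.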
Related papers
- A class of ternary codes with few weights [0.0]
In this paper, we investigate a ternary code $\mathcal{C}$ of length $n$, defined via the trace function as $\mathcal{C} := \{(\mathrm{Tr}(d x_1), \dots, \mathrm{Tr}(d x_n))\}$.
Using recent results on explicit evaluations of exponential sums, the Weil bound, and related techniques, we show that the dual code of $\mathcal{C}$ is optimal with respect to the Hamming bound.
arXiv Detail & Related papers (2024-10-05T16:15:50Z) - A Unified Framework for Uniform Signal Recovery in Nonlinear Generative
Compressed Sensing [68.80803866919123]
Under nonlinear measurements, most prior results are non-uniform, i.e., they hold with high probability for a fixed $\mathbf{x}^*$ rather than for all $\mathbf{x}^*$ simultaneously.
Our framework accommodates GCS with 1-bit/uniformly quantized observations and single index models as canonical examples.
We also develop a concentration inequality that produces tighter bounds for product processes whose index sets have low metric entropy.
arXiv Detail & Related papers (2023-09-25T17:54:19Z) - A random matrix model for random approximate $t$-designs [1.534667887016089]
A random matrix model is proposed to describe the probability distribution of $\delta(\nu_{\mathcal{S}},t)$ for any $t$.
We show that our model satisfies the so-called spectral gap conjecture, i.e. we prove that with probability $1$ there is $t\in\mathbb{Z}_+$ such that $\sup_{k\in\mathbb{Z}_+}\delta(k)=\delta(t)$.
arXiv Detail & Related papers (2022-10-14T14:50:06Z) - Learning a Single Neuron with Adversarial Label Noise via Gradient
Descent [50.659479930171585]
We study a function of the form $\mathbf{x}\mapsto\sigma(\mathbf{w}\cdot\mathbf{x})$ for monotone activations.
The goal of the learner is to output a hypothesis vector $\mathbf{w}$ such that $F(\mathbf{w}) = C\,\epsilon$ with high probability.
arXiv Detail & Related papers (2022-06-17T17:55:43Z) - Random matrices in service of ML footprint: ternary random features with
no performance loss [55.30329197651178]
We show that the eigenspectrum of $\mathbf{K}$ is independent of the distribution of the i.i.d. entries of $\mathbf{w}$.
We propose a novel random features technique, called Ternary Random Feature (TRF).
The computation of the proposed random features requires no multiplication and a factor of $b$ fewer bits for storage compared to classical random features.
arXiv Detail & Related papers (2021-10-05T09:33:49Z) - Spectral properties of sample covariance matrices arising from random
matrices with independent non identically distributed columns [50.053491972003656]
It was previously shown that the functionals $\mathrm{tr}(AR(z))$, for $R(z) = (\frac{1}{n}XX^T - zI_p)^{-1}$ and $A\in \mathcal{M}_p$ deterministic, have a standard deviation of order $O(\|A\|_* / \sqrt{n})$.
Here, we bound $\|\mathbb{E}[R(z)] - \tilde{R}(z)\|_F$.
arXiv Detail & Related papers (2021-09-06T14:21:43Z) - Kernel Thinning [26.25415159542831]
Kernel thinning is a new procedure for compressing a distribution $\mathbb{P}$ more effectively than i.i.d. sampling or standard thinning.
We derive explicit non-asymptotic maximum mean discrepancy bounds for Gaussian, Matérn, and B-spline kernels.
arXiv Detail & Related papers (2021-05-12T17:56:42Z) - Near-Optimal Model Discrimination with Non-Disclosure [19.88145627448243]
We first consider the case of a well-specified linear model with squared loss.
We derive sample complexity of a similar form, even under misspecification.
arXiv Detail & Related papers (2020-12-04T23:52:54Z) - An example of prediction which complies with Demographic Parity and
equalizes group-wise risks in the context of regression [3.9596068699962323]
Bayes optimal prediction $f^*$ which does not produce Disparate Treatment is defined as $f^*(x) = \mathbb{E}[Y \mid X = x]$.
We discuss several implications of this result on better understanding of mathematical notions of algorithmic fairness.
arXiv Detail & Related papers (2020-11-13T22:46:05Z) - Near-Optimal SQ Lower Bounds for Agnostically Learning Halfspaces and
ReLUs under Gaussian Marginals [49.60752558064027]
We study the fundamental problems of agnostically learning halfspaces and ReLUs under Gaussian marginals.
Our lower bounds provide strong evidence that current upper bounds for these tasks are essentially best possible.
arXiv Detail & Related papers (2020-06-29T17:10:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.