Conformal prediction under ambiguous ground truth
- URL: http://arxiv.org/abs/2307.09302v2
- Date: Tue, 24 Oct 2023 10:34:22 GMT
- Title: Conformal prediction under ambiguous ground truth
- Authors: David Stutz, Abhijit Guha Roy, Tatiana Matejovicova, Patricia
Strachan, Ali Taylan Cemgil, Arnaud Doucet
- Abstract summary: Conformal Prediction (CP) allows one to perform rigorous uncertainty quantification.
It is typically assumed that $\mathbb{P}^{Y|X}$ is the "true" posterior label distribution.
We develop Monte Carlo CP procedures which provide guarantees w.r.t. $\mathbb{P}_{agg}=\mathbb{P}^X \otimes \mathbb{P}_{agg}^{Y|X}$.
- Score: 34.92407089333591
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Conformal Prediction (CP) allows one to perform rigorous uncertainty
quantification by constructing a prediction set $C(X)$ satisfying $\mathbb{P}(Y
\in C(X))\geq 1-\alpha$ for a user-chosen $\alpha \in [0,1]$ by relying on
calibration data $(X_1,Y_1),...,(X_n,Y_n)$ from $\mathbb{P}=\mathbb{P}^{X}
\otimes \mathbb{P}^{Y|X}$. It is typically implicitly assumed that
$\mathbb{P}^{Y|X}$ is the "true" posterior label distribution. However, in many
real-world scenarios, the labels $Y_1,...,Y_n$ are obtained by aggregating
expert opinions using a voting procedure, resulting in a one-hot distribution
$\mathbb{P}_{vote}^{Y|X}$. For such "voted" labels, CP guarantees are thus
w.r.t. $\mathbb{P}_{vote}=\mathbb{P}^X \otimes \mathbb{P}_{vote}^{Y|X}$ rather
than the true distribution $\mathbb{P}$. In cases with unambiguous ground truth
labels, the distinction between $\mathbb{P}_{vote}$ and $\mathbb{P}$ is
irrelevant. However, when experts do not agree because of ambiguous labels,
approximating $\mathbb{P}^{Y|X}$ with a one-hot distribution
$\mathbb{P}_{vote}^{Y|X}$ ignores this uncertainty. In this paper, we propose
to leverage expert opinions to approximate $\mathbb{P}^{Y|X}$ using a
non-degenerate distribution $\mathbb{P}_{agg}^{Y|X}$. We develop Monte Carlo CP
procedures which provide guarantees w.r.t. $\mathbb{P}_{agg}=\mathbb{P}^X
\otimes \mathbb{P}_{agg}^{Y|X}$ by sampling multiple synthetic pseudo-labels
from $\mathbb{P}_{agg}^{Y|X}$ for each calibration example $X_1,...,X_n$. In a
case study of skin condition classification with significant disagreement among
expert annotators, we show that applying CP w.r.t. $\mathbb{P}_{vote}$
under-covers expert annotations: calibrated for $72\%$ coverage, it falls short
by $10\%$ on average; our Monte Carlo CP closes this gap both empirically and
theoretically.
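The calibration recipe described in the abstract can be illustrated concretely: draw several pseudo-labels per calibration example from the aggregated expert distribution $\mathbb{P}_{agg}^{Y|X}$, pool the resulting conformity scores, and threshold at the conformal quantile. The sketch below is a minimal Python illustration of this idea, not the paper's exact algorithm; the $1 - $ softmax-probability conformity score, the simple pooling of all $n \cdot m$ scores, and all function names are assumptions made for the example.

```python
import numpy as np

def split_cp_threshold(scores, alpha):
    """Standard split conformal calibration: return the threshold tau such that
    sets {y : score(x, y) <= tau} cover calibration labels at level >= 1 - alpha."""
    n = len(scores)
    q = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)  # finite-sample correction
    return np.quantile(scores, q, method="higher")

def monte_carlo_cp_threshold(probs_model, probs_agg, alpha, m=10, seed=0):
    """Monte Carlo calibration sketch: for each calibration input, draw m
    pseudo-labels from the aggregated expert distribution P_agg(Y|X) and pool
    the resulting conformity scores before taking the conformal quantile.

    probs_model: (n, K) model probabilities on the calibration inputs
    probs_agg:   (n, K) non-degenerate aggregated expert label distributions
    """
    rng = np.random.default_rng(seed)
    n, K = probs_model.shape
    scores = []
    for i in range(n):
        pseudo = rng.choice(K, size=m, p=probs_agg[i])  # synthetic labels Y ~ P_agg(Y|X_i)
        scores.extend(1.0 - probs_model[i, pseudo])     # conformity score: 1 - p_model(y|x)
    return split_cp_threshold(np.asarray(scores), alpha)

def prediction_sets(probs_test, tau):
    """Prediction sets C(x) = {y : 1 - p_model(y|x) <= tau} for each test input."""
    return [np.flatnonzero(1.0 - p <= tau) for p in probs_test]
```

Calibrated this way, the threshold targets coverage under $\mathbb{P}_{agg}$ rather than $\mathbb{P}_{vote}$; in the degenerate case where every $\mathbb{P}_{agg}^{Y|X_i}$ is one-hot, the procedure reduces to standard split CP on the voted labels.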
Related papers
- A class of ternary codes with few weights [0.0]
In this paper, we investigate a ternary code $\mathcal{C}$ of length $n$, defined via the trace function as $\mathcal{C} := \{(\mathrm{Tr}(d x_1), \dots, \mathrm{Tr}(d x_n))\}$.
Using recent results on explicit evaluations of exponential sums, the Weil bound, and related techniques, we show that the dual code of $\mathcal{C}$ is optimal with respect to the Hamming bound.
arXiv Detail & Related papers (2024-10-05T16:15:50Z) - A Unified Framework for Uniform Signal Recovery in Nonlinear Generative
Compressed Sensing [68.80803866919123]
Under nonlinear measurements, most prior results are non-uniform, i.e., they hold with high probability for a fixed $\mathbf{x}^*$ rather than for all $\mathbf{x}^*$ simultaneously.
Our framework accommodates GCS with 1-bit/uniformly quantized observations and single index models as canonical examples.
We also develop a concentration inequality that produces tighter bounds for product processes whose index sets have low metric entropy.
arXiv Detail & Related papers (2023-09-25T17:54:19Z) - A random matrix model for random approximate $t$-designs [1.534667887016089]
A random matrix model is proposed to describe the probability distribution of $\delta(\nu_{\mathcal{S}},t)$ for any $t$.
We show that our model satisfies the so-called spectral gap conjecture, i.e. we prove that with probability $1$ there is $t\in\mathbb{Z}_+$ such that $\sup_{k\in\mathbb{Z}_+}\delta(k)=\delta(t)$.
arXiv Detail & Related papers (2022-10-14T14:50:06Z) - Learning a Single Neuron with Adversarial Label Noise via Gradient
Descent [50.659479930171585]
We study a function of the form $\mathbf{x}\mapsto\sigma(\mathbf{w}\cdot\mathbf{x})$ for monotone activations.
The goal of the learner is to output a hypothesis vector $\mathbf{w}$ such that $F(\mathbf{w}) = C\,\epsilon$ with high probability.
arXiv Detail & Related papers (2022-06-17T17:55:43Z) - Random matrices in service of ML footprint: ternary random features with
no performance loss [55.30329197651178]
We show that the eigenspectrum of $\mathbf{K}$ is independent of the distribution of the i.i.d. entries of $\mathbf{w}$.
We propose a novel random features technique, called Ternary Random Feature (TRF).
The computation of the proposed random features requires no multiplication and a factor of $b$ fewer bits for storage compared to classical random features.
arXiv Detail & Related papers (2021-10-05T09:33:49Z) - Spectral properties of sample covariance matrices arising from random
matrices with independent non identically distributed columns [50.053491972003656]
It was previously shown that the functionals $\mathrm{tr}(AR(z))$, for $R(z) = (\frac{1}{n}XX^T - zI_p)^{-1}$ and $A\in \mathcal{M}_p$ deterministic, have a standard deviation of order $O(\|A\|_* / \sqrt{n})$.
Here, we bound $\|\mathbb{E}[R(z)] - \tilde{R}(z)\|_F$.
arXiv Detail & Related papers (2021-09-06T14:21:43Z) - Kernel Thinning [26.25415159542831]
Kernel thinning is a new procedure for compressing a distribution $\mathbb{P}$ more effectively than i.i.d. sampling or standard thinning.
We derive explicit non-asymptotic maximum mean discrepancy bounds for Gaussian, Matérn, and B-spline kernels.
arXiv Detail & Related papers (2021-05-12T17:56:42Z) - Near-Optimal Model Discrimination with Non-Disclosure [19.88145627448243]
We first consider the case of a well-specified linear model with squared loss.
We derive sample complexity of a similar form, even under misspecification.
arXiv Detail & Related papers (2020-12-04T23:52:54Z) - An example of prediction which complies with Demographic Parity and
equalizes group-wise risks in the context of regression [3.9596068699962323]
Bayes optimal prediction $f^*$ which does not produce Disparate Treatment is defined as $f^*(x) = \mathbb{E}[Y \mid X = x]$.
We discuss several implications of this result on better understanding of mathematical notions of algorithmic fairness.
arXiv Detail & Related papers (2020-11-13T22:46:05Z) - Near-Optimal SQ Lower Bounds for Agnostically Learning Halfspaces and
ReLUs under Gaussian Marginals [49.60752558064027]
We study the fundamental problems of agnostically learning halfspaces and ReLUs under Gaussian marginals.
Our lower bounds provide strong evidence that current upper bounds for these tasks are essentially best possible.
arXiv Detail & Related papers (2020-06-29T17:10:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.