Evaluating Probabilistic Inference in Deep Learning: Beyond Marginal
Predictions
- URL: http://arxiv.org/abs/2107.09224v1
- Date: Tue, 20 Jul 2021 01:55:01 GMT
- Title: Evaluating Probabilistic Inference in Deep Learning: Beyond Marginal
Predictions
- Authors: Xiuyuan Lu, Ian Osband, Benjamin Van Roy, Zheng Wen
- Abstract summary: We show that the commonly-used $\tau=1$ can be insufficient to drive good decisions in many settings of interest.
We also show that, as $\tau$ grows, performing well according to $\mathbf{d}_{\mathrm{KL}}^\tau$ recovers universal guarantees for any possible decision.
- Score: 37.92747047688873
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A fundamental challenge for any intelligent system is prediction: given some
inputs $X_1, \ldots, X_\tau$, can you predict outcomes $Y_1, \ldots, Y_\tau$? The KL
divergence $\mathbf{d}_{\mathrm{KL}}$ provides a natural measure of prediction
quality, but the majority of deep learning research looks only at the marginal
predictions per input $X_t$. In this technical report we propose a scoring rule
$\mathbf{d}_{\mathrm{KL}}^\tau$, parameterized by $\tau \in \mathcal{N}$, that
evaluates the joint predictions at $\tau$ inputs simultaneously. We show that
the commonly-used $\tau=1$ can be insufficient to drive good decisions in many
settings of interest. We also show that, as $\tau$ grows, performing well
according to $\mathbf{d}_{\mathrm{KL}}^\tau$ recovers universal guarantees for
any possible decision. Finally, we provide problem-dependent guidance on the
scale of $\tau$ for which our score provides sufficient guarantees for good
performance.
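To make the distinction concrete, here is a minimal, purely illustrative sketch (not code from the paper) of how a marginal score ($\tau = 1$) can look perfect while a joint score ($\tau = 2$) exposes a badly mis-specified predictive distribution. The environment, the agent, and names such as `true_joint` and `agent_joint` are hypothetical stand-ins for joint distributions over outcomes $(Y_1, \ldots, Y_\tau)$ at fixed inputs.

```python
# Illustrative sketch only: compare a marginal (tau = 1) KL score with a
# joint (tau = 2) KL score for an agent whose marginals are exact but whose
# joint prediction ignores the dependence between outcomes.
import itertools
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two probability vectors aligned over the same outcomes."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

# Toy environment: Y_1 and Y_2 are perfectly correlated binary outcomes
# (both equal to the same unknown coin flip).
outcomes = list(itertools.product([0, 1], repeat=2))            # (Y_1, Y_2) tuples
true_joint = {(0, 0): 0.5, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 0.5}

# Agent whose marginals P(Y_t = 1) = 0.5 are exactly right, but which models
# the two outcomes as independent, so its joint is the product of marginals.
agent_joint = {yy: 0.25 for yy in outcomes}

# tau = 1: the marginal score is zero; the agent looks perfect.
print("d_KL^1 :", kl_divergence([0.5, 0.5], [0.5, 0.5]))

# tau = 2: the joint score reveals the missing dependence (= log 2 here).
p = [true_joint[yy] for yy in outcomes]
q = [agent_joint[yy] for yy in outcomes]
print("d_KL^2 :", kl_divergence(p, q))
```

In this toy case the $\tau = 1$ score cannot distinguish the agent from a perfect one, whereas the $\tau = 2$ score is roughly $\log 2 \approx 0.69$; this is the sense in which marginal evaluation can fail to separate agents that would drive very different downstream decisions.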
Related papers
- Mind the Gap: A Causal Perspective on Bias Amplification in Prediction & Decision-Making [58.06306331390586]
We introduce the notion of a margin complement, which measures how much a prediction score $S$ changes due to a thresholding operation.
We show that under suitable causal assumptions, the influences of $X$ on the prediction score $S$ are equal to the influences of $X$ on the true outcome $Y$.
arXiv Detail & Related papers (2024-05-24T11:22:19Z) - Asymptotically Optimal Pure Exploration for Infinite-Armed Bandits [4.811176167998627]
We study pure exploration with infinitely many bandit arms generated i.i.d. from an unknown distribution.
Our goal is to efficiently select a single high-quality arm whose average reward is, with probability $1-\delta$, within $\varepsilon$ of being among the top $\eta$-fraction of arms.
arXiv Detail & Related papers (2023-06-03T04:00:47Z) - How Many and Which Training Points Would Need to be Removed to Flip this
Prediction? [34.9118528281516]
We consider the problem of identifying a minimal subset of training data $\mathcal{S}_t$.
If the instances comprising $\mathcal{S}_t$ had been removed prior to training, the categorization of a given test point $x_t$ would have been different.
We propose comparatively fast approximation methods to find $\mathcal{S}_t$ based on influence functions (an illustrative sketch of the generic influence-function idea appears after this list).
arXiv Detail & Related papers (2023-02-04T13:55:12Z) - Resampling Sensitivity of High-Dimensional PCA [7.436169208279454]
We study the resampling sensitivity of principal component analysis (PCA).
We show that PCA is sensitive to the input data in the sense that resampling even a negligible portion of the input may completely change the output.
arXiv Detail & Related papers (2022-12-30T03:13:04Z) - Mediated Uncoupled Learning: Learning Functions without Direct
Input-output Correspondences [80.95776331769899]
We consider the task of predicting $Y$ from $X$ when we have no paired data of them, only two separate datasets: $S_X$, pairing $X$ with a mediating variable $U$, and $S_Y$, pairing $U$ with $Y$.
A naive approach is to predict $U$ from $X$ using $S_X$ and then $Y$ from $U$ using $S_Y$.
We propose a new method that avoids predicting $U$ but directly learns $Y = f(X)$ by training $f(X)$ with $S_X$ to predict $h(U)$, where $h(U)$ is trained with $S_Y$ to approximate $Y$.
arXiv Detail & Related papers (2021-07-16T22:13:29Z) - Optimal Spectral Recovery of a Planted Vector in a Subspace [80.02218763267992]
We study efficient estimation and detection of a planted vector $v$ whose $\ell_4$ norm differs from that of a Gaussian vector with the same $\ell_2$ norm.
We show that in the regime $n\rho \gg \sqrt{N}$, any spectral method from a large class (and more generally, any low-degree polynomial of the input) fails to detect the planted vector.
arXiv Detail & Related papers (2021-05-31T16:10:49Z) - Near-Optimal SQ Lower Bounds for Agnostically Learning Halfspaces and
ReLUs under Gaussian Marginals [49.60752558064027]
We study the fundamental problems of agnostically learning halfspaces and ReLUs under Gaussian marginals.
Our lower bounds provide strong evidence that current upper bounds for these tasks are essentially best possible.
arXiv Detail & Related papers (2020-06-29T17:10:10Z) - Taking a hint: How to leverage loss predictors in contextual bandits? [63.546913998407405]
We study learning in contextual bandits with the help of loss predictors.
We show that the optimal regret is $\mathcal{O}(\min\{\sqrt{T}, \sqrt{\mathcal{E}}\,T^{\frac{1}{3}}\})$ when $\mathcal{E}$ is known.
arXiv Detail & Related papers (2020-03-04T07:36:38Z)
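As an aside on the "How Many and Which Training Points Would Need to be Removed to Flip this Prediction?" entry above: the sketch below shows the generic influence-function heuristic (in the spirit of Koh & Liang, 2017) that such approximation methods build on, under the assumption of a simple regularized logistic-regression model. It is not the method proposed in that paper; the model, data, and every name here are hypothetical.

```python
# Illustrative sketch only: rank training points by an influence-function
# estimate of how much the loss at a test point would increase if that
# training point were removed. Logistic-regression model and data are toy.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad_loss(w, x, y):
    """Gradient of the logistic loss at a single example (label y in {0, 1})."""
    return (sigmoid(x @ w) - y) * x

def hessian(w, X, lam=1e-3):
    """Hessian of the (lightly regularized) mean logistic loss."""
    p = sigmoid(X @ w)
    H = (X * (p * (1 - p))[:, None]).T @ X / len(X)
    return H + lam * np.eye(X.shape[1])

def removal_influence(w, X, y, x_test, y_test):
    """Approximate increase in the test loss from removing each training point."""
    h_inv_g = np.linalg.solve(hessian(w, X), grad_loss(w, x_test, y_test))
    return np.array([grad_loss(w, X[i], y[i]) @ h_inv_g for i in range(len(X))]) / len(X)

# Toy data and a crude gradient-descent fit of the logistic model.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(float)
w = np.zeros(5)
for _ in range(500):
    w -= 0.5 * X.T @ (sigmoid(X @ w) - y) / len(X)

# The highest-scoring points are natural candidates for a small subset whose
# removal could change the categorization of the test point.
scores = removal_influence(w, X, y, X[0], y[0])
print("most influential training indices:", np.argsort(-scores)[:5])
```

The appeal of this kind of estimate is that it needs only one Hessian solve per test point, rather than retraining the model once for every candidate subset of training data.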
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the accuracy of this information and is not responsible for any consequences of its use.