On Orderings of Probability Vectors and Unsupervised Performance
Estimation
- URL: http://arxiv.org/abs/2306.10160v1
- Date: Fri, 16 Jun 2023 20:03:16 GMT
- Title: On Orderings of Probability Vectors and Unsupervised Performance
Estimation
- Authors: Muhammad Maaz, Rui Qiao, Yiheng Zhou, Renxian Zhang
- Abstract summary: We show that the $L^\infty$ norm is the most appropriate score function for classification problems.
We conduct experiments on well-known NLP data sets and rigorously explore the performance of different score functions.
- Score: 6.2163687973613495
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Unsupervised performance estimation, i.e., evaluating how well models perform on
unlabeled data, is a difficult task. Recently, a method was proposed by Garg et
al. [2022] which performs much better than previous methods. Their method
relies on having a score function, satisfying certain properties, to map
probability vectors outputted by the classifier to the reals, but it is an open
problem which score function is best. We explore this problem by first showing
that their method fundamentally relies on the ordering induced by this score
function. Thus, under monotone transformations of score functions, their method
yields the same estimate. Next, we show that in the binary classification
setting, nearly all common score functions - the $L^\infty$ norm; the $L^2$
norm; negative entropy; and the $L^2$, $L^1$, and Jensen-Shannon distances to
the uniform vector - all induce the same ordering over probability vectors.
However, this does not hold for higher dimensional settings. We conduct
numerous experiments on well-known NLP data sets and rigorously explore the
performance of different score functions. We conclude that the $L^\infty$ norm
is the most appropriate.
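The binary-case equivalence claimed in the abstract is easy to check numerically. Below is a minimal sketch (not the authors' code; the helper names are mine, and the Jensen-Shannon distance is omitted for brevity) that scores random probability vectors with several of the functions named above and tests whether they order every pair identically:

```python
# Sketch: do common score functions induce the same ordering over
# probability vectors? The abstract says yes for binary classification,
# no in higher dimensions.
import itertools
import numpy as np

def scores(p):
    """Score functions from the abstract, applied to a probability vector p."""
    p = np.asarray(p, dtype=float)
    u = np.full(len(p), 1.0 / len(p))  # uniform vector
    return {
        "Linf": np.max(p),                                        # L^infty norm
        "L2": np.linalg.norm(p, 2),                               # L^2 norm
        "neg_entropy": np.sum(p * np.log(np.clip(p, 1e-12, None))),
        "L2_to_uniform": np.linalg.norm(p - u, 2),
        "L1_to_uniform": np.linalg.norm(p - u, 1),
    }

def same_ordering(vectors, names):
    """True iff every pair of vectors is ordered identically by all score functions."""
    for p, q in itertools.combinations(vectors, 2):
        sp, sq = scores(p), scores(q)
        if len({np.sign(sp[n] - sq[n]) for n in names}) > 1:
            return False
    return True

names = ["Linf", "L2", "neg_entropy", "L2_to_uniform", "L1_to_uniform"]
rng = np.random.default_rng(0)
binary = rng.dirichlet(np.ones(2), size=200)   # random binary probability vectors
ternary = rng.dirichlet(np.ones(3), size=200)  # random 3-class probability vectors

print(same_ordering(binary, names))   # orderings agree in the binary case
print(same_ordering(ternary, names))  # disagreements appear in 3 dimensions
```

In the binary case every listed score is a strictly increasing function of the distance $|p_1 - 1/2|$ from the uniform vector, which is why all the orderings coincide; no such common monotone structure exists once there are three or more classes.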
Related papers
- Faster Algorithms and Constant Lower Bounds for the Worst-Case Expected
Error [0.3997680012976965]
The goal is to design estimators that minimize the worst-case expected error.
Chen, Valiant and Valiant show that, when data values are $\ell_\infty$-normalized, there is an efficient algorithm to compute an estimator for the mean.
In this paper we design provably efficient algorithms for approximating the optimal semilinear estimator based on online convex optimization.
arXiv Detail & Related papers (2021-12-27T18:47:25Z)
- Distributionally Robust Optimization with Markovian Data [8.126833795693699]
We study a program where the probability distribution of the uncertain problem parameters is unknown.
We propose a data-driven distributionally robust optimization approach to estimate the problem's objective function and optimal solution.
arXiv Detail & Related papers (2021-06-12T10:59:02Z)
- Near-Optimal High Probability Complexity Bounds for Non-Smooth Stochastic Optimization with Heavy-Tailed Noise [63.304196997102494]
It is essential to theoretically guarantee that algorithms provide small objective residual with high probability.
Existing methods for non-smooth convex optimization have complexity bounds with either negative-power or logarithmic dependence on the confidence level.
We propose novel stepsize rules for two gradient methods with clipping.
arXiv Detail & Related papers (2021-06-10T17:54:21Z)
- Higher-Order Orthogonal Causal Learning for Treatment Effect [15.652550362252205]
We present an algorithm that enables us to obtain the debiased estimator recovered from the score function.
We also conduct comprehensive experiments to test the power of the estimator constructed from the score function, using both simulated and real datasets.
arXiv Detail & Related papers (2021-03-22T14:04:13Z)
- Finding Global Minima via Kernel Approximations [90.42048080064849]
We consider the global minimization of smooth functions based solely on function evaluations.
In this paper, we consider an approach that jointly models the function to approximate and finds a global minimum.
arXiv Detail & Related papers (2020-12-22T12:59:30Z)
- Hutch++: Optimal Stochastic Trace Estimation [75.45968495410048]
We introduce a new randomized algorithm, Hutch++, which computes a $(1 \pm \epsilon)$ approximation to $\mathrm{tr}(A)$ for any positive semidefinite (PSD) $A$.
We show that it significantly outperforms Hutchinson's method in experiments.
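The core idea can be sketched in a few lines: capture the dominant eigenspace of $A$ with a random sketch, take the trace over that subspace exactly, and apply Hutchinson's estimator only to the residual. The following is an illustrative reconstruction, not the authors' reference implementation, and the parameter split is a common simplification:

```python
# Hutch++-style trace estimation sketch: exact trace on a sketched
# low-rank subspace plus Hutchinson's estimator on the residual.
import numpy as np

def hutchpp(A, m, rng):
    """Estimate tr(A) for PSD A using roughly m matrix-vector products."""
    n = A.shape[0]
    k = m // 3
    S = rng.choice([-1.0, 1.0], size=(n, k))   # sketching directions
    G = rng.choice([-1.0, 1.0], size=(n, k))   # Hutchinson probes
    Q, _ = np.linalg.qr(A @ S)                 # orthonormal basis for the top range of A
    t_low = np.trace(Q.T @ A @ Q)              # exact trace on the captured subspace
    G_perp = G - Q @ (Q.T @ G)                 # project probes off the subspace
    t_res = np.trace(G_perp.T @ A @ G_perp) / k  # Hutchinson on the residual
    return t_low + t_res

rng = np.random.default_rng(1)
B = rng.standard_normal((300, 300))
A = B @ B.T                                    # random PSD test matrix
est = hutchpp(A, m=90, rng=rng)
rel_err = abs(est - np.trace(A)) / np.trace(A)
print(rel_err)  # small relative error, typically a few percent
```

Plain Hutchinson needs $O(1/\epsilon^2)$ probes for relative error $\epsilon$ on PSD matrices; handling the top of the spectrum exactly is what brings Hutch++ down to $O(1/\epsilon)$.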
arXiv Detail & Related papers (2020-10-19T16:45:37Z)
- A One-bit, Comparison-Based Gradient Estimator [29.600975900977343]
We show how one can use tools from one-bit compressed sensing to construct a robust and reliable estimator of the normalized gradient.
We propose an algorithm, coined SCOBO, that uses this estimator within a gradient descent scheme.
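As a rough illustration of the underlying estimator (names and constants are mine; this is not the SCOBO algorithm itself): each query reveals only one bit, the sign of a finite-difference comparison, and averaging the signed probe directions recovers the normalized gradient.

```python
# One-bit, comparison-based estimation of the normalized gradient:
# each query returns only sign(f(x + r*u) - f(x)).
import numpy as np

def one_bit_grad(f, x, n_queries, r, rng):
    """Estimate the direction of grad f(x) from one-bit comparisons."""
    g = np.zeros(x.size)
    for _ in range(n_queries):
        u = rng.standard_normal(x.size)
        u /= np.linalg.norm(u)                 # random unit probe direction
        g += np.sign(f(x + r * u) - f(x)) * u  # one bit of information per query
    return g / np.linalg.norm(g)               # normalized gradient estimate

f = lambda z: np.sum(z ** 2)                   # toy smooth objective
x = np.array([1.0, -2.0, 0.5])
rng = np.random.default_rng(3)
g_hat = one_bit_grad(f, x, n_queries=2000, r=1e-3, rng=rng)
true_dir = 2 * x / np.linalg.norm(2 * x)       # true gradient direction of f
print(g_hat @ true_dir)  # cosine similarity with the true direction, close to 1
```

The estimator is unbiased in direction because, for an isotropic probe $u$, the expectation of $\mathrm{sign}(u \cdot \nabla f)\, u$ is proportional to $\nabla f / \lVert \nabla f \rVert$.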
arXiv Detail & Related papers (2020-10-06T05:01:38Z)
- Piecewise Linear Regression via a Difference of Convex Functions [50.89452535187813]
We present a new piecewise linear regression methodology that utilizes fitting a difference of convex functions (DC functions) to the data.
We empirically validate the method, showing it to be practically implementable, and to have comparable performance to existing regression/classification methods on real-world datasets.
arXiv Detail & Related papers (2020-07-05T18:58:47Z)
- Debiasing Distributed Second Order Optimization with Surrogate Sketching and Scaled Regularization [101.5159744660701]
In distributed second order optimization, a standard strategy is to average many local estimates, each of which is based on a small sketch or batch of the data.
Here, we introduce a new technique for debiasing the local estimates, which leads to both theoretical and empirical improvements in the convergence rate of distributed second order methods.
arXiv Detail & Related papers (2020-07-02T18:08:14Z)
- Efficient estimation of the ANOVA mean dimension, with an application to neural net classification [0.0]
The mean dimension of a black box function of $d$ variables can be written as the sum of $d$ Sobol' indices, which can be estimated by leave-one-out methods.
We compare the variance of these leave one out methods: a Gibbs sampler called winding stairs, a radial sampler that changes each variable one at a time from a baseline, and a naive sampler that never reuses function evaluations.
arXiv Detail & Related papers (2020-07-02T17:44:10Z)
- Ranking a set of objects: a graph based least-square approach [70.7866286425868]
We consider the problem of ranking $N$ objects starting from a set of noisy pairwise comparisons provided by a crowd of equal workers.
We propose a class of non-adaptive ranking algorithms that rely on a least-squares intrinsic optimization criterion for the estimation of qualities.
arXiv Detail & Related papers (2020-02-26T16:19:09Z)
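The least-squares criterion in the last entry can be sketched as follows (an illustrative model of my own, not the paper's exact algorithm): treat each noisy comparison as an observation of a quality difference $q_i - q_j$ and solve the resulting linear system over the comparison graph.

```python
# Least-squares ranking from noisy pairwise comparisons: each observation
# models y ~ q_i - q_j, and qualities q are recovered via lstsq on the
# incidence matrix of the comparison graph.
import numpy as np

def ls_rank(n_items, comparisons):
    """comparisons: list of (i, j, y) with y a noisy estimate of q_i - q_j."""
    B = np.zeros((len(comparisons), n_items))  # incidence matrix
    y = np.zeros(len(comparisons))
    for r, (i, j, obs) in enumerate(comparisons):
        B[r, i], B[r, j], y[r] = 1.0, -1.0, obs
    # Qualities are identifiable only up to a constant shift, so center them.
    q, *_ = np.linalg.lstsq(B, y, rcond=None)
    return q - q.mean()

rng = np.random.default_rng(2)
true_q = np.array([3.0, 1.0, 2.0, 0.0])        # ground-truth qualities
pairs = [(i, j) for i in range(4) for j in range(4) if i < j] * 10
comparisons = [(i, j, true_q[i] - true_q[j] + 0.1 * rng.standard_normal())
               for i, j in pairs]
q_hat = ls_rank(4, comparisons)
print(np.argsort(-q_hat))  # recovered ranking, best item first
```

With enough comparisons per pair, the least-squares estimate concentrates and the induced ranking matches the ground truth.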
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.