On Orderings of Probability Vectors and Unsupervised Performance
Estimation
- URL: http://arxiv.org/abs/2306.10160v1
- Date: Fri, 16 Jun 2023 20:03:16 GMT
- Title: On Orderings of Probability Vectors and Unsupervised Performance
Estimation
- Authors: Muhammad Maaz, Rui Qiao, Yiheng Zhou, Renxian Zhang
- Abstract summary: We show that the $L^\infty$ norm is the most appropriate score function for classification problems.
We conduct experiments on well-known NLP data sets and rigorously explore the performance of different score functions.
- Score: 6.2163687973613495
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Unsupervised performance estimation, i.e., evaluating how well models perform on
unlabeled data, is a difficult task. Recently, a method was proposed by Garg et
al. [2022] which performs much better than previous methods. Their method
relies on having a score function, satisfying certain properties, to map
probability vectors outputted by the classifier to the reals, but it is an open
problem which score function is best. We explore this problem by first showing
that their method fundamentally relies on the ordering induced by this score
function. Thus, under monotone transformations of score functions, their method
yields the same estimate. Next, we show that in the binary classification
setting, nearly all common score functions - the $L^\infty$ norm; the $L^2$
norm; negative entropy; and the $L^2$, $L^1$, and Jensen-Shannon distances to
the uniform vector - all induce the same ordering over probability vectors.
However, this does not hold for higher dimensional settings. We conduct
numerous experiments on well-known NLP data sets and rigorously explore the
performance of different score functions. We conclude that the $L^\infty$ norm
is the most appropriate.
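The binary-case equivalence claimed in the abstract is easy to check numerically. Below is a minimal sketch (not the authors' code; the helper names are mine, and the Jensen-Shannon distance is omitted for brevity) that scores random probability vectors with several of the functions named above and tests whether they order every pair identically:

```python
# Sketch: do common score functions induce the same ordering over
# probability vectors? The abstract says yes for binary classification,
# no in higher dimensions.
import itertools
import numpy as np

def scores(p):
    """Score functions from the abstract, applied to a probability vector p."""
    p = np.asarray(p, dtype=float)
    u = np.full(len(p), 1.0 / len(p))  # uniform vector
    return {
        "Linf": np.max(p),                                        # L^infty norm
        "L2": np.linalg.norm(p, 2),                               # L^2 norm
        "neg_entropy": np.sum(p * np.log(np.clip(p, 1e-12, None))),
        "L2_to_uniform": np.linalg.norm(p - u, 2),
        "L1_to_uniform": np.linalg.norm(p - u, 1),
    }

def same_ordering(vectors, names):
    """True iff every pair of vectors is ordered identically by all score functions."""
    for p, q in itertools.combinations(vectors, 2):
        sp, sq = scores(p), scores(q)
        if len({np.sign(sp[n] - sq[n]) for n in names}) > 1:
            return False
    return True

names = ["Linf", "L2", "neg_entropy", "L2_to_uniform", "L1_to_uniform"]
rng = np.random.default_rng(0)
binary = rng.dirichlet(np.ones(2), size=200)   # random binary probability vectors
ternary = rng.dirichlet(np.ones(3), size=200)  # random 3-class probability vectors

print(same_ordering(binary, names))   # orderings agree in the binary case
print(same_ordering(ternary, names))  # disagreements appear in 3 dimensions
```

In the binary case every listed score is a strictly increasing function of the distance $|p_1 - 1/2|$ from the uniform vector, which is why all the orderings coincide; no such common monotone structure exists once there are three or more classes.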
Related papers
- Faster Algorithms and Constant Lower Bounds for the Worst-Case Expected
Error [0.3997680012976965]
The goal is to design estimators that minimize the worst-case expected error.
Chen, Valiant and Valiant show that, when data values are $\ell_\infty$-normalized, there is an efficient algorithm to compute an estimator for the mean.
In this paper we design provably efficient algorithms for approximating the optimal semilinear estimator based on online convex optimization.
arXiv Detail & Related papers (2021-12-27T18:47:25Z)
- Distributionally Robust Optimization with Markovian Data [8.126833795693699]
We study a program where the probability distribution of the uncertain problem parameters is unknown.
We propose a data-driven distributionally robust optimization approach to estimate the problem's objective function and optimal solution.
arXiv Detail & Related papers (2021-06-12T10:59:02Z)
- Near-Optimal High Probability Complexity Bounds for Non-Smooth Stochastic Optimization with Heavy-Tailed Noise [63.304196997102494]
It is essential to theoretically guarantee that algorithms provide small objective residual with high probability.
Existing methods for non-smooth convex optimization have complexity bounds with either negative-power or logarithmic dependence on the confidence level.
We propose novel stepsize rules for two gradient methods with clipping.
arXiv Detail & Related papers (2021-06-10T17:54:21Z)
- Higher-Order Orthogonal Causal Learning for Treatment Effect [15.652550362252205]
We present an algorithm that enables us to obtain the debiased estimator recovered from the score function.
We also conduct comprehensive experiments to test the power of the estimator constructed from the score function, using both simulated and real datasets.
arXiv Detail & Related papers (2021-03-22T14:04:13Z)
- Finding Global Minima via Kernel Approximations [90.42048080064849]
We consider the global minimization of smooth functions based solely on function evaluations.
In this paper, we consider an approach that jointly models the function to approximate and finds a global minimum.
arXiv Detail & Related papers (2020-12-22T12:59:30Z)
- Hutch++: Optimal Stochastic Trace Estimation [75.45968495410048]
We introduce a new randomized algorithm, Hutch++, which computes a $(1 \pm \epsilon)$ approximation to $\mathrm{tr}(A)$ for any positive semidefinite (PSD) $A$.
We show that it significantly outperforms Hutchinson's method in experiments.
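The core idea can be sketched in a few lines: capture the dominant eigenspace of $A$ with a random sketch, take the trace over that subspace exactly, and apply Hutchinson's estimator only to the residual. The following is an illustrative reconstruction, not the authors' reference implementation, and the parameter split is a common simplification:

```python
# Hutch++-style trace estimation sketch: exact trace on a sketched
# low-rank subspace plus Hutchinson's estimator on the residual.
import numpy as np

def hutchpp(A, m, rng):
    """Estimate tr(A) for PSD A using roughly m matrix-vector products."""
    n = A.shape[0]
    k = m // 3
    S = rng.choice([-1.0, 1.0], size=(n, k))   # sketching directions
    G = rng.choice([-1.0, 1.0], size=(n, k))   # Hutchinson probes
    Q, _ = np.linalg.qr(A @ S)                 # orthonormal basis for the top range of A
    t_low = np.trace(Q.T @ A @ Q)              # exact trace on the captured subspace
    G_perp = G - Q @ (Q.T @ G)                 # project probes off the subspace
    t_res = np.trace(G_perp.T @ A @ G_perp) / k  # Hutchinson on the residual
    return t_low + t_res

rng = np.random.default_rng(1)
B = rng.standard_normal((300, 300))
A = B @ B.T                                    # random PSD test matrix
est = hutchpp(A, m=90, rng=rng)
rel_err = abs(est - np.trace(A)) / np.trace(A)
print(rel_err)  # small relative error, typically a few percent
```

Plain Hutchinson needs $O(1/\epsilon^2)$ probes for relative error $\epsilon$ on PSD matrices; handling the top of the spectrum exactly is what brings Hutch++ down to $O(1/\epsilon)$.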
arXiv Detail & Related papers (2020-10-19T16:45:37Z)
- A One-bit, Comparison-Based Gradient Estimator [29.600975900977343]
We show how one can use tools from one-bit compressed sensing to construct a robust and reliable estimator of the normalized gradient.
We propose an algorithm, coined SCOBO, that uses this estimator within a gradient descent scheme.
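As a rough illustration of the underlying estimator (names and constants are mine; this is not the SCOBO algorithm itself): each query reveals only one bit, the sign of a finite-difference comparison, and averaging the signed probe directions recovers the normalized gradient.

```python
# One-bit, comparison-based estimation of the normalized gradient:
# each query returns only sign(f(x + r*u) - f(x)).
import numpy as np

def one_bit_grad(f, x, n_queries, r, rng):
    """Estimate the direction of grad f(x) from one-bit comparisons."""
    g = np.zeros(x.size)
    for _ in range(n_queries):
        u = rng.standard_normal(x.size)
        u /= np.linalg.norm(u)                 # random unit probe direction
        g += np.sign(f(x + r * u) - f(x)) * u  # one bit of information per query
    return g / np.linalg.norm(g)               # normalized gradient estimate

f = lambda z: np.sum(z ** 2)                   # toy smooth objective
x = np.array([1.0, -2.0, 0.5])
rng = np.random.default_rng(3)
g_hat = one_bit_grad(f, x, n_queries=2000, r=1e-3, rng=rng)
true_dir = 2 * x / np.linalg.norm(2 * x)       # true gradient direction of f
print(g_hat @ true_dir)  # cosine similarity with the true direction, close to 1
```

The estimator is unbiased in direction because, for an isotropic probe $u$, the expectation of $\mathrm{sign}(u \cdot \nabla f)\, u$ is proportional to $\nabla f / \lVert \nabla f \rVert$.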
arXiv Detail & Related papers (2020-10-06T05:01:38Z)
- Piecewise Linear Regression via a Difference of Convex Functions [50.89452535187813]
We present a new piecewise linear regression methodology that utilizes fitting a difference of convex functions (DC functions) to the data.
We empirically validate the method, showing it to be practically implementable, and to have comparable performance to existing regression/classification methods on real-world datasets.
arXiv Detail & Related papers (2020-07-05T18:58:47Z)
- Debiasing Distributed Second Order Optimization with Surrogate Sketching and Scaled Regularization [101.5159744660701]
In distributed second order optimization, a standard strategy is to average many local estimates, each of which is based on a small sketch or batch of the data.
Here, we introduce a new technique for debiasing the local estimates, which leads to both theoretical and empirical improvements in the convergence rate of distributed second order methods.
arXiv Detail & Related papers (2020-07-02T18:08:14Z)
- Efficient estimation of the ANOVA mean dimension, with an application to neural net classification [0.0]
The mean dimension of a black box function of $d$ variables can be written as the sum of $d$ Sobol' indices, which can be estimated by leave-one-out methods.
We compare the variance of these leave one out methods: a Gibbs sampler called winding stairs, a radial sampler that changes each variable one at a time from a baseline, and a naive sampler that never reuses function evaluations.
arXiv Detail & Related papers (2020-07-02T17:44:10Z)
- Ranking a set of objects: a graph based least-square approach [70.7866286425868]
We consider the problem of ranking $N$ objects starting from a set of noisy pairwise comparisons provided by a crowd of equal workers.
We propose a class of non-adaptive ranking algorithms that rely on a least-squares intrinsic optimization criterion for the estimation of qualities.
arXiv Detail & Related papers (2020-02-26T16:19:09Z)
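The least-squares criterion in the last entry can be sketched as follows (an illustrative model of my own, not the paper's exact algorithm): treat each noisy comparison as an observation of a quality difference $q_i - q_j$ and solve the resulting linear system over the comparison graph.

```python
# Least-squares ranking from noisy pairwise comparisons: each observation
# models y ~ q_i - q_j, and qualities q are recovered via lstsq on the
# incidence matrix of the comparison graph.
import numpy as np

def ls_rank(n_items, comparisons):
    """comparisons: list of (i, j, y) with y a noisy estimate of q_i - q_j."""
    B = np.zeros((len(comparisons), n_items))  # incidence matrix
    y = np.zeros(len(comparisons))
    for r, (i, j, obs) in enumerate(comparisons):
        B[r, i], B[r, j], y[r] = 1.0, -1.0, obs
    # Qualities are identifiable only up to a constant shift, so center them.
    q, *_ = np.linalg.lstsq(B, y, rcond=None)
    return q - q.mean()

rng = np.random.default_rng(2)
true_q = np.array([3.0, 1.0, 2.0, 0.0])        # ground-truth qualities
pairs = [(i, j) for i in range(4) for j in range(4) if i < j] * 10
comparisons = [(i, j, true_q[i] - true_q[j] + 0.1 * rng.standard_normal())
               for i, j in pairs]
q_hat = ls_rank(4, comparisons)
print(np.argsort(-q_hat))  # recovered ranking, best item first
```

With enough comparisons per pair, the least-squares estimate concentrates and the induced ranking matches the ground truth.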
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.