What Is the Optimal Ranking Score Between Precision and Recall? We Can Always Find It and It Is Rarely $F_1$
- URL: http://arxiv.org/abs/2511.22442v1
- Date: Thu, 27 Nov 2025 13:29:50 GMT
- Title: What Is the Optimal Ranking Score Between Precision and Recall? We Can Always Find It and It Is Rarely $F_1$
- Authors: Sébastien Piérard, Adrien Deliège, Marc Van Droogenbroeck
- Abstract summary: We establish that $F_\beta$-induced rankings are meaningful and define a shortest path between precision- and recall-induced rankings. We provide theoretical tools and a closed-form expression to find the optimal value of $\beta$ for any distribution or set of performances.
- Score: 17.215680052668244
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Ranking methods or models based on their performance is of prime importance but is tricky because performance is fundamentally multidimensional. In the case of classification, precision and recall are scores with probabilistic interpretations that are both important to consider and complementary. The rankings induced by these two scores are often in partial contradiction. In practice, therefore, it is extremely useful to establish a compromise between the two views to obtain a single, global ranking. Over the last fifty years or so, it has been proposed to take a weighted harmonic mean, known as the F-score, F-measure, or $F_\beta$. Generally speaking, by averaging basic scores, we obtain a score that is intermediate in terms of values. However, there is no guarantee that these scores lead to meaningful rankings and no guarantee that the rankings are good tradeoffs between these base scores. Given the ubiquity of $F_\beta$ scores in the literature, some clarification is in order. Concretely: (1) We establish that $F_\beta$-induced rankings are meaningful and define a shortest path between precision- and recall-induced rankings. (2) We frame the problem of finding a tradeoff between two scores as an optimization problem expressed with Kendall rank correlations. We show that $F_1$ and its skew-insensitive version are far from being optimal in that regard. (3) We provide theoretical tools and a closed-form expression to find the optimal value of $\beta$ for any distribution or set of performances, and we illustrate their use on six case studies.
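For concreteness, here is a minimal Python sketch of the two ideas the abstract juxtaposes: the $F_\beta$ score, i.e. the weighted harmonic mean $F_\beta = (1+\beta^2)\,PR/(\beta^2 P + R)$ of precision $P$ and recall $R$ (where $\beta \to 0$ recovers the precision ranking and $\beta \to \infty$ the recall ranking), and the Kendall-rank-correlation framing of the tradeoff between the precision- and recall-induced rankings. The synthetic performances and the "equalize both correlations" objective below are illustrative assumptions made for this sketch only; the paper itself derives the actual optimality criterion and a closed-form expression for the optimal $\beta$.

```python
# Minimal sketch (not the paper's method): F_beta as a precision/recall
# compromise, viewed through the Kendall correlations of the rankings it
# induces. The synthetic data, the "balance the two correlations" objective,
# and the grid search are illustrative assumptions; the paper derives the
# actual optimality criterion and a closed-form expression for beta.
import numpy as np
from scipy.stats import kendalltau

def f_beta(precision, recall, beta):
    """Weighted harmonic mean of precision and recall:
    F_beta = (1 + beta^2) * P * R / (beta^2 * P + R)."""
    return (1 + beta**2) * precision * recall / (beta**2 * precision + recall)

# Hypothetical set of 50 classifier performances (precision, recall).
rng = np.random.default_rng(0)
P = rng.uniform(0.1, 1.0, size=50)
R = rng.uniform(0.1, 1.0, size=50)

def kendall_tradeoff(beta):
    """Kendall tau between the F_beta-induced ranking and each of the
    precision- and recall-induced rankings."""
    f = f_beta(P, R, beta)
    tau_p, _ = kendalltau(f, P)
    tau_r, _ = kendalltau(f, R)
    return tau_p, tau_r

# Grid-search the beta whose ranking is equally correlated with both views
# (one plausible notion of "compromise"; not the paper's criterion).
betas = np.logspace(-2, 2, 400)
gaps = []
for b in betas:
    tau_p, tau_r = kendall_tradeoff(b)
    gaps.append(abs(tau_p - tau_r))
best = betas[int(np.argmin(gaps))]

print(f"beta equalizing the two Kendall correlations: {best:.3f}")
print("(tau with precision, tau with recall) at beta=1:", kendall_tradeoff(1.0))
```

On such a synthetic set, the $\beta$ that balances the two correlations generally differs from 1, which is the abstract's point that $F_1$ is rarely the optimal compromise; the paper replaces this naive grid search with theoretical tools and a closed form.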
Related papers
- Reference-Free Rating of LLM Responses via Latent Information [53.463883683503106]
We study the common practice of asking a judge model to assign Likert-scale scores to free-text responses. We then propose and evaluate Latent Judges, which derive scalar ratings from internal model signals. Across a broad suite of pairwise and single-rating benchmarks, latent methods match or surpass standard prompting.
arXiv Detail & Related papers (2025-09-29T12:15:52Z) - Standardization of Weighted Ranking Correlation Coefficients [0.06526824510982801]
A relevant problem in statistics is defining the correlation of two rankings of a list of items. We propose a standardization function $g(x)$ that maps a ranking correlation coefficient $\Gamma$ to a standard form $g(\Gamma)$ that has zero expected value.
arXiv Detail & Related papers (2025-04-11T10:37:19Z) - Transductive Conformal Inference for Full Ranking [8.050897403457995]
We introduce a method based on Conformal Prediction (CP) to quantify the uncertainty of full ranking algorithms. We focus on a specific scenario where $n+m$ items are to be ranked by some "black box" algorithm.
arXiv Detail & Related papers (2025-01-20T10:24:33Z) - From Pairwise to Ranking: Climbing the Ladder to Ideal Collaborative Filtering with Pseudo-Ranking [13.01752267289297]
An ideal collaborative filtering model should learn from users' full rankings over all items to make optimal top-K recommendations. Most CF models rely on pairwise loss functions to approximate full rankings, resulting in an immense performance gap. We propose a pseudo-ranking paradigm (PRP) that addresses the lack of ranking information by introducing pseudo-rankings supervised by an original noise injection mechanism.
arXiv Detail & Related papers (2024-12-24T05:01:16Z) - Mind the Gap: A Causal Perspective on Bias Amplification in Prediction & Decision-Making [58.06306331390586]
We introduce the notion of a margin complement, which measures how much a prediction score $S$ changes due to a thresholding operation.
We show that under suitable causal assumptions, the influences of $X$ on the prediction score $S$ are equal to the influences of $X$ on the true outcome $Y$.
arXiv Detail & Related papers (2024-05-24T11:22:19Z) - Improved theoretical guarantee for rank aggregation via spectral method [1.0152838128195467]
Given pairwise comparisons between multiple items, how should we rank them so that the ranking matches the observations?
This problem, known as rank aggregation, has found many applications in sports, recommendation systems, and other web applications.
Here, each pairwise comparison is a corrupted copy of the true score difference.
arXiv Detail & Related papers (2023-09-07T16:01:47Z) - Bipartite Ranking Fairness through a Model Agnostic Ordering Adjustment [54.179859639868646]
We propose a model agnostic post-processing framework xOrder for achieving fairness in bipartite ranking.
xOrder is compatible with various classification models and ranking fairness metrics, including supervised and unsupervised fairness metrics.
We evaluate our proposed algorithm on four benchmark data sets and two real-world patient electronic health record repositories.
arXiv Detail & Related papers (2023-07-27T07:42:44Z) - Optimizing Partial Area Under the Top-k Curve: Theory and Practice [151.5072746015253]
We develop a novel metric named the partial Area Under the top-k Curve (AUTKC).
AUTKC has a better discrimination ability, and its Bayes optimal score function could give a correct top-K ranking with respect to the conditional probability.
We present an empirical surrogate risk minimization framework to optimize the proposed metric.
arXiv Detail & Related papers (2022-09-03T11:09:13Z) - Integrating Rankings into Quantized Scores in Peer Review [61.27794774537103]
In peer review, reviewers are usually asked to provide scores for the papers.
To mitigate this issue, conferences have started to ask reviewers to additionally provide a ranking of the papers they have reviewed.
There is no standard procedure for using this ranking information, and Area Chairs may use it in different ways.
We take a principled approach to integrate the ranking information into the scores.
arXiv Detail & Related papers (2022-04-05T19:39:13Z) - Pointwise Binary Classification with Pairwise Confidence Comparisons [97.79518780631457]
We propose pairwise comparison (Pcomp) classification, where we have only pairs of unlabeled data for which we know that one is more likely to be positive than the other.
We link Pcomp classification to noisy-label learning to develop a progressive unbiased risk estimator (URE) and improve it by imposing consistency regularization.
arXiv Detail & Related papers (2020-10-05T09:23:58Z) - Ranking a set of objects: a graph based least-square approach [70.7866286425868]
We consider the problem of ranking $N$ objects starting from a set of noisy pairwise comparisons provided by a crowd of equally reliable workers.
We propose a class of non-adaptive ranking algorithms that rely on a least-squares intrinsic optimization criterion for the estimation of qualities.
arXiv Detail & Related papers (2020-02-26T16:19:09Z)