A New Perspective on Precision and Recall for Generative Models
- URL: http://arxiv.org/abs/2511.02414v1
- Date: Tue, 04 Nov 2025 09:44:11 GMT
- Title: A New Perspective on Precision and Recall for Generative Models
- Authors: Benjamin Sykes, Loïc Simon, Julien Rabin, Jalal Fadili
- Abstract summary: Precision and Recall (PR) for generative models has opened up a new avenue of research. We present a new framework for estimating entire PR curves based on a binary classification standpoint.
- Score: 3.1323840021317255
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the recent success of generative models in image and text, the question of their evaluation has gained a lot of attention. While most state-of-the-art methods rely on scalar metrics, the introduction of Precision and Recall (PR) for generative models has opened up a new avenue of research. The associated PR curve allows for a richer analysis, but its estimation poses several challenges. In this paper, we present a new framework for estimating entire PR curves based on a binary classification standpoint. We conduct a thorough statistical analysis of the proposed estimates. As a byproduct, we obtain a minimax upper bound on the PR estimation risk. We also show that our framework extends several landmark PR metrics from the literature, which by design are restricted to the extreme values of the curve. Finally, we study the different behaviors of the curves obtained experimentally in various settings.
Related papers
- Rethinking Metrics and Benchmarks of Video Anomaly Detection [58.37571339811799]
Video Anomaly Detection (VAD) aims to detect anomalies that deviate from expectation. Existing VAD metrics are influenced by single annotation bias. Existing benchmarks lack the capability to evaluate scene overfitting of fully/weakly-supervised algorithms.
arXiv Detail & Related papers (2025-05-25T08:09:42Z) - On Minimax Estimation of Parameters in Softmax-Contaminated Mixture of Experts [66.39976432286905]
We study the convergence rates of the maximum likelihood estimator of gating and prompt parameters. We find that the estimability of these parameters is compromised when the prompt acquires overlapping knowledge with the pre-trained model.
arXiv Detail & Related papers (2025-05-24T01:30:46Z) - Enhancement of Approximation Spaces by the Use of Primals and Neighborhood [0.0]
We introduce four new generalized rough set models that draw inspiration from "neighborhoods and primals".
We claim that the current models can preserve nearly all significant aspects associated with the rough set model.
We also demonstrate that the new strategy we define for our everyday health-related problem yields more accurate findings.
arXiv Detail & Related papers (2024-10-23T18:49:13Z) - Forgetting Curve: A Reliable Method for Evaluating Memorization Capability for Long-context Models [58.6172667880028]
We propose a new method called forgetting curve to measure the memorization capability of long-context models.
We show that forgetting curve has the advantage of being robust to the tested corpus and the experimental settings.
Our measurement provides empirical evidence for the effectiveness of transformer extension techniques while raising questions about the effective length of RNN/SSM-based models.
arXiv Detail & Related papers (2024-10-07T03:38:27Z) - Unifying and extending Precision Recall metrics for assessing generative models [1.17431678544333]
Most generative models are compared in terms of scalar values such as Fréchet Inception Distance (FID) or Inception Score (IS).
We also provide consistency results that go well beyond the ones presented in the corresponding literature.
arXiv Detail & Related papers (2024-05-02T13:19:21Z) - Precision-Recall Divergence Optimization for Generative Modeling with GANs and Normalizing Flows [54.050498411883495]
We develop a novel training method for generative models, such as Generative Adversarial Networks and Normalizing Flows.
We show that achieving a specified precision-recall trade-off corresponds to minimizing a unique $f$-divergence from a family we call the PR-divergences.
Our approach improves the performance of existing state-of-the-art models like BigGAN in terms of either precision or recall when tested on datasets such as ImageNet.
arXiv Detail & Related papers (2023-05-30T10:07:17Z) - Hierarchical Gaussian Process Models for Regression Discontinuity/Kink under Sharp and Fuzzy Designs [0.0]
We propose nonparametric Bayesian estimators for causal inference exploiting Regression Discontinuity/Kink (RD/RK).
These estimators are extended to hierarchical GP models with an intermediate Bayesian neural network layer.
Monte Carlo simulations show that our estimators perform similarly to, and often better than, competing estimators in terms of precision, coverage and interval length.
arXiv Detail & Related papers (2021-10-03T04:23:56Z) - Pros and Cons of GAN Evaluation Measures: New Developments [53.10151901863263]
This work is an update of a previous paper on the same topic published a few years ago.
I describe new dimensions that are becoming important in assessing models, and discuss the connection between GAN evaluation and deepfakes.
arXiv Detail & Related papers (2021-03-17T01:48:34Z) - GELATO: Geometrically Enriched Latent Model for Offline Reinforcement Learning [54.291331971813364]
Offline reinforcement learning approaches can be divided into proximal and uncertainty-aware methods.
In this work, we demonstrate the benefit of combining the two in a latent variational model.
Our proposed metrics measure both the quality of out of distribution samples as well as the discrepancy of examples in the data.
arXiv Detail & Related papers (2021-02-22T19:42:40Z) - Provable Benefits of Overparameterization in Model Compression: From Double Descent to Pruning Neural Networks [38.153825455980645]
Recent empirical evidence indicates that the practice of overparameterization not only benefits training large models, but also assists, perhaps counterintuitively, in building lightweight models.
This paper sheds light on these empirical findings by theoretically characterizing the high-dimensional toolsets of model pruning.
We analytically identify regimes in which, even if the location of the most informative features is known, we are better off fitting a large model and then pruning.
arXiv Detail & Related papers (2020-12-16T05:13:30Z) - On the Theoretical Equivalence of Several Trade-Off Curves Assessing Statistical Proximity [4.626261940793027]
We propose a unification of four curves known respectively as: the precision-recall (PR) curve, the Lorenz curve, the receiver operating characteristic (ROC) curve and a special case of Rényi divergence frontiers.
In addition, we discuss possible links between PR / Lorenz curves with the derivation of domain adaptation bounds.
arXiv Detail & Related papers (2020-06-21T14:32:38Z)