TopP&R: Robust Support Estimation Approach for Evaluating Fidelity and
Diversity in Generative Models
- URL: http://arxiv.org/abs/2306.08013v6
- Date: Wed, 24 Jan 2024 07:48:34 GMT
- Title: TopP&R: Robust Support Estimation Approach for Evaluating Fidelity and
Diversity in Generative Models
- Authors: Pum Jun Kim, Yoojin Jang, Jisu Kim, Jaejun Yoo
- Abstract summary: Topological Precision and Recall (TopP&R) provides a systematic approach to estimating supports.
We show that TopP&R is robust to outliers and non-independent and identically distributed (Non-IID) perturbations.
This is the first evaluation metric focused on the robust estimation of the support and provides its statistical consistency under noise.
- Score: 9.048102020202817
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a robust and reliable evaluation metric for generative models by
introducing topological and statistical treatments for rigorous support
estimation. Existing metrics, such as Inception Score (IS), Frechet Inception
Distance (FID), and the variants of Precision and Recall (P&R), heavily rely on
supports that are estimated from sample features. However, the reliability of
their estimation has not been seriously discussed (and overlooked) even though
the quality of the evaluation entirely depends on it. In this paper, we propose
Topological Precision and Recall (TopP&R, pronounced 'topper'), which provides
a systematic approach to estimating supports, retaining only topologically and
statistically important features with a certain level of confidence. This not
only makes TopP&R strong for noisy features, but also provides statistical
consistency. Our theoretical and experimental results show that TopP&R is
robust to outliers and non-independent and identically distributed (Non-IID)
perturbations, while accurately capturing the true trend of change in samples.
To the best of our knowledge, this is the first evaluation metric focused on
the robust estimation of the support and provides its statistical consistency
under noise.
Related papers
- Stratified Prediction-Powered Inference for Hybrid Language Model Evaluation [62.2436697657307]
Prediction-powered inference (PPI) is a method that improves statistical estimates based on limited human-labeled data.
We propose a method called Stratified Prediction-Powered Inference (StratPPI)
We show that the basic PPI estimates can be considerably improved by employing simple data stratification strategies.
arXiv Detail & Related papers (2024-06-06T17:37:39Z) - Backdoor-based Explainable AI Benchmark for High Fidelity Evaluation of Attribution Methods [49.62131719441252]
Attribution methods compute importance scores for input features to explain the output predictions of deep models.
In this work, we first identify a set of fidelity criteria that reliable benchmarks for attribution methods are expected to fulfill.
We then introduce a Backdoor-based eXplainable AI benchmark (BackX) that adheres to the desired fidelity criteria.
arXiv Detail & Related papers (2024-05-02T13:48:37Z) - Probabilistic Precision and Recall Towards Reliable Evaluation of
Generative Models [7.770029179741429]
We propose P-precision and P-recall (PP&PR), based on a probabilistic approach that address the problems.
We show that our PP&PR provide more reliable estimates for comparing fidelity and diversity than the existing metrics.
arXiv Detail & Related papers (2023-09-04T13:19:17Z) - GREAT Score: Global Robustness Evaluation of Adversarial Perturbation using Generative Models [60.48306899271866]
We present a new framework, called GREAT Score, for global robustness evaluation of adversarial perturbation using generative models.
We show high correlation and significantly reduced cost of GREAT Score when compared to the attack-based model ranking on RobustBench.
GREAT Score can be used for remote auditing of privacy-sensitive black-box models.
arXiv Detail & Related papers (2023-04-19T14:58:27Z) - Random Noise vs State-of-the-Art Probabilistic Forecasting Methods : A
Case Study on CRPS-Sum Discrimination Ability [4.9449660544238085]
We show that the statistical properties of target data affect the discrimination ability of CRPS-Sum.
We highlight that CRPS-Sum calculation overlooks the performance of the model on each dimension.
We show that it is easily possible to have a better CRPS-Sum for a dummy model, which looks like random noise.
arXiv Detail & Related papers (2022-01-21T12:36:58Z) - Differential privacy and robust statistics in high dimensions [49.50869296871643]
High-dimensional Propose-Test-Release (HPTR) builds upon three crucial components: the exponential mechanism, robust statistics, and the Propose-Test-Release mechanism.
We show that HPTR nearly achieves the optimal sample complexity under several scenarios studied in the literature.
arXiv Detail & Related papers (2021-11-12T06:36:40Z) - An evaluation of word-level confidence estimation for end-to-end
automatic speech recognition [70.61280174637913]
We investigate confidence estimation for end-to-end automatic speech recognition (ASR)
We provide an extensive benchmark of popular confidence methods on four well-known speech datasets.
Our results suggest a strong baseline can be obtained by scaling the logits by a learnt temperature.
arXiv Detail & Related papers (2021-01-14T09:51:59Z) - Robust Validation: Confident Predictions Even When Distributions Shift [19.327409270934474]
We describe procedures for robust predictive inference, where a model provides uncertainty estimates on its predictions rather than point predictions.
We present a method that produces prediction sets (almost exactly) giving the right coverage level for any test distribution in an $f$-divergence ball around the training population.
An essential component of our methodology is to estimate the amount of expected future data shift and build robustness to it.
arXiv Detail & Related papers (2020-08-10T17:09:16Z) - Evaluating probabilistic classifiers: Reliability diagrams and score
decompositions revisited [68.8204255655161]
We introduce the CORP approach, which generates provably statistically Consistent, Optimally binned, and Reproducible reliability diagrams in an automated way.
Corpor is based on non-parametric isotonic regression and implemented via the Pool-adjacent-violators (PAV) algorithm.
arXiv Detail & Related papers (2020-08-07T08:22:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.