Pros and Cons of GAN Evaluation Measures: New Developments
- URL: http://arxiv.org/abs/2103.09396v1
- Date: Wed, 17 Mar 2021 01:48:34 GMT
- Title: Pros and Cons of GAN Evaluation Measures: New Developments
- Authors: Ali Borji
- Abstract summary: This work is an update of a previous paper on the same topic published a few years ago.
I describe new dimensions that are becoming important in assessing models, and discuss the connection between GAN evaluation and deepfakes.
- Score: 53.10151901863263
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This work is an update of a previous paper on the same topic published a few
years ago. With the dramatic progress in generative modeling, a suite of new
quantitative and qualitative techniques to evaluate models has emerged.
Although some measures such as Inception Score, Fréchet Inception Distance,
Precision-Recall, and Perceptual Path Length are relatively more popular, GAN
evaluation is not a settled issue and there is still room for improvement. For
example, in addition to quality and diversity of synthesized images, generative
models should be evaluated in terms of bias and fairness. I describe new
dimensions that are becoming important in assessing models, and discuss the
connection between GAN evaluation and deepfakes.
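Among the measures named in the abstract, the Fréchet Inception Distance (FID) is perhaps the most widely reported. As a point of reference, below is a minimal sketch of how FID is typically computed, assuming `real_feats` and `fake_feats` are (N, D) arrays of Inception activations already extracted from real and generated images; these variable names and the feature-extraction step are assumptions for illustration, not part of the paper.

```python
# Minimal FID sketch: Frechet distance between Gaussians fit to
# Inception activations of real and generated images (assumed inputs).
import numpy as np
from scipy import linalg

def frechet_distance(real_feats: np.ndarray, fake_feats: np.ndarray) -> float:
    """FID between two (N, D) arrays of feature activations."""
    mu_r, mu_f = real_feats.mean(axis=0), fake_feats.mean(axis=0)
    cov_r = np.cov(real_feats, rowvar=False)
    cov_f = np.cov(fake_feats, rowvar=False)

    # Matrix square root of the covariance product; keep the real part
    # to discard tiny imaginary components from numerical error.
    covmean, _ = linalg.sqrtm(cov_r @ cov_f, disp=False)
    covmean = covmean.real

    diff = mu_r - mu_f
    return float(diff @ diff + np.trace(cov_r + cov_f - 2.0 * covmean))
```

A lower FID indicates that the fitted Gaussians of real and generated features are closer, which is commonly read as higher sample quality and diversity; the measure inherits the limitations of the Inception feature space discussed in the paper.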
Related papers
- Benchmarking the Attribution Quality of Vision Models [13.255247017616687]
We propose a novel evaluation protocol that overcomes two fundamental limitations of the widely used incremental-deletion protocol.
This allows us to evaluate 23 attribution methods and how different design choices of popular vision backbones affect their attribution quality.
We find that intrinsically explainable models outperform standard models and that raw attribution values exhibit higher attribution quality than previously reported.
arXiv Detail & Related papers (2024-07-16T17:02:20Z) - QualEval: Qualitative Evaluation for Model Improvement [82.73561470966658]
We propose QualEval, which augments quantitative scalar metrics with automated qualitative evaluation as a vehicle for model improvement.
QualEval uses a powerful LLM reasoner and our novel flexible linear programming solver to generate human-readable insights.
We demonstrate that leveraging its insights, for example, improves the absolute performance of the Llama 2 model by up to 15 percentage points.
arXiv Detail & Related papers (2023-11-06T00:21:44Z) - Operationalizing Specifications, In Addition to Test Sets for Evaluating Constrained Generative Models [17.914521288548844]
We argue that the scale of generative models could be exploited to raise the abstraction level at which evaluation itself is conducted.
Our recommendations are based on leveraging specifications as a powerful instrument to evaluate generation quality.
arXiv Detail & Related papers (2022-11-19T06:39:43Z) - Are Neural Topic Models Broken? [81.15470302729638]
We study the relationship between automated and human evaluation of topic models.
We find that neural topic models fare worse in both respects compared to an established classical method.
arXiv Detail & Related papers (2022-10-28T14:38:50Z) - Image Quality Assessment in the Modern Age [53.19271326110551]
This tutorial provides the audience with the basic theories, methodologies, and recent progress of image quality assessment (IQA).
We will first revisit several subjective quality assessment methodologies, with emphasis on how to properly select visual stimuli.
Both hand-engineered and (deep) learning-based methods will be covered.
arXiv Detail & Related papers (2021-10-19T02:38:46Z) - Who Explains the Explanation? Quantitatively Assessing Feature Attribution Methods [0.0]
We propose a novel evaluation metric -- the Focus -- designed to quantify the faithfulness of explanations.
We show the robustness of the metric through randomization experiments, and then use Focus to evaluate and compare three popular explainability techniques.
Our results find LRP and GradCAM to be consistent and reliable, with the latter remaining the most competitive even when applied to poorly performing models.
arXiv Detail & Related papers (2021-09-28T07:10:24Z) - Instance-Level Relative Saliency Ranking with Graph Reasoning [126.09138829920627]
We present a novel unified model to segment salient instances and infer relative saliency rank order.
A novel loss function is also proposed to effectively train the saliency ranking branch.
Experimental results demonstrate that our proposed model is more effective than previous methods.
arXiv Detail & Related papers (2021-07-08T13:10:42Z) - Regression or Classification? New Methods to Evaluate No-Reference Picture and Video Quality Models [45.974399400141685]
We propose two new methods to evaluate and compare no-reference quality models at coarser levels.
We conduct a benchmark experiment of popular no-reference quality models on recent in-the-wild picture and video quality datasets.
arXiv Detail & Related papers (2021-01-30T05:40:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.