Evaluation Metrics for Conditional Image Generation
- URL: http://arxiv.org/abs/2004.12361v2
- Date: Mon, 8 Feb 2021 12:36:48 GMT
- Title: Evaluation Metrics for Conditional Image Generation
- Authors: Yaniv Benny, Tomer Galanti, Sagie Benaim, Lior Wolf
- Abstract summary: We present two new metrics for evaluating generative models in the class-conditional image generation setting.
A theoretical analysis shows the motivation behind each proposed metric and links the novel metrics to their unconditional counterparts.
We provide an extensive empirical evaluation, comparing the metrics to their unconditional variants and to other metrics, and utilize them to analyze existing generative models.
- Score: 100.69766435176557
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present two new metrics for evaluating generative models in the
class-conditional image generation setting. These metrics are obtained by
generalizing the two most popular unconditional metrics: the Inception Score
(IS) and the Fréchet Inception Distance (FID). A theoretical analysis shows
the motivation behind each proposed metric and links the novel metrics to their
unconditional counterparts. The link takes the form of a product in the case of
IS or an upper bound in the FID case. We provide an extensive empirical
evaluation, comparing the metrics to their unconditional variants and to other
metrics, and utilize them to analyze existing generative models, thus providing
additional insights about their performance, from unlearned classes to mode
collapse.
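For reference, the unconditional metrics being generalized are the Inception Score, IS = exp(E_x KL(p(y|x) || p(y))), and FID, the Fréchet distance ||mu_r - mu_g||^2 + Tr(Sigma_r + Sigma_g - 2 (Sigma_r Sigma_g)^{1/2}) between Gaussians fitted to real and generated Inception features. Below is a minimal sketch of one natural class-conditional variant, a per-class FID averaged over classes; the function names and the uniform class averaging are illustrative assumptions, not necessarily the authors' exact construction.

```python
# Minimal sketch, assuming Inception (or similar) features have already
# been extracted. Illustrates one natural class-conditional FID
# (per-class FID averaged over classes); NOT necessarily the exact
# construction proposed in the paper.
import numpy as np
from scipy.linalg import sqrtm

def fid_from_features(real, fake):
    """Standard FID between two feature arrays of shape (N, D)."""
    mu_r, mu_f = real.mean(axis=0), fake.mean(axis=0)
    cov_r = np.cov(real, rowvar=False)
    cov_f = np.cov(fake, rowvar=False)
    covmean = sqrtm(cov_r @ cov_f)
    if np.iscomplexobj(covmean):  # discard tiny imaginary numerical noise
        covmean = covmean.real
    return float(((mu_r - mu_f) ** 2).sum()
                 + np.trace(cov_r + cov_f - 2.0 * covmean))

def per_class_fid(real, real_labels, fake, fake_labels):
    """FID computed within each class, then averaged uniformly."""
    classes = np.unique(real_labels)
    return float(np.mean([
        fid_from_features(real[real_labels == c], fake[fake_labels == c])
        for c in classes
    ]))
```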
Related papers
- Cobra Effect in Reference-Free Image Captioning Metrics [58.438648377314436]
A proliferation of reference-free methods, leveraging visual-language pre-trained models (VLMs), has emerged.
In this paper, we study if there are any deficiencies in reference-free metrics.
We employ GPT-4V as an evaluative tool to assess generated sentences, and the results reveal that our approach achieves state-of-the-art (SOTA) performance.
arXiv Detail & Related papers (2024-02-18T12:36:23Z) - EvalCrafter: Benchmarking and Evaluating Large Video Generation Models [70.19437817951673]
We argue that it is hard to judge large conditional generative models by simple metrics, since these models are often trained on very large datasets and have multi-aspect abilities.
Our approach involves generating a diverse and comprehensive list of 700 prompts for text-to-video generation.
Then, we evaluate state-of-the-art video generative models on our carefully designed benchmark in terms of visual quality, content quality, motion quality, and text-video alignment, using 17 well-selected objective metrics.
arXiv Detail & Related papers (2023-10-17T17:50:46Z) - Unsupervised evaluation of GAN sample quality: Introducing the TTJac
Score [5.1359892878090845]
"TTJac score" is proposed to measure the fidelity of individual synthesized images in a data-free manner.
The experimental results of applying the proposed metric to StyleGAN 2 and StyleGAN 2 ADA models on FFHQ, AFHQ-Wild, LSUN-Cars, and LSUN-Horse datasets are presented.
arXiv Detail & Related papers (2023-08-31T19:55:50Z) - Challenges to Evaluating the Generalization of Coreference Resolution Models: A Measurement Modeling Perspective [69.50044040291847]
We show how multi-dataset evaluations risk conflating different factors concerning what, precisely, is being measured.
This makes it difficult to draw more generalizable conclusions from these evaluations.
arXiv Detail & Related papers (2023-03-16T05:32:02Z) - Feature Likelihood Divergence: Evaluating the Generalization of
Generative Models Using Samples [25.657798631897908]
Feature Likelihood Divergence (FLD) provides a comprehensive trichotomic evaluation of generative models, accounting for the fidelity, diversity, and novelty of generated samples.
We empirically demonstrate the ability of FLD to identify overfitting problem cases, even when previously proposed metrics fail.
arXiv Detail & Related papers (2023-02-09T04:57:27Z) - SMART: Sentences as Basic Units for Text Evaluation [48.5999587529085]
In this paper, we introduce a new metric called SMART to mitigate such limitations.
We treat sentences as the basic units of matching instead of tokens, and use a sentence matching function to soft-match candidate and reference sentences (a sketch of this idea follows this entry).
Our results show that the system-level correlations of our proposed metric with a model-based matching function outperform those of all competing metrics.
arXiv Detail & Related papers (2022-08-01T17:58:05Z) - A Study on the Evaluation of Generative Models [19.18642459565609]
- A Study on the Evaluation of Generative Models [19.18642459565609]
Implicit generative models, which do not return likelihood values, have become prevalent in recent years.
In this work, we study the evaluation metrics of generative models by generating a high-quality synthetic dataset.
Our study shows that while FID and IS do correlate to several f-divergences, their ranking of close models can vary considerably.
arXiv Detail & Related papers (2022-06-22T09:27:31Z) - A Unified Framework for Rank-based Evaluation Metrics for Link
Prediction in Knowledge Graphs [19.822126244784133]
The link prediction task on knowledge graphs, which lacks explicit negative triples, motivates the use of rank-based metrics.
We introduce a simple theoretical framework for rank-based metrics, upon which we investigate two avenues for improving existing metrics: alternative aggregation functions and concepts from probability theory (a sketch follows this entry).
We propose several new rank-based metrics that are more easily interpreted and compared, accompanied by a demonstration of their usage in benchmarking knowledge graph embedding models.
arXiv Detail & Related papers (2022-03-14T23:09:46Z) - Bidimensional Leaderboards: Generate and Evaluate Language Hand in Hand [117.62186420147563]
- Bidimensional Leaderboards: Generate and Evaluate Language Hand in Hand [117.62186420147563]
We propose a generalization of leaderboards, bidimensional leaderboards (Billboards).
Unlike conventional unidimensional leaderboards that sort submitted systems by predetermined metrics, a Billboard accepts both generators and evaluation metrics as competing entries.
We demonstrate that a linear ensemble of a few diverse metrics sometimes substantially outperforms existing metrics in isolation (a sketch follows this entry).
arXiv Detail & Related papers (2021-12-08T06:34:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.