Unsupervised evaluation of GAN sample quality: Introducing the TTJac Score
- URL: http://arxiv.org/abs/2309.00107v1
- Date: Thu, 31 Aug 2023 19:55:50 GMT
- Title: Unsupervised evaluation of GAN sample quality: Introducing the TTJac Score
- Authors: Egor Sevriugov, Ivan Oseledets
- Abstract summary: "TTJac score" is proposed to measure the fidelity of individual synthesized images in a data-free manner.
The experimental results of applying the proposed metric to StyleGAN2 and StyleGAN2-ADA models on FFHQ, AFHQ-Wild, LSUN-Cars, and LSUN-Horse datasets are presented.
- Score: 5.1359892878090845
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Evaluation metrics are essential for assessing the performance of generative
models in image synthesis. However, existing metrics often involve high memory
and time consumption as they compute the distance between generated samples and
real data points. In our study, we propose a new evaluation metric, the
"TTJac score", to measure the fidelity of individual synthesized images in a
data-free manner. The study first establishes a theoretical approach to
directly evaluate the generated sample density. Then, a method incorporating
feature extractors and discrete function approximation through tensor train is
introduced to effectively assess the quality of generated samples. Furthermore,
the study demonstrates that this new metric can be used to improve the
fidelity-variability trade-off when applying the truncation trick. The
experimental results of applying the proposed metric to StyleGAN2 and
StyleGAN2-ADA models on FFHQ, AFHQ-Wild, LSUN-Cars, and LSUN-Horse datasets are
presented. The code used in this research will be made publicly available
online for the research community to access and utilize.
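The abstract stops short of formulas, but the data-free idea of scoring a sample from the generated density itself can be sketched through the change-of-variables relation: for a generator g mapping a latent code z to an output x, the density on the generator's manifold scales as p(z) / sqrt(det(JᵀJ)), where J is the Jacobian of g at z. The snippet below is a minimal, hypothetical illustration of that per-sample density score only; it is not the authors' TTJac implementation, which additionally relies on feature extractors and a tensor-train approximation of the resulting function, and the toy generator and function names here are invented for the example.

```python
import torch

# Hypothetical toy generator standing in for a pretrained GAN generator
# (e.g. StyleGAN2); the real TTJac score is computed on top of a feature
# extractor and approximates the resulting function with a tensor train.
class ToyGenerator(torch.nn.Module):
    def __init__(self, latent_dim: int = 8, out_dim: int = 64):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(latent_dim, 32),
            torch.nn.Tanh(),
            torch.nn.Linear(32, out_dim),
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return self.net(z)

def log_density_score(generator: torch.nn.Module, z: torch.Tensor) -> float:
    """Unnormalized log-density of g(z) on the generator's manifold:
    log p(z) - 0.5 * log det(J^T J), with J = dg/dz (change of variables
    for a non-square Jacobian). Higher scores correspond to denser regions,
    which are typically associated with higher-fidelity samples."""
    # Jacobian of the generator output w.r.t. the latent code: (out_dim, latent_dim).
    J = torch.autograd.functional.jacobian(generator, z)
    # Standard-normal latent prior, up to an additive constant.
    log_pz = -0.5 * (z ** 2).sum()
    # Volume-change term; slogdet is numerically safer than det followed by log.
    _, logdet = torch.linalg.slogdet(J.T @ J)
    return float(log_pz - 0.5 * logdet)

if __name__ == "__main__":
    torch.manual_seed(0)
    g = ToyGenerator()
    # One data-free fidelity proxy per sampled latent code.
    scores = [log_density_score(g, torch.randn(8)) for _ in range(4)]
    print(scores)
```

The abstract notes the score can also improve the fidelity-variability trade-off of the truncation trick; one natural use of a per-sample score like this is to filter or re-rank latent codes by density rather than truncating all of them uniformly, though the exact procedure is described in the paper itself.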
Related papers
- DSF-GAN: DownStream Feedback Generative Adversarial Network [0.07083082555458872]
We propose a novel architecture called the DownStream Feedback Generative Adversarial Network (DSF-GAN).
DSF-GAN incorporates feedback from a downstream prediction model during training to augment the generator's loss function with valuable information.
Our experiments demonstrate improved model performance when training on synthetic samples generated by DSF-GAN, compared to those generated by the same GAN architecture without feedback.
arXiv Detail & Related papers (2024-03-27T05:41:50Z)
- FFAD: A Novel Metric for Assessing Generated Time Series Data Utilizing Fourier Transform and Auto-encoder [9.103662085683304]
The Fréchet Inception Distance (FID) serves as the standard metric for evaluating generative models in image synthesis.
This work proposes a novel solution leveraging the Fourier transform and Auto-encoder, termed the Fréchet Fourier-transform Auto-encoder Distance (FFAD).
Through our experimental results, we showcase the potential of FFAD for effectively distinguishing samples from different classes.
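As the names suggest, FID and FFAD share the same final step: fit a Gaussian to each of two feature sets (Inception activations for FID, Fourier-transform/auto-encoder embeddings for FFAD) and compute the Fréchet distance between the two Gaussians. Below is a minimal sketch of that shared computation, assuming the features have already been extracted; the random stand-in features are purely illustrative.

```python
import numpy as np
from scipy import linalg

def frechet_distance(feats_a: np.ndarray, feats_b: np.ndarray) -> float:
    """Frechet distance between Gaussians fitted to two feature sets
    (rows = samples, columns = feature dimensions)."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # Matrix square root of the covariance product; numerical error can add a
    # tiny imaginary component, which is discarded.
    covmean = linalg.sqrtm(cov_a @ cov_b)
    if np.iscomplexobj(covmean):
        covmean = covmean.real
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a + cov_b - 2.0 * covmean))

# Random stand-in features; in practice these are Inception activations (FID)
# or Fourier-transform/auto-encoder embeddings (FFAD).
rng = np.random.default_rng(0)
real_feats = rng.normal(size=(1000, 64))
fake_feats = rng.normal(loc=0.1, size=(1000, 64))
print(frechet_distance(real_feats, fake_feats))
```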
arXiv Detail & Related papers (2024-03-11T10:26:04Z)
- DualView: Data Attribution from the Dual Perspective [16.083769847895336]
We present DualView, a novel method for post-hoc data attribution based on surrogate modelling.
We find that DualView requires considerably lower computational resources than other methods, while demonstrating comparable performance across evaluation metrics.
arXiv Detail & Related papers (2024-02-19T13:13:16Z)
- The Languini Kitchen: Enabling Language Modelling Research at Different Scales of Compute [66.84421705029624]
We introduce an experimental protocol that enables model comparisons based on equivalent compute, measured in accelerator hours.
We pre-process an existing large, diverse, and high-quality dataset of books that surpasses existing academic benchmarks in quality, diversity, and document length.
This work also provides two baseline models: a feed-forward model derived from the GPT-2 architecture and a recurrent model in the form of a novel LSTM with ten-fold throughput.
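A toy sketch of what "comparisons based on equivalent compute" means in practice: fix a budget in accelerator hours and compare models on how much data each can process within it, given its measured throughput. The model names and throughput figures below are invented for illustration and are not the Languini baselines' actual numbers.

```python
# Compare models at an equivalent compute budget (accelerator hours) rather
# than at equal step or token counts. All figures below are made up.
ACCELERATOR_HOURS = 6.0  # fixed compute budget on one reference accelerator

measured_throughput = {
    # tokens processed per second, as measured on the reference accelerator
    "feed-forward baseline (GPT-2-style)": 25_000,
    "recurrent baseline (LSTM variant)": 40_000,
}

for name, tokens_per_second in measured_throughput.items():
    tokens_in_budget = tokens_per_second * ACCELERATOR_HOURS * 3600
    print(f"{name}: {tokens_in_budget / 1e9:.2f}B tokens within "
          f"{ACCELERATOR_HOURS:g} accelerator hours")
```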
arXiv Detail & Related papers (2023-09-20T10:31:17Z)
- Revisiting the Evaluation of Image Synthesis with GANs [55.72247435112475]
This study presents an empirical investigation into the evaluation of synthesis performance, with generative adversarial networks (GANs) as a representative of generative models.
In particular, we make in-depth analyses of various factors, including how to represent a data point in the representation space, how to calculate a fair distance using selected samples, and how many instances to use from each set.
arXiv Detail & Related papers (2023-04-04T17:54:32Z)
- A Study on the Evaluation of Generative Models [19.18642459565609]
Implicit generative models, which do not return likelihood values, have become prevalent in recent years.
In this work, we study the evaluation metrics of generative models by generating a high-quality synthetic dataset.
Our study shows that while FID and IS do correlate with several f-divergences, their ranking of close models can vary considerably.
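For reference, the Inception Score (IS) mentioned above is the exponential of the average KL divergence between each sample's predicted class distribution p(y|x) and the marginal class distribution p(y). A minimal sketch, assuming class probabilities have already been produced by a classifier (random Dirichlet draws stand in for real predictions):

```python
import numpy as np

def inception_score(class_probs: np.ndarray, eps: float = 1e-12) -> float:
    """IS = exp(E_x[ KL(p(y|x) || p(y)) ]). class_probs has shape
    (num_samples, num_classes) and holds p(y|x) from a pretrained classifier
    (the Inception network for the standard IS)."""
    p_y = class_probs.mean(axis=0, keepdims=True)  # marginal class distribution
    kl = class_probs * (np.log(class_probs + eps) - np.log(p_y + eps))
    return float(np.exp(kl.sum(axis=1).mean()))

# Random Dirichlet draws stand in for real classifier predictions.
rng = np.random.default_rng(0)
probs = rng.dirichlet(alpha=np.ones(10), size=2000)
print(inception_score(probs))
```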
arXiv Detail & Related papers (2022-06-22T09:27:31Z)
- Fake It Till You Make It: Near-Distribution Novelty Detection by Score-Based Generative Models [54.182955830194445]
Existing models either fail or face a dramatic drop under the so-called "near-distribution" setting.
We propose to exploit a score-based generative model to produce synthetic near-distribution anomalous data.
Our method improves the near-distribution novelty detection by 6% and passes the state-of-the-art by 1% to 5% across nine novelty detection benchmarks.
arXiv Detail & Related papers (2022-05-28T02:02:53Z)
- How Faithful is your Synthetic Data? Sample-level Metrics for Evaluating and Auditing Generative Models [95.8037674226622]
We introduce a 3-dimensional evaluation metric that characterizes the fidelity, diversity and generalization performance of any generative model in a domain-agnostic fashion.
Our metric unifies statistical divergence measures with precision-recall analysis, enabling sample- and distribution-level diagnoses of model fidelity and diversity.
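Sample-level fidelity and diversity diagnoses of this kind are often operationalised with k-nearest-neighbour manifold estimates (improved precision and recall). The sketch below shows that generic construction as an illustration; it is not necessarily the specific metric proposed in the paper, and the random features are placeholders.

```python
import numpy as np

def knn_radii(feats, k=3):
    """Distance from each point to its k-th nearest neighbour (excluding itself)."""
    d = np.linalg.norm(feats[:, None, :] - feats[None, :, :], axis=-1)
    d_sorted = np.sort(d, axis=1)
    return d_sorted[:, k]  # column 0 is the zero self-distance

def manifold_precision_recall(real, fake, k=3):
    """Precision: fraction of fake samples inside the real k-NN manifold.
    Recall: fraction of real samples inside the fake k-NN manifold."""
    real_r = knn_radii(real, k)
    fake_r = knn_radii(fake, k)
    d = np.linalg.norm(fake[:, None, :] - real[None, :, :], axis=-1)  # (fake, real)
    precision = (d <= real_r[None, :]).any(axis=1).mean()
    recall = (d <= fake_r[:, None]).any(axis=0).mean()
    return float(precision), float(recall)

rng = np.random.default_rng(0)
real = rng.normal(size=(500, 16))       # stand-in real features
fake = rng.normal(loc=0.2, size=(500, 16))  # stand-in generated features
print(manifold_precision_recall(real, fake))
```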
arXiv Detail & Related papers (2021-02-17T18:25:30Z)
- Reliable Evaluations for Natural Language Inference based on a Unified Cross-dataset Benchmark [54.782397511033345]
Crowd-sourced Natural Language Inference (NLI) datasets may suffer from significant biases like annotation artifacts.
We present a new unified cross-dataset benchmark with 14 NLI datasets and re-evaluate 9 widely-used neural network-based NLI models.
Our proposed evaluation scheme and experimental baselines could provide a basis to inspire future reliable NLI research.
arXiv Detail & Related papers (2020-10-15T11:50:12Z)
- Evaluation Metrics for Conditional Image Generation [100.69766435176557]
We present two new metrics for evaluating generative models in the class-conditional image generation setting.
A theoretical analysis shows the motivation behind each proposed metric and links the novel metrics to their unconditional counterparts.
We provide an extensive empirical evaluation, comparing the metrics to their unconditional variants and to other metrics, and utilize them to analyze existing generative models.
arXiv Detail & Related papers (2020-04-26T12:15:16Z)
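This summary does not spell out the proposed conditional metrics, but a common way to link a conditional metric to its unconditional counterpart is to compute the unconditional statistic within each class and average the results. The sketch below illustrates that generic per-class Fréchet distance idea on made-up features and labels; it is not necessarily the construction used in the paper above.

```python
import numpy as np
from scipy import linalg

def frechet(a, b):
    """Frechet distance between Gaussians fitted to two feature sets."""
    mu_a, mu_b = a.mean(axis=0), b.mean(axis=0)
    ca, cb = np.cov(a, rowvar=False), np.cov(b, rowvar=False)
    s = linalg.sqrtm(ca @ cb)
    s = s.real if np.iscomplexobj(s) else s
    d = mu_a - mu_b
    return float(d @ d + np.trace(ca + cb - 2.0 * s))

def per_class_frechet(real_feats, real_labels, fake_feats, fake_labels):
    """Average the unconditional Frechet distance over classes: one simple way
    to extend an unconditional metric to the class-conditional setting."""
    classes = np.unique(real_labels)
    scores = [frechet(real_feats[real_labels == c], fake_feats[fake_labels == c])
              for c in classes]
    return float(np.mean(scores))

# Made-up features and labels standing in for real/generated samples.
rng = np.random.default_rng(0)
real_x, real_y = rng.normal(size=(600, 32)), rng.integers(0, 3, 600)
fake_x, fake_y = rng.normal(loc=0.1, size=(600, 32)), rng.integers(0, 3, 600)
print(per_class_frechet(real_x, real_y, fake_x, fake_y))
```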
This list is automatically generated from the titles and abstracts of the papers on this site.