Revisiting the Evaluation of Image Synthesis with GANs
- URL: http://arxiv.org/abs/2304.01999v2
- Date: Mon, 23 Oct 2023 07:23:10 GMT
- Title: Revisiting the Evaluation of Image Synthesis with GANs
- Authors: Mengping Yang, Ceyuan Yang, Yichi Zhang, Qingyan Bai, Yujun Shen, Bo Dai
- Abstract summary: This study presents an empirical investigation into the evaluation of synthesis performance, with generative adversarial networks (GANs) as a representative of generative models.
In particular, we make in-depth analyses of various factors, including how to represent a data point in the representation space, how to calculate a fair distance using selected samples, and how many instances to use from each set.
- Score: 55.72247435112475
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A good metric, which promises a reliable comparison between solutions, is
essential for any well-defined task. Unlike most vision tasks that have
per-sample ground-truth, image synthesis tasks target generating unseen data
and hence are usually evaluated through a distributional distance between one
set of real samples and another set of generated samples. This study presents
an empirical investigation into the evaluation of synthesis performance, with
generative adversarial networks (GANs) as a representative of generative
models. In particular, we make in-depth analyses of various factors, including
how to represent a data point in the representation space, how to calculate a
fair distance using selected samples, and how many instances to use from each
set. Extensive experiments conducted on multiple datasets and settings reveal
several important findings. Firstly, a group of models that include both
CNN-based and ViT-based architectures serve as reliable and robust feature
extractors for measurement evaluation. Secondly, Centered Kernel Alignment
(CKA) provides a better comparison across various extractors and hierarchical
layers in one model. Finally, CKA is more sample-efficient and enjoys better
agreement with human judgment in characterizing the similarity between two
internal data correlations. These findings contribute to the development of a
new measurement system, which enables a consistent and reliable re-evaluation
of current state-of-the-art generative models.
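As an illustration of the CKA measure named in the abstract, the following is a minimal sketch of the standard linear CKA formulation computed on two feature matrices. It is not taken from the paper: the function name `linear_cka`, the use of a linear kernel, and the assumption that both matrices have the same number of rows (for example, equally sized sets of real and generated features from one extractor layer) are choices made for this example only.

```python
import numpy as np


def linear_cka(x, y):
    """Linear Centered Kernel Alignment between two feature matrices.

    x: array of shape (n, d1); y: array of shape (n, d2).
    Hypothetical helper for illustration; assumes both inputs have n rows.
    """
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    if x.shape[0] != y.shape[0]:
        raise ValueError("x and y must have the same number of rows")

    # Center every feature dimension (column) at zero.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)

    # ||Y^T X||_F^2 normalized by ||X^T X||_F * ||Y^T Y||_F.
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats_a = rng.normal(size=(64, 8))   # stand-in for one set of features
    feats_b = rng.normal(size=(64, 16))  # stand-in for another set
    print(linear_cka(feats_a, feats_a))
    print(linear_cka(feats_a, feats_b))
```

In this formulation the score lies in [0, 1] and does not change under an orthogonal transform or an isotropic rescaling of either feature matrix, which is one reason it can be compared across extractors and layers with different feature dimensions.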
Related papers
- Latent Semantic Consensus For Deterministic Geometric Model Fitting [109.44565542031384]
We propose an effective method called Latent Semantic Consensus (LSC).
LSC formulates the model fitting problem into two latent semantic spaces based on data points and model hypotheses.
LSC is able to provide consistent and reliable solutions within only a few milliseconds for general multi-structural model fitting.
arXiv Detail & Related papers (2024-03-11T05:35:38Z)
- The Importance of Downstream Networks in Digital Pathology Foundation Models [1.689369173057502]
We evaluate seven feature extractor models across three different datasets with 162 different aggregation model configurations.
We find that the performance of many current feature extractor models is notably similar.
arXiv Detail & Related papers (2023-11-29T16:54:25Z)
- RGM: A Robust Generalizable Matching Model [49.60975442871967]
We propose a deep model for sparse and dense matching, termed RGM (Robust Generalist Matching).
To narrow the gap between synthetic training samples and real-world scenarios, we build a new, large-scale dataset with sparse correspondence ground truth.
We are able to mix up various dense and sparse matching datasets, significantly improving the training diversity.
arXiv Detail & Related papers (2023-10-18T07:30:08Z)
- Unsupervised evaluation of GAN sample quality: Introducing the TTJac Score [5.1359892878090845]
"TTJac score" is proposed to measure the fidelity of individual synthesized images in a data-free manner.
The experimental results of applying the proposed metric to StyleGAN 2 and StyleGAN 2 ADA models on FFHQ, AFHQ-Wild, LSUN-Cars, and LSUN-Horse datasets are presented.
arXiv Detail & Related papers (2023-08-31T19:55:50Z)
- Robustness Analysis of Deep Learning Models for Population Synthesis [5.9106199000537645]
We present bootstrap confidence intervals for deep generative models to evaluate their robustness across multiple datasets.
The models are implemented on multiple travel diaries from the Montreal Origin-Destination Survey of 2008, 2013, and 2018.
Results show that the predictive errors of CTGAN have narrower confidence intervals, indicating its robustness to multiple datasets.
arXiv Detail & Related papers (2022-11-23T22:55:55Z)
- SynBench: Task-Agnostic Benchmarking of Pretrained Representations using Synthetic Data [78.21197488065177]
Recent success in fine-tuning large models that are pretrained on broad data at scale for downstream tasks has led to a significant paradigm shift in deep learning.
This paper proposes a new task-agnostic framework, SynBench, to measure the quality of pretrained representations using synthetic data.
arXiv Detail & Related papers (2022-10-06T15:25:00Z)
- IMACS: Image Model Attribution Comparison Summaries [16.80986701058596]
We introduce IMACS, a method that combines gradient-based model attributions with aggregation and visualization techniques.
IMACS extracts salient input features from an evaluation dataset, clusters them based on similarity, then visualizes differences in model attributions for similar input features.
We show how our technique can uncover behavioral differences caused by domain shift between two models trained on satellite images.
arXiv Detail & Related papers (2022-01-26T21:35:14Z)
- How Faithful is your Synthetic Data? Sample-level Metrics for Evaluating and Auditing Generative Models [95.8037674226622]
We introduce a 3-dimensional evaluation metric that characterizes the fidelity, diversity and generalization performance of any generative model in a domain-agnostic fashion.
Our metric unifies statistical divergence measures with precision-recall analysis, enabling sample- and distribution-level diagnoses of model fidelity and diversity.
arXiv Detail & Related papers (2021-02-17T18:25:30Z)
- Robust Finite Mixture Regression for Heterogeneous Targets [70.19798470463378]
We propose an FMR model that finds sample clusters and jointly models multiple incomplete mixed-type targets simultaneously.
We provide non-asymptotic oracle performance bounds for our model under a high-dimensional learning framework.
The results show that our model can achieve state-of-the-art performance.
arXiv Detail & Related papers (2020-10-12T03:27:07Z)