Mimicry: Towards the Reproducibility of GAN Research
- URL: http://arxiv.org/abs/2005.02494v1
- Date: Tue, 5 May 2020 21:07:26 GMT
- Title: Mimicry: Towards the Reproducibility of GAN Research
- Authors: Kwot Sin Lee, Christopher Town
- Abstract summary: We introduce Mimicry, a lightweight PyTorch library that provides implementations of popular state-of-the-art GANs and evaluation metrics.
We provide comprehensive baseline performances of different GANs on seven widely-used datasets by training these GANs under the same conditions, and evaluating them across three popular GAN metrics using the same procedures.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Advancing the state of Generative Adversarial Networks (GANs) research
requires one to make careful and accurate comparisons with existing works. Yet,
this is often difficult to achieve in practice when models are
implemented differently using varying frameworks, and evaluated using different
procedures even when the same metric is used. To mitigate these issues, we
introduce Mimicry, a lightweight PyTorch library that provides implementations
of popular state-of-the-art GANs and evaluation metrics to closely reproduce
reported scores in the literature. We provide comprehensive baseline
performances of different GANs on seven widely-used datasets by training these
GANs under the same conditions, and evaluating them across three popular GAN
metrics using the same procedures. The library can be found at
https://github.com/kwotsin/mimicry.
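A metric commonly reported by such libraries is the Fréchet Inception Distance (FID), which compares the mean and covariance of Inception features extracted from real and generated images. The sketch below is a minimal, generic implementation of the FID statistic from pre-extracted features; the function name and inputs are illustrative placeholders, not Mimicry's actual API.
```python
# Minimal sketch of the Frechet Inception Distance (FID) statistic from
# pre-extracted feature arrays. Generic illustration only; this is not
# Mimicry's own implementation or API.
import numpy as np
from scipy import linalg

def frechet_distance(real_feats: np.ndarray, fake_feats: np.ndarray) -> float:
    """FID between two feature sets of shape (n_samples, feat_dim)."""
    mu_r, mu_f = real_feats.mean(axis=0), fake_feats.mean(axis=0)
    cov_r = np.cov(real_feats, rowvar=False)
    cov_f = np.cov(fake_feats, rowvar=False)

    diff = mu_r - mu_f
    # Matrix square root of the covariance product; drop tiny imaginary
    # parts introduced by numerical error.
    covmean, _ = linalg.sqrtm(cov_r @ cov_f, disp=False)
    if np.iscomplexobj(covmean):
        covmean = covmean.real

    return float(diff @ diff + np.trace(cov_r + cov_f - 2.0 * covmean))

# Placeholder features; in practice these come from a fixed Inception
# network applied to real and generated images under identical preprocessing.
real = np.random.randn(1000, 2048)
fake = np.random.randn(1000, 2048)
print(frechet_distance(real, fake))
```
Comparable scores require fixing the feature extractor, preprocessing, and number of samples across models, which is precisely the kind of procedural detail the library standardizes.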
Related papers
- LibEER: A Comprehensive Benchmark and Algorithm Library for EEG-based Emotion Recognition [14.57188626072955]
We propose LibEER, a comprehensive benchmark and algorithm library for fair comparisons in EEG-based emotion recognition.
LibEER establishes a unified evaluation framework with standardized experimental settings, enabling unbiased evaluations of over ten representative deep learning-based EER models.
arXiv Detail & Related papers (2024-10-13T07:51:39Z)
- Evaluation of RAG Metrics for Question Answering in the Telecom Domain [0.650923326742559]
Retrieval Augmented Generation (RAG) is widely used to enable Large Language Models (LLMs) to perform Question Answering (QA) tasks.
This work presents a modified version of the RAGAS package for a few metrics (faithfulness, context relevance, answer relevance, answer correctness, answer similarity, and factual correctness), exposing the intermediate outputs of the prompts.
Next, we analyse the expert evaluations of the output of the modified RAGAS package and observe the challenges of using it in the telecom domain.
arXiv Detail & Related papers (2024-07-15T17:40:15Z)
- BERGEN: A Benchmarking Library for Retrieval-Augmented Generation [26.158785168036662]
Retrieval-Augmented Generation allows Large Language Models to be enhanced with external knowledge.
Inconsistent benchmarking poses a major challenge in comparing approaches and understanding the impact of each component in the pipeline.
In this work, we study best practices that lay the groundwork for a systematic evaluation of RAG and present BERGEN, an end-to-end library for reproducible research standardizing RAG experiments.
arXiv Detail & Related papers (2024-07-01T09:09:27Z)
- Towards Multiple References Era -- Addressing Data Leakage and Limited Reference Diversity in NLG Evaluation [55.92852268168816]
N-gram matching-based evaluation metrics, such as BLEU and chrF, are widely utilized across a range of natural language generation (NLG) tasks.
Recent studies have revealed a weak correlation between these matching-based metrics and human evaluations.
We propose to utilize multiple references to enhance the consistency between these metrics and human evaluations.
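As a concrete illustration of multi-reference scoring, the sketch below computes corpus-level BLEU and chrF against two reference sets using sacrebleu (assumed installed); the sentences are made-up placeholders.
```python
# Sketch: scoring system outputs against multiple reference sets with
# sacrebleu (assumed installed). All sentences are made-up placeholders.
import sacrebleu

hypotheses = ["the cat sat on the mat", "a dog barked loudly"]

# Each inner list is one complete reference set, aligned with the hypotheses.
references = [
    ["the cat sat on the mat", "a dog was barking loudly"],      # reference set 1
    ["the cat is sitting on the mat", "the dog barked loudly"],  # reference set 2
]

bleu = sacrebleu.corpus_bleu(hypotheses, references)
chrf = sacrebleu.corpus_chrf(hypotheses, references)
print(f"BLEU: {bleu.score:.2f}  chrF: {chrf.score:.2f}")
```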
arXiv Detail & Related papers (2023-08-06T14:49:26Z)
- Benchmarking Test-Time Adaptation against Distribution Shifts in Image Classification [77.0114672086012]
Test-time adaptation (TTA) is a technique aimed at enhancing the generalization performance of models by leveraging unlabeled samples solely during prediction.
We present a benchmark that systematically evaluates 13 prominent TTA methods and their variants on five widely used image classification datasets.
arXiv Detail & Related papers (2023-07-06T16:59:53Z)
- DCID: Deep Canonical Information Decomposition [84.59396326810085]
We consider the problem of identifying the signal shared between two one-dimensional target variables.
We propose ICM, an evaluation metric which can be used in the presence of ground-truth labels.
We also propose Deep Canonical Information Decomposition (DCID) - a simple, yet effective approach for learning the shared variables.
arXiv Detail & Related papers (2023-06-27T16:59:06Z)
- Revisiting the Evaluation of Image Synthesis with GANs [55.72247435112475]
This study presents an empirical investigation into the evaluation of synthesis performance, with generative adversarial networks (GANs) as a representative of generative models.
In particular, we make in-depth analyses of various factors, including how to represent a data point in the representation space, how to calculate a fair distance using selected samples, and how many instances to use from each set.
arXiv Detail & Related papers (2023-04-04T17:54:32Z)
- A Study on the Evaluation of Generative Models [19.18642459565609]
Implicit generative models, which do not return likelihood values, have become prevalent in recent years.
In this work, we study the evaluation metrics of generative models by generating a high-quality synthetic dataset.
Our study shows that while FID and IS do correlate to several f-divergences, their ranking of close models can vary considerably.
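Both statistics are simple to state: the Inception Score (IS) exponentiates the mean KL divergence between the conditional label distribution p(y|x) and the marginal p(y). The sketch below computes it from pre-computed class probabilities; the inputs are random placeholders and this is not the cited study's code.
```python
# Sketch: Inception Score (IS) from pre-computed class probabilities
# p(y|x) of shape (n_samples, n_classes). Random placeholder inputs;
# not the cited study's code.
import numpy as np

def inception_score(probs: np.ndarray, eps: float = 1e-12) -> float:
    marginal = probs.mean(axis=0, keepdims=True)  # p(y)
    kl = np.sum(probs * (np.log(probs + eps) - np.log(marginal + eps)), axis=1)
    return float(np.exp(kl.mean()))  # exp(E_x[KL(p(y|x) || p(y))])

probs = np.random.dirichlet(np.ones(1000), size=5000)  # stand-in for classifier outputs
print(inception_score(probs))
```
In practice IS is usually averaged over several splits of the generated samples.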
arXiv Detail & Related papers (2022-06-22T09:27:31Z)
- WRENCH: A Comprehensive Benchmark for Weak Supervision [66.82046201714766]
The benchmark consists of 22 varied real-world datasets for classification and sequence tagging.
We use the benchmark to conduct extensive comparisons over more than 100 method variants to demonstrate its efficacy as a benchmark platform.
arXiv Detail & Related papers (2021-09-23T13:47:16Z)
- Evaluation Metrics for Conditional Image Generation [100.69766435176557]
We present two new metrics for evaluating generative models in the class-conditional image generation setting.
A theoretical analysis shows the motivation behind each proposed metric and links the novel metrics to their unconditional counterparts.
We provide an extensive empirical evaluation, comparing the metrics to their unconditional variants and to other metrics, and utilize them to analyze existing generative models.
arXiv Detail & Related papers (2020-04-26T12:15:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information listed here and is not responsible for any consequences arising from its use.