Mimicry: Towards the Reproducibility of GAN Research
- URL: http://arxiv.org/abs/2005.02494v1
- Date: Tue, 5 May 2020 21:07:26 GMT
- Title: Mimicry: Towards the Reproducibility of GAN Research
- Authors: Kwot Sin Lee, Christopher Town
- Abstract summary: We introduce Mimicry, a lightweight PyTorch library that provides implementations of popular state-of-the-art GANs and evaluation metrics.
We provide comprehensive baseline performances of different GANs on seven widely-used datasets by training these GANs under the same conditions, and evaluating them across three popular GAN metrics using the same procedures.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Advancing the state of Generative Adversarial Networks (GANs) research
requires one to make careful and accurate comparisons with existing works. Yet,
this is often difficult to achieve in practice when models are
implemented differently using varying frameworks, and evaluated using different
procedures even when the same metric is used. To mitigate these issues, we
introduce Mimicry, a lightweight PyTorch library that provides implementations
of popular state-of-the-art GANs and evaluation metrics to closely reproduce
reported scores in the literature. We provide comprehensive baseline
performances of different GANs on seven widely-used datasets by training these
GANs under the same conditions, and evaluating them across three popular GAN
metrics using the same procedures. The library can be found at
https://github.com/kwotsin/mimicry.
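A metric commonly reported by such libraries is the Fréchet Inception Distance (FID), which compares the mean and covariance of Inception features extracted from real and generated images. The sketch below is a minimal, generic implementation of the FID statistic from pre-extracted features; the function name and inputs are illustrative placeholders, not Mimicry's actual API.
```python
# Minimal sketch of the Frechet Inception Distance (FID) statistic from
# pre-extracted feature arrays. Generic illustration only; this is not
# Mimicry's own implementation or API.
import numpy as np
from scipy import linalg

def frechet_distance(real_feats: np.ndarray, fake_feats: np.ndarray) -> float:
    """FID between two feature sets of shape (n_samples, feat_dim)."""
    mu_r, mu_f = real_feats.mean(axis=0), fake_feats.mean(axis=0)
    cov_r = np.cov(real_feats, rowvar=False)
    cov_f = np.cov(fake_feats, rowvar=False)

    diff = mu_r - mu_f
    # Matrix square root of the covariance product; drop tiny imaginary
    # parts introduced by numerical error.
    covmean, _ = linalg.sqrtm(cov_r @ cov_f, disp=False)
    if np.iscomplexobj(covmean):
        covmean = covmean.real

    return float(diff @ diff + np.trace(cov_r + cov_f - 2.0 * covmean))

# Placeholder features; in practice these come from a fixed Inception
# network applied to real and generated images under identical preprocessing.
real = np.random.randn(1000, 2048)
fake = np.random.randn(1000, 2048)
print(frechet_distance(real, fake))
```
Comparable scores require fixing the feature extractor, preprocessing, and number of samples across models, which is precisely the kind of procedural detail the library standardizes.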
Related papers
- LibEER: A Comprehensive Benchmark and Algorithm Library for EEG-based Emotion Recognition [14.57188626072955]
We propose LibEER, a comprehensive benchmark and algorithm library for fair comparisons in EEG-based emotion recognition.
LibEER establishes a unified evaluation framework with standardized experimental settings, enabling unbiased evaluations of over ten representative deep learning-based EER models.
arXiv Detail & Related papers (2024-10-13T07:51:39Z)
- Evaluation of RAG Metrics for Question Answering in the Telecom Domain [0.650923326742559]
Retrieval Augmented Generation (RAG) is widely used to enable Large Language Models (LLMs) to perform Question Answering (QA) tasks.
This work presents a modified version of the RAGAS package for a few metrics (faithfulness, context relevance, answer relevance, answer correctness, answer similarity, and factual correctness), exposing the intermediate outputs of the prompts.
Next, we analyse the expert evaluations of the output of the modified RAGAS package and observe the challenges of using it in the telecom domain.
arXiv Detail & Related papers (2024-07-15T17:40:15Z)
- BERGEN: A Benchmarking Library for Retrieval-Augmented Generation [26.158785168036662]
Retrieval-Augmented Generation allows Large Language Models to be enhanced with external knowledge.
Inconsistent benchmarking poses a major challenge in comparing approaches and understanding the impact of each component in the pipeline.
In this work, we study best practices that lay the groundwork for a systematic evaluation of RAG and present BERGEN, an end-to-end library for reproducible research standardizing RAG experiments.
arXiv Detail & Related papers (2024-07-01T09:09:27Z)
- Towards Multiple References Era -- Addressing Data Leakage and Limited Reference Diversity in NLG Evaluation [55.92852268168816]
N-gram matching-based evaluation metrics, such as BLEU and chrF, are widely utilized across a range of natural language generation (NLG) tasks.
Recent studies have revealed a weak correlation between these matching-based metrics and human evaluations.
We propose to utilize multiple references to enhance the consistency between these metrics and human evaluations.
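As a concrete illustration of multi-reference scoring, the sketch below computes corpus-level BLEU and chrF against two reference sets using sacrebleu (assumed installed); the sentences are made-up placeholders.
```python
# Sketch: scoring system outputs against multiple reference sets with
# sacrebleu (assumed installed). All sentences are made-up placeholders.
import sacrebleu

hypotheses = ["the cat sat on the mat", "a dog barked loudly"]

# Each inner list is one complete reference set, aligned with the hypotheses.
references = [
    ["the cat sat on the mat", "a dog was barking loudly"],      # reference set 1
    ["the cat is sitting on the mat", "the dog barked loudly"],  # reference set 2
]

bleu = sacrebleu.corpus_bleu(hypotheses, references)
chrf = sacrebleu.corpus_chrf(hypotheses, references)
print(f"BLEU: {bleu.score:.2f}  chrF: {chrf.score:.2f}")
```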
arXiv Detail & Related papers (2023-08-06T14:49:26Z)
- Benchmarking Test-Time Adaptation against Distribution Shifts in Image Classification [77.0114672086012]
Test-time adaptation (TTA) is a technique aimed at enhancing the generalization performance of models by leveraging unlabeled samples solely during prediction.
We present a benchmark that systematically evaluates 13 prominent TTA methods and their variants on five widely used image classification datasets.
arXiv Detail & Related papers (2023-07-06T16:59:53Z)
- DCID: Deep Canonical Information Decomposition [84.59396326810085]
We consider the problem of identifying the signal shared between two one-dimensional target variables.
We propose ICM, an evaluation metric which can be used in the presence of ground-truth labels.
We also propose Deep Canonical Information Decomposition (DCID) - a simple, yet effective approach for learning the shared variables.
arXiv Detail & Related papers (2023-06-27T16:59:06Z)
- Revisiting the Evaluation of Image Synthesis with GANs [55.72247435112475]
This study presents an empirical investigation into the evaluation of synthesis performance, with generative adversarial networks (GANs) as a representative of generative models.
In particular, we make in-depth analyses of various factors, including how to represent a data point in the representation space, how to calculate a fair distance using selected samples, and how many instances to use from each set.
arXiv Detail & Related papers (2023-04-04T17:54:32Z)
- A Study on the Evaluation of Generative Models [19.18642459565609]
Implicit generative models, which do not return likelihood values, have become prevalent in recent years.
In this work, we study the evaluation metrics of generative models by generating a high-quality synthetic dataset.
Our study shows that while FID and IS do correlate to several f-divergences, their ranking of close models can vary considerably.
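Both statistics are simple to state: the Inception Score (IS) exponentiates the mean KL divergence between the conditional label distribution p(y|x) and the marginal p(y). The sketch below computes it from pre-computed class probabilities; the inputs are random placeholders and this is not the cited study's code.
```python
# Sketch: Inception Score (IS) from pre-computed class probabilities
# p(y|x) of shape (n_samples, n_classes). Random placeholder inputs;
# not the cited study's code.
import numpy as np

def inception_score(probs: np.ndarray, eps: float = 1e-12) -> float:
    marginal = probs.mean(axis=0, keepdims=True)  # p(y)
    kl = np.sum(probs * (np.log(probs + eps) - np.log(marginal + eps)), axis=1)
    return float(np.exp(kl.mean()))  # exp(E_x[KL(p(y|x) || p(y))])

probs = np.random.dirichlet(np.ones(1000), size=5000)  # stand-in for classifier outputs
print(inception_score(probs))
```
In practice IS is usually averaged over several splits of the generated samples.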
arXiv Detail & Related papers (2022-06-22T09:27:31Z)
- WRENCH: A Comprehensive Benchmark for Weak Supervision [66.82046201714766]
The benchmark consists of 22 varied real-world datasets for classification and sequence tagging.
We use the benchmark to conduct extensive comparisons over more than 100 method variants to demonstrate its efficacy as a benchmark platform.
arXiv Detail & Related papers (2021-09-23T13:47:16Z)
- Evaluation Metrics for Conditional Image Generation [100.69766435176557]
We present two new metrics for evaluating generative models in the class-conditional image generation setting.
A theoretical analysis shows the motivation behind each proposed metric and links the novel metrics to their unconditional counterparts.
We provide an extensive empirical evaluation, comparing the metrics to their unconditional variants and to other metrics, and utilize them to analyze existing generative models.
arXiv Detail & Related papers (2020-04-26T12:15:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information listed here and is not responsible for any consequences arising from its use.