Advancing Generative Model Evaluation: A Novel Algorithm for Realistic
Image Synthesis and Comparison in OCR System
- URL: http://arxiv.org/abs/2402.17204v3
- Date: Fri, 1 Mar 2024 21:02:29 GMT
- Title: Advancing Generative Model Evaluation: A Novel Algorithm for Realistic
Image Synthesis and Comparison in OCR System
- Authors: Majid Memari, Khaled R. Ahmed, Shahram Rahimi, Noorbakhsh Amiri
Golilarz
- Abstract summary: This research addresses a critical challenge in the field of generative models, particularly in the generation and evaluation of synthetic images.
We introduce a pioneering algorithm to objectively assess the realism of synthetic images.
Our algorithm is particularly tailored to address the challenges in generating and evaluating realistic images of Arabic handwritten digits.
- Score: 1.2289361708127877
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This research addresses a critical challenge in the field of generative
models, particularly in the generation and evaluation of synthetic images.
Given the inherent complexity of generative models and the absence of a
standardized procedure for their comparison, our study introduces a pioneering
algorithm to objectively assess the realism of synthetic images. This approach
significantly enhances the evaluation methodology by refining the Fr\'echet
Inception Distance (FID) score, allowing for a more precise and objective
assessment of image quality. Our algorithm is particularly tailored to address
the challenges in generating and evaluating realistic images of Arabic
handwritten digits, a task that has traditionally been near-impossible due to
the subjective nature of realism in image generation. By providing a systematic
and objective framework, our method not only enables the comparison of
different generative models but also paves the way for improvements in their
design and output. This breakthrough in evaluation and comparison is crucial
for advancing the field of OCR, especially for scripts that present unique
complexities, and sets a new standard in the generation and assessment of
high-quality synthetic images.
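The paper's contribution builds on the Fréchet Inception Distance. As a reference point, the standard FID between two sets of image features (e.g., Inception activations) can be sketched as below; this is a minimal illustration of the baseline metric using numpy/scipy, not the authors' refined algorithm, and the function names are ours.

```python
import numpy as np
from scipy import linalg


def frechet_distance(mu1, cov1, mu2, cov2):
    """Frechet distance between Gaussians N(mu1, cov1) and N(mu2, cov2):
    ||mu1 - mu2||^2 + Tr(cov1 + cov2 - 2 * sqrtm(cov1 @ cov2))."""
    diff = mu1 - mu2
    covmean = linalg.sqrtm(cov1 @ cov2)
    if np.iscomplexobj(covmean):
        # matrix square root can pick up tiny imaginary parts from numerical noise
        covmean = covmean.real
    return float(diff @ diff + np.trace(cov1 + cov2 - 2.0 * covmean))


def fid_from_features(feats_real, feats_fake):
    """FID from two (n_samples, n_features) feature matrices."""
    mu1, cov1 = feats_real.mean(axis=0), np.cov(feats_real, rowvar=False)
    mu2, cov2 = feats_fake.mean(axis=0), np.cov(feats_fake, rowvar=False)
    return frechet_distance(mu1, cov1, mu2, cov2)
```

Identical feature distributions give an FID near zero, and the score grows as the synthetic features drift from the real ones, which is what makes it usable as a realism ranking across generative models.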
Related papers
- RIGID: A Training-free and Model-Agnostic Framework for Robust AI-Generated Image Detection [60.960988614701414]
RIGID is a training-free and model-agnostic method for robust AI-generated image detection.
RIGID significantly outperforms existing training-based and training-free detectors.
arXiv Detail & Related papers (2024-05-30T14:49:54Z)
- A Survey on Quality Metrics for Text-to-Image Models [9.753473063305503]
We provide an overview of existing text-to-image quality metrics addressing their nuances and the need for alignment with human preferences.
We propose a new taxonomy for categorizing these metrics, which is grounded in the assumption that there are two main quality criteria, namely compositionality and generality.
We derive guidelines for practitioners conducting text-to-image evaluation, discuss open challenges of evaluation mechanisms, and surface limitations of current metrics.
arXiv Detail & Related papers (2024-03-18T14:24:20Z)
- Evaluating Text-to-Image Generative Models: An Empirical Study on Human Image Synthesis [22.550416199280953]
We present an empirical study introducing a nuanced evaluation framework for text-to-image (T2I) generative models.
Our framework categorizes evaluations into two distinct groups: first, focusing on image qualities such as aesthetics and realism, and second, examining text conditions through concept coverage and fairness.
We will release our code, the data used for evaluating generative models and the dataset annotated with defective areas soon.
arXiv Detail & Related papers (2024-03-08T07:41:47Z)
- Improving Synthetically Generated Image Detection in Cross-Concept Settings [20.21594285488186]
We focus on the challenge of generalizing across different concept classes, e.g., when training a detector on human faces.
We propose an approach based on the premise that the robustness of the detector can be enhanced by training it on realistic synthetic images.
arXiv Detail & Related papers (2023-04-24T12:45:00Z)
- IRGen: Generative Modeling for Image Retrieval [82.62022344988993]
In this paper, we present a novel methodology, reframing image retrieval as a variant of generative modeling.
We develop our model, dubbed IRGen, to address the technical challenge of converting an image into a concise sequence of semantic units.
Our model achieves state-of-the-art performance on three widely-used image retrieval benchmarks and two million-scale datasets.
arXiv Detail & Related papers (2023-03-17T17:07:36Z)
- A Generic Approach for Enhancing GANs by Regularized Latent Optimization [79.00740660219256]
We introduce a generic framework called generative-model inference that is capable of enhancing pre-trained GANs effectively and seamlessly.
Our basic idea is to efficiently infer the optimal latent distribution for the given requirements using Wasserstein gradient flow techniques.
arXiv Detail & Related papers (2021-12-07T05:22:50Z)
- Image Quality Assessment in the Modern Age [53.19271326110551]
This tutorial provides the audience with the basic theories, methodologies, and current progress of image quality assessment (IQA).
We will first revisit several subjective quality assessment methodologies, with emphasis on how to properly select visual stimuli.
Both hand-engineered and (deep) learning-based methods will be covered.
arXiv Detail & Related papers (2021-10-19T02:38:46Z)
- Identity-Aware CycleGAN for Face Photo-Sketch Synthesis and Recognition [61.87842307164351]
We first propose an Identity-Aware CycleGAN (IACycleGAN) model that applies a new perceptual loss to supervise the image generation network.
It improves CycleGAN on photo-sketch synthesis by paying more attention to the synthesis of key facial regions, such as eyes and nose.
We develop a mutual optimization procedure between the synthesis model and the recognition model, which iteratively synthesizes better images by IACycleGAN.
arXiv Detail & Related papers (2021-03-30T01:30:08Z)
- Adversarial Text-to-Image Synthesis: A Review [7.593633267653624]
We contextualize the state of the art of adversarial text-to-image synthesis models, their development since their inception five years ago, and propose a taxonomy based on the level of supervision.
We critically examine current strategies to evaluate text-to-image synthesis models, highlight shortcomings, and identify new areas of research, ranging from the development of better datasets and evaluation metrics to possible improvements in architectural design and model training.
This review complements previous surveys on generative adversarial networks with a focus on text-to-image synthesis, which we believe will help researchers to further advance the field.
arXiv Detail & Related papers (2021-01-25T09:58:36Z)
- NPRportrait 1.0: A Three-Level Benchmark for Non-Photorealistic Rendering of Portraits [67.58044348082944]
This paper proposes a new structured, three-level benchmark dataset for the evaluation of stylised portrait images.
Rigorous criteria were used for its construction, and its consistency was validated by user studies.
A new methodology has been developed for evaluating portrait stylisation algorithms.
arXiv Detail & Related papers (2020-09-01T18:04:19Z)