DeepArt: A Benchmark to Advance Fidelity Research in AI-Generated
Content
- URL: http://arxiv.org/abs/2312.10407v2
- Date: Sun, 24 Dec 2023 18:51:05 GMT
- Title: DeepArt: A Benchmark to Advance Fidelity Research in AI-Generated
Content
- Authors: Wentao Wang, Xuanyao Huang, Tianyang Wang, Swalpa Kumar Roy
- Abstract summary: This paper explores the image synthesis capabilities of GPT-4, a leading multi-modal large language model.
We establish a benchmark for evaluating the fidelity of texture features in images generated by GPT-4, comprising manually painted pictures and their AI-generated counterparts.
We have compiled a unique benchmark of manual drawings and corresponding GPT-4-generated images, introducing a new task to advance fidelity research in AI-generated content.
- Score: 9.482738088610535
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper explores the image synthesis capabilities of GPT-4, a leading
multi-modal large language model. We establish a benchmark for evaluating the
fidelity of texture features in images generated by GPT-4, comprising manually
painted pictures and their AI-generated counterparts. The contributions of this
study are threefold: First, we provide an in-depth analysis of the fidelity of
image synthesis features based on GPT-4, marking the first such study on this
state-of-the-art model. Second, the quantitative and qualitative experiments
fully reveal the limitations of the GPT-4 model in image synthesis. Third, we
have compiled a unique benchmark of manual drawings and corresponding
GPT-4-generated images, introducing a new task to advance fidelity research in
AI-generated content (AIGC). The dataset is available at:
https://github.com/rickwang28574/DeepArt.
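A minimal sketch of how such a texture-fidelity comparison could be scored is shown below. The paper does not publish its exact metric, so the GLCM descriptor, the cosine-similarity score, and the file names are illustrative assumptions, not the authors' method.

```python
# Illustrative only: the DeepArt paper does not specify this metric. GLCM
# statistics stand in for a generic texture descriptor; file names are placeholders.
import numpy as np
from skimage import io, color
from skimage.feature import graycomatrix, graycoprops

def texture_features(path):
    """Return a small vector of GLCM texture statistics for an image file."""
    img = io.imread(path)
    if img.ndim == 3:  # colour image: drop any alpha channel, convert to 8-bit gray
        img = (color.rgb2gray(img[..., :3]) * 255).astype(np.uint8)
    glcm = graycomatrix(img, distances=[1, 2], angles=[0, np.pi / 2],
                        levels=256, symmetric=True, normed=True)
    props = ["contrast", "homogeneity", "energy", "correlation"]
    return np.array([graycoprops(glcm, p).mean() for p in props])

manual = texture_features("manual_drawing.png")        # hand-painted reference
generated = texture_features("gpt4_counterpart.png")   # GPT-4-generated counterpart
fidelity = np.dot(manual, generated) / (np.linalg.norm(manual) * np.linalg.norm(generated))
print(f"texture fidelity (cosine similarity): {fidelity:.3f}")
```

Averaging such per-pair scores over the whole benchmark would yield a single fidelity number per model, which is one way a benchmark like this could be consumed.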
Related papers
- Have we unified image generation and understanding yet? An empirical study of GPT-4o's image generation ability [6.586119023242877]
OpenAI's multimodal GPT-4o has demonstrated remarkable capabilities in image generation and editing.
But its ability to achieve world knowledge-informed semantic synthesis remains unproven.
Our study calls for the development of more robust benchmarks and training strategies.
arXiv Detail & Related papers (2025-04-09T16:10:15Z)
- An Empirical Study of GPT-4o Image Generation Capabilities [40.86026243294732]
We conduct an empirical study of GPT-4o's image generation capabilities, benchmarking it against leading open-source and commercial models.
Our analysis highlights the strengths and limitations of GPT-4o under various settings, and situates it within the broader evolution of generative modeling.
arXiv Detail & Related papers (2025-04-08T12:34:36Z)
- GPT-ImgEval: A Comprehensive Benchmark for Diagnosing GPT4o in Image Generation [28.235805447825896]
OpenAI's GPT-4o model has demonstrated surprisingly good capabilities in image generation and editing.
This report presents a first-look evaluation benchmark (named GPT-ImgEval) that assesses GPT-4o's performance across three critical dimensions: (1) generation quality, (2) editing proficiency, and (3) world knowledge-informed synthesis.
arXiv Detail & Related papers (2025-04-03T17:23:16Z)
- Do I look like a `cat.n.01` to you? A Taxonomy Image Generation Benchmark [63.97125827026949]
This paper explores the feasibility of using text-to-image models in a zero-shot setup to generate images for taxonomy concepts.
A benchmark is proposed that assesses models' abilities to understand taxonomy concepts and generate relevant, high-quality images.
Twelve models are evaluated using nine novel taxonomy-related text-to-image metrics and human feedback.
arXiv Detail & Related papers (2025-03-13T13:37:54Z)
- Exploring AI-based System Design for Pixel-level Protected Health Information Detection in Medical Images [0.5825410941577593]
We present an AI-based pipeline for PHI detection comprising text detection, text extraction, and text analysis.
We benchmark three models, YOLOv11, EasyOCR, and GPT-4o, across different setups corresponding to these components.
The combination of YOLOv11 for text localization and GPT-4o for extraction and analysis yields the best results.
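A hypothetical reconstruction of that best-performing combination is sketched below, assuming a YOLOv11 checkpoint fine-tuned for text regions and the standard OpenAI vision-message format; the checkpoint name, prompt, and file path are placeholders rather than the authors' artifacts.

```python
# Hypothetical sketch of a YOLO-based text localizer feeding crops to GPT-4o
# for transcription and PHI analysis; not the paper's released pipeline.
import base64, io
from PIL import Image
from ultralytics import YOLO
from openai import OpenAI

detector = YOLO("text_region_yolo11.pt")   # assumed: YOLOv11 fine-tuned on text regions
client = OpenAI()                          # requires OPENAI_API_KEY in the environment

def crop_to_data_url(image, box):
    """Crop a detected text region and encode it as a base64 data URL."""
    x1, y1, x2, y2 = [int(v) for v in box]
    buf = io.BytesIO()
    image.crop((x1, y1, x2, y2)).save(buf, format="PNG")
    return "data:image/png;base64," + base64.b64encode(buf.getvalue()).decode()

image = Image.open("scan.png").convert("RGB")
boxes = detector(image)[0].boxes.xyxy.tolist()    # step 1: text detection

for box in boxes:                                 # steps 2-3: extraction + PHI analysis
    reply = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Transcribe this text region and state whether it contains "
                         "protected health information (PHI). Answer as JSON."},
                {"type": "image_url", "image_url": {"url": crop_to_data_url(image, box)}},
            ],
        }],
    )
    print(box, reply.choices[0].message.content)
```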
arXiv Detail & Related papers (2025-01-16T14:12:33Z)
- An Early Investigation into the Utility of Multimodal Large Language Models in Medical Imaging [0.3029213689620348]
We explore the potential of the Gemini (gemini-1.0-pro-vision-latest) and GPT-4V models for medical image analysis.
Both Gemini AI and GPT-4V are first used to classify real versus synthetic images, followed by an interpretation and analysis of the input images.
Our early investigation presented in this work provides insights into the potential of MLLMs to assist with the classification and interpretation of retinal fundoscopy and lung X-ray images.
arXiv Detail & Related papers (2024-06-02T08:29:23Z)
- Learning Vision from Models Rivals Learning Vision from Data [54.43596959598465]
We introduce SynCLR, a novel approach for learning visual representations exclusively from synthetic images and synthetic captions.
We synthesize a large dataset of image captions using LLMs, then use an off-the-shelf text-to-image model to generate multiple images corresponding to each synthetic caption.
We perform visual representation learning on these synthetic images via contrastive learning, treating images sharing the same caption as positive pairs.
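A compact sketch of that multi-positive objective follows: embeddings of images generated from the same caption are treated as positives in a SupCon-style loss. Tensor shapes and the temperature are assumed; this is not SynCLR's released implementation.

```python
# Sketch of a contrastive loss in which images sharing the same caption are
# positives; hyperparameters and shapes are assumptions, not SynCLR's code.
import torch
import torch.nn.functional as F

def same_caption_contrastive_loss(embeddings, caption_ids, temperature=0.1):
    """embeddings: (N, D) image features; caption_ids: (N,) id of the caption each image came from."""
    z = F.normalize(embeddings, dim=1)
    logits = z @ z.t() / temperature                       # pairwise similarities
    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
    logits = logits.masked_fill(self_mask, float("-inf"))  # exclude self-comparisons
    pos_mask = (caption_ids.unsqueeze(0) == caption_ids.unsqueeze(1)) & ~self_mask
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    # average log-probability over each sample's positives (images with the same caption)
    loss = -(log_prob * pos_mask).sum(1) / pos_mask.sum(1).clamp(min=1)
    return loss.mean()

# toy usage: 8 embeddings from 4 captions with 2 images each
feats = torch.randn(8, 128)
caps = torch.tensor([0, 0, 1, 1, 2, 2, 3, 3])
print(same_caption_contrastive_loss(feats, caps))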
arXiv Detail & Related papers (2023-12-28T18:59:55Z)
- Gemini Pro Defeated by GPT-4V: Evidence from Education [1.0226894006814744]
GPT-4V significantly outperforms Gemini Pro in terms of scoring accuracy and Quadratic Weighted Kappa.
Findings suggest GPT-4V's superior capability in handling complex educational tasks.
arXiv Detail & Related papers (2023-12-27T02:56:41Z)
- Gemini vs GPT-4V: A Preliminary Comparison and Combination of Vision-Language Models Through Qualitative Cases [98.35348038111508]
This paper presents an in-depth comparative study of two pioneering models: Google's Gemini and OpenAI's GPT-4V(ision).
The core of our analysis delves into the distinct visual comprehension abilities of each model.
Our findings illuminate the unique strengths and niches of both models.
arXiv Detail & Related papers (2023-12-22T18:59:58Z)
- GPT4Vis: What Can GPT-4 Do for Zero-shot Visual Recognition? [82.40761196684524]
This paper centers on the evaluation of GPT-4's linguistic and visual capabilities in zero-shot visual recognition tasks.
We conduct extensive experiments to evaluate GPT-4's performance across images, videos, and point clouds.
Our findings show that GPT-4, enhanced with rich linguistic descriptions, significantly improves zero-shot recognition.
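One plausible reading of that setup is sketched below: category descriptions (which GPT-4 would generate in the paper's pipeline, hard-coded here) enrich the text side of a CLIP-style zero-shot classifier. The model choice, descriptions, and file name are assumptions, not the GPT4Vis code.

```python
# Sketch of description-augmented zero-shot recognition: class descriptions are
# embedded with CLIP and images matched against the averaged class embeddings.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# In the paper's setting these rich descriptions would come from prompting GPT-4.
class_descriptions = {
    "cat": ["a small furry animal with whiskers, pointed ears, and a long tail"],
    "dog": ["a domesticated canine with floppy or pointed ears and a wagging tail"],
}

with torch.no_grad():
    class_embs = []
    for descs in class_descriptions.values():
        t = processor(text=descs, return_tensors="pt", padding=True)
        e = model.get_text_features(**t)
        class_embs.append(torch.nn.functional.normalize(e.mean(0), dim=0))
    class_embs = torch.stack(class_embs)                      # (num_classes, D)

    image = Image.open("query.jpg")                           # placeholder image
    v = processor(images=image, return_tensors="pt")
    img_emb = torch.nn.functional.normalize(model.get_image_features(**v)[0], dim=0)

scores = class_embs @ img_emb                                 # cosine similarity per class
print(dict(zip(class_descriptions, scores.tolist())))
```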
arXiv Detail & Related papers (2023-11-27T11:29:10Z)
- The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision) [121.42924593374127]
We analyze the latest model, GPT-4V, to deepen the understanding of LMMs.
GPT-4V's unprecedented ability in processing arbitrarily interleaved multimodal inputs makes it a powerful multimodal generalist system.
GPT-4V's unique capability of understanding visual markers drawn on input images can give rise to new human-computer interaction methods.
arXiv Detail & Related papers (2023-09-29T17:34:51Z)
- Can GPT-4 Perform Neural Architecture Search? [56.98363718371614]
We investigate the potential of GPT-4 to perform Neural Architecture Search (NAS).
Our proposed approach is GPT-4 Enhanced Neural Architecture Search (GENIUS).
We assess GENIUS across several benchmarks, comparing it with existing state-of-the-art NAS techniques to illustrate its effectiveness.
arXiv Detail & Related papers (2023-04-21T14:06:44Z)
- MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models [41.84885546518666]
GPT-4 has demonstrated extraordinary multi-modal abilities, such as directly generating websites from handwritten text.
We present MiniGPT-4, which aligns a frozen visual encoder with a frozen advanced large language model.
We also observe other emerging capabilities in MiniGPT-4, including writing stories and poems inspired by given images.
arXiv Detail & Related papers (2023-04-20T18:25:35Z)
- IMAGINE: Image Synthesis by Image-Guided Model Inversion [79.4691654458141]
We introduce an inversion-based method, denoted IMAge-Guided model INvErsion (IMAGINE), to generate high-quality and diverse images.
We leverage the knowledge of image semantics from a pre-trained classifier to achieve plausible generations.
IMAGINE enables the synthesis procedure to simultaneously 1) enforce semantic specificity constraints during the synthesis, 2) produce realistic images without generator training, and 3) give users intuitive control over the generation process.
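A rough sketch of the general idea of classifier-driven inversion follows, i.e. synthesizing an image by optimizing pixels against a pre-trained classifier instead of training a generator. It omits IMAGINE's image-guided feature-matching terms; the target class, step count, and loss weights are assumed.

```python
# Rough sketch of classifier-guided image inversion (DeepDream-style), showing
# how a pre-trained classifier can drive synthesis without generator training.
import torch
import torchvision.models as models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()
for p in model.parameters():
    p.requires_grad_(False)

target_class = 207                                  # assumed ImageNet class index
img = torch.rand(1, 3, 224, 224, requires_grad=True)
opt = torch.optim.Adam([img], lr=0.05)

def total_variation(x):
    """Smoothness prior that discourages high-frequency noise."""
    return (x[..., 1:, :] - x[..., :-1, :]).abs().mean() + \
           (x[..., :, 1:] - x[..., :, :-1]).abs().mean()

for step in range(200):
    opt.zero_grad()
    logits = model(img)
    loss = -logits[0, target_class] + 0.1 * total_variation(img)
    loss.backward()
    opt.step()
    with torch.no_grad():
        img.clamp_(0, 1)                            # keep pixel values in a valid range
```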
arXiv Detail & Related papers (2021-04-13T02:00:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.