Related papers: DEsignBench: Exploring and Benchmarking DALL-E 3 for Imagining Visual Design

URL: http://arxiv.org/abs/2310.15144v1
Date: Mon, 23 Oct 2023 17:48:38 GMT
Title: DEsignBench: Exploring and Benchmarking DALL-E 3 for Imagining Visual Design
Authors: Kevin Lin, Zhengyuan Yang, Linjie Li, Jianfeng Wang, Lijuan Wang
Abstract summary: We introduce DEsignBench, a text-to-image (T2I) generation benchmark tailored for visual design scenarios. For DEsignBench benchmarking, we perform human evaluations on generated images against the criteria of image-text alignment, visual aesthetic, and design creativity. In addition to human evaluations, we introduce the first automatic image generation evaluator powered by GPT-4V.
Score: 124.56730013968543
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We introduce DEsignBench, a text-to-image (T2I) generation benchmark tailored for visual design scenarios. Recent T2I models, such as DALL-E 3, have demonstrated remarkable capabilities in generating photorealistic images that align closely with textual inputs. While the allure of creating visually captivating images is undeniable, our emphasis extends beyond mere aesthetic pleasure. We aim to investigate the potential of using these powerful models in authentic design contexts. In pursuit of this goal, we develop DEsignBench, which incorporates test samples designed to assess T2I models on both "design technical capability" and "design application scenario." Each of these two dimensions is supported by a diverse set of specific design categories. We explore DALL-E 3 together with other leading T2I models on DEsignBench, resulting in a comprehensive visual gallery for side-by-side comparisons. For DEsignBench benchmarking, we perform human evaluations on the generated images in the DEsignBench gallery against the criteria of image-text alignment, visual aesthetic, and design creativity. Our evaluation also considers other specialized design capabilities, including text rendering, layout composition, color harmony, 3D design, and medium style. In addition to human evaluations, we introduce the first automatic image generation evaluator powered by GPT-4V. This evaluator provides ratings that align well with human judgments, while being easily replicable and cost-efficient. A high-resolution version is available at https://github.com/design-bench/design-bench.github.io/raw/main/designbench.pdf?download=
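The abstract states that the automatic evaluator's ratings align well with human judgments across the three criteria, but it does not say how that alignment is measured. As a minimal sketch only, the following Python computes a Spearman rank correlation per criterion between human and automatic ratings. The choice of Spearman correlation, the function names, the criterion keys, and the data layout are all assumptions made for illustration and are not taken from the paper.

```python
from statistics import mean

# Criterion keys are hypothetical labels for the three criteria named in the abstract.
CRITERIA = ("image_text_alignment", "visual_aesthetic", "design_creativity")


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant ratings")
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation applied to the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


def agreement_by_criterion(human, automatic):
    """Map each criterion to the rank correlation of its two rating lists."""
    return {c: spearman(human[c], automatic[c]) for c in CRITERIA}


if __name__ == "__main__":
    # Made-up ratings for four images, used only to show the call shape.
    human = {c: [1, 3, 2, 5] for c in CRITERIA}
    automatic = {c: [2, 3, 1, 4] for c in CRITERIA}
    print(agreement_by_criterion(human, automatic))
```

A rank-based measure is used here because rating scales from human annotators and from a model need not be calibrated to each other; only the ordering of images is compared.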

Related papers

Inkspire: Supporting Design Exploration with Generative AI through Analogical Sketching [16.33879333386818]
Inkspire is a sketch-driven tool that supports designers in prototyping product design concepts. In a study comparing Inkspire to ControlNet, we found that Inkspire supported designers with more inspiration and exploration of design ideas.
arXiv Detail & Related papers (2025-01-30T18:59:04Z)
IMAGINE-E: Image Generation Intelligence Evaluation of State-of-the-art Text-to-Image Models [52.73820275861131]
Text-to-image (T2I) models have made significant progress, showcasing impressive abilities in prompt following and image generation. Recent models such as FLUX.1 and Ideogram2.0 have demonstrated exceptional performance across various complex tasks. This study provides valuable insights into the current state and future trajectory of T2I models as they evolve towards general-purpose usability.
arXiv Detail & Related papers (2025-01-23T18:58:33Z)
Sketch2Code: Evaluating Vision-Language Models for Interactive Web Design Prototyping [55.98643055756135]
We introduce Sketch2Code, a benchmark that evaluates state-of-the-art Vision Language Models (VLMs) on automating the conversion of rudimentary sketches into webpage prototypes. We analyze ten commercial and open-source models, showing that Sketch2Code is challenging for existing VLMs. A user study with UI/UX experts reveals a significant preference for proactive question-asking over passive feedback reception.
arXiv Detail & Related papers (2024-10-21T17:39:49Z)
KITTEN: A Knowledge-Intensive Evaluation of Image Generation on Visual Entities [93.74881034001312]
We conduct a systematic study on the fidelity of entities in text-to-image generation models. We focus on their ability to generate a wide range of real-world visual entities, such as landmark buildings, aircraft, plants, and animals. Our findings reveal that even the most advanced text-to-image models often fail to generate entities with accurate visual details.
arXiv Detail & Related papers (2024-10-15T17:50:37Z)
ConceptMix: A Compositional Image Generation Benchmark with Controllable Difficulty [52.15933752463479]
ConceptMix is a scalable, controllable, and customizable benchmark that automatically evaluates the compositional generation ability of Text-to-Image (T2I) models. It reveals that the performance of several models, especially open models, drops dramatically as k, the number of visual concepts combined in a prompt, increases.
arXiv Detail & Related papers (2024-08-26T15:08:12Z)
PhyBench: A Physical Commonsense Benchmark for Evaluating Text-to-Image Models [50.33699462106502]
Text-to-image (T2I) models frequently fail to produce images consistent with physical commonsense. Current T2I evaluation benchmarks focus on metrics such as accuracy, bias, and safety, neglecting the evaluation of models' internal knowledge. We introduce PhyBench, a comprehensive T2I evaluation dataset comprising 700 prompts across 4 primary categories: mechanics, optics, thermodynamics, and material properties.
arXiv Detail & Related papers (2024-06-17T17:49:01Z)
I-Design: Personalized LLM Interior Designer [57.00412237555167]
I-Design is a personalized interior designer that allows users to generate and visualize their design goals through natural language communication. I-Design starts with a team of large language model agents that engage in dialogues and logical reasoning with one another. The final design is then constructed in 3D by retrieving and integrating assets from an existing object database.
arXiv Detail & Related papers (2024-04-03T16:17:53Z)
HRS-Bench: Holistic, Reliable and Scalable Benchmark for Text-to-Image Models [39.38477117444303]
HRS-Bench is an evaluation benchmark for Text-to-Image (T2I) models. It measures 13 skills that can be categorized into five major categories: accuracy, robustness, generalization, fairness, and bias. It covers 50 scenarios, including fashion, animals, transportation, food, and clothes.
arXiv Detail & Related papers (2023-04-11T17:59:13Z)
Evaluation of Sketch-Based and Semantic-Based Modalities for Mockup Generation [15.838427479984926]
Design mockups are essential instruments for visualizing and testing design ideas. We present and evaluate two different modalities for generating mockups based on hand-drawn sketches. Our results show that sketch-based generation was more intuitive and expressive, while semantic-based generative AI obtained better results in terms of quality and fidelity.
arXiv Detail & Related papers (2023-03-22T16:47:36Z)
Convolutional Generation of Textured 3D Meshes [34.20939983046376]
We propose a framework that can generate triangle meshes and associated high-resolution texture maps, using only 2D supervision from single-view natural images. A key contribution of our work is the encoding of the mesh and texture as 2D representations, which are semantically aligned and can be easily modeled by a 2D convolutional GAN. We demonstrate the efficacy of our method on Pascal3D+ Cars and CUB, both in an unconditional setting and in settings where the model is conditioned on class labels, attributes, and text.
arXiv Detail & Related papers (2020-06-13T15:23:29Z)

This list is automatically generated from the titles and abstracts of the papers on this site.