DEsignBench: Exploring and Benchmarking DALL-E 3 for Imagining Visual Design
- URL: http://arxiv.org/abs/2310.15144v1
- Date: Mon, 23 Oct 2023 17:48:38 GMT
- Title: DEsignBench: Exploring and Benchmarking DALL-E 3 for Imagining Visual Design
- Authors: Kevin Lin, Zhengyuan Yang, Linjie Li, Jianfeng Wang, Lijuan Wang
- Abstract summary: We introduce DEsignBench, a text-to-image (T2I) generation benchmark tailored for visual design scenarios.
For DEsignBench benchmarking, we perform human evaluations on generated images against the criteria of image-text alignment, visual aesthetic, and design creativity.
In addition to human evaluations, we introduce the first automatic image generation evaluator powered by GPT-4V.
- Score: 124.56730013968543
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce DEsignBench, a text-to-image (T2I) generation benchmark tailored for visual design scenarios. Recent T2I models, such as DALL-E 3, have demonstrated remarkable capabilities in generating photorealistic images that align closely with textual inputs. While the allure of creating visually captivating images is undeniable, our emphasis extends beyond mere aesthetic pleasure: we aim to investigate the potential of using these powerful models in authentic design contexts. In pursuit of this goal, we develop DEsignBench, which incorporates test samples designed to assess T2I models on both "design technical capability" and "design application scenario," with each dimension supported by a diverse set of specific design categories. We explore DALL-E 3 together with other leading T2I models on DEsignBench, resulting in a comprehensive visual gallery for side-by-side comparisons. For DEsignBench benchmarking, we perform human evaluations on the generated images in the DEsignBench gallery against the criteria of image-text alignment, visual aesthetic, and design creativity. Our evaluation also considers other specialized design capabilities, including text rendering, layout composition, color harmony, 3D design, and medium style. In addition to human evaluations, we introduce the first automatic image generation evaluator powered by GPT-4V. This evaluator provides ratings that align well with human judgments while being easily replicable and cost-efficient. A high-resolution version is available at
https://github.com/design-bench/design-bench.github.io/raw/main/designbench.pdf?download=
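The snippet below is a minimal sketch of how a GPT-4V-style automatic judge along these lines can be wired up; it is not the paper's released evaluator. It assumes the OpenAI Python SDK, and the RUBRIC wording, the rate_image helper, the gpt-4-vision-preview model name, and the reply-parsing regex are all illustrative choices made for this example.

```python
import base64
import re

from openai import OpenAI  # pip install openai>=1.0

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical rubric mirroring the paper's three human-evaluation criteria.
RUBRIC = (
    "You are a strict visual design judge. Rate the image from 1 to 10 on each "
    "criterion: (1) image-text alignment with the prompt, (2) visual aesthetic, "
    "(3) design creativity. Reply exactly as "
    "'alignment: X, aesthetic: Y, creativity: Z'."
)

def rate_image(image_path: str, prompt: str) -> dict[str, int]:
    """Ask a GPT-4V-class model to score one generated image against its prompt."""
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    resp = client.chat.completions.create(
        model="gpt-4-vision-preview",  # any vision-capable chat model would do
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": f"{RUBRIC}\n\nPrompt: {prompt}"},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
        temperature=0,  # keep ratings as replicable as the API allows
    )
    reply = resp.choices[0].message.content.lower()
    # Parse "alignment: 8, aesthetic: 7, creativity: 6" into a score dict.
    return {k: int(v) for k, v in
            re.findall(r"(alignment|aesthetic|creativity):\s*(\d+)", reply)}
```

Averaging such per-image scores over a gallery and correlating them with human ratings (e.g., via Spearman's rank correlation) would be the natural check on how closely an automatic judge of this kind tracks human judgment.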
Related papers
- Sketch2Code: Evaluating Vision-Language Models for Interactive Web Design Prototyping [55.98643055756135]
We introduce Sketch2Code, a benchmark that evaluates state-of-the-art Vision Language Models (VLMs) on automating the conversion of rudimentary sketches into webpage prototypes.
We analyze ten commercial and open-source models, showing that Sketch2Code is challenging for existing VLMs.
A user study with UI/UX experts reveals a significant preference for proactive question-asking over passive feedback reception.
arXiv Detail & Related papers (2024-10-21T17:39:49Z)
- KITTEN: A Knowledge-Intensive Evaluation of Image Generation on Visual Entities [93.74881034001312]
We conduct a systematic study on the fidelity of entities in text-to-image generation models.
We focus on their ability to generate a wide range of real-world visual entities, such as landmark buildings, aircraft, plants, and animals.
Our findings reveal that even the most advanced text-to-image models often fail to generate entities with accurate visual details.
arXiv Detail & Related papers (2024-10-15T17:50:37Z)
- ConceptMix: A Compositional Image Generation Benchmark with Controllable Difficulty [52.15933752463479]
ConceptMix is a scalable, controllable, and customizable benchmark.
It automatically evaluates compositional generation ability of Text-to-Image (T2I) models.
It reveals that the performance of several models, especially open models, drops dramatically as k, the number of visual concepts combined in a single prompt, increases.
arXiv Detail & Related papers (2024-08-26T15:08:12Z)
- PhyBench: A Physical Commonsense Benchmark for Evaluating Text-to-Image Models [50.33699462106502]
Text-to-image (T2I) models frequently fail to produce images consistent with physical commonsense.
Current T2I evaluation benchmarks focus on metrics such as accuracy, bias, and safety, neglecting the evaluation of models' internal knowledge.
We introduce PhyBench, a comprehensive T2I evaluation dataset comprising 700 prompts across 4 primary categories: mechanics, optics, thermodynamics, and material properties.
arXiv Detail & Related papers (2024-06-17T17:49:01Z)
- I-Design: Personalized LLM Interior Designer [57.00412237555167]
I-Design is a personalized interior designer that allows users to generate and visualize their design goals through natural language communication.
I-Design starts with a team of large language model agents that engage in dialogues and logical reasoning with one another.
The final design is then constructed in 3D by retrieving and integrating assets from an existing object database.
arXiv Detail & Related papers (2024-04-03T16:17:53Z)
- HRS-Bench: Holistic, Reliable and Scalable Benchmark for Text-to-Image Models [39.38477117444303]
HRS-Bench is an evaluation benchmark for Text-to-Image (T2I) models.
It measures 13 skills that can be categorized into five major categories: accuracy, robustness, generalization, fairness, and bias.
It covers 50 scenarios, including fashion, animals, transportation, food, and clothes.
arXiv Detail & Related papers (2023-04-11T17:59:13Z)
- Evaluation of Sketch-Based and Semantic-Based Modalities for Mockup Generation [15.838427479984926]
Design mockups are essential instruments for visualizing and testing design ideas.
We present and evaluate two different modalities for generating mockups based on hand-drawn sketches.
Our results show that sketch-based generation was more intuitive and expressive, while semantic-based generative AI obtained better results in terms of quality and fidelity.
arXiv Detail & Related papers (2023-03-22T16:47:36Z)
- Convolutional Generation of Textured 3D Meshes [34.20939983046376]
We propose a framework that can generate triangle meshes and associated high-resolution texture maps, using only 2D supervision from single-view natural images.
A key contribution of our work is the encoding of the mesh and texture as 2D representations, which are semantically aligned and can be easily modeled by a 2D convolutional GAN.
We demonstrate the efficacy of our method on Pascal3D+ Cars and CUB, both in an unconditional setting and in settings where the model is conditioned on class labels, attributes, and text.
arXiv Detail & Related papers (2020-06-13T15:23:29Z)