DEsignBench: Exploring and Benchmarking DALL-E 3 for Imagining Visual
Design
- URL: http://arxiv.org/abs/2310.15144v1
- Date: Mon, 23 Oct 2023 17:48:38 GMT
- Title: DEsignBench: Exploring and Benchmarking DALL-E 3 for Imagining Visual
Design
- Authors: Kevin Lin, Zhengyuan Yang, Linjie Li, Jianfeng Wang, Lijuan Wang
- Abstract summary: We introduce DEsignBench, a text-to-image (T2I) generation benchmark tailored for visual design scenarios.
For DEsignBench benchmarking, we perform human evaluations on generated images against the criteria of image-text alignment, visual aesthetic, and design creativity.
In addition to human evaluations, we introduce the first automatic image generation evaluator powered by GPT-4V.
- Score: 124.56730013968543
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce DEsignBench, a text-to-image (T2I) generation benchmark tailored
for visual design scenarios. Recent T2I models like DALL-E 3 and others, have
demonstrated remarkable capabilities in generating photorealistic images that
align closely with textual inputs. While the allure of creating visually
captivating images is undeniable, our emphasis extends beyond mere aesthetic
pleasure. We aim to investigate the potential of using these powerful models in
authentic design contexts. In pursuit of this goal, we develop DEsignBench,
which incorporates test samples designed to assess T2I models on both "design
technical capability" and "design application scenario." Each of these two
dimensions is supported by a diverse set of specific design categories. We
explore DALL-E 3 together with other leading T2I models on DEsignBench,
resulting in a comprehensive visual gallery for side-by-side comparisons. For
DEsignBench benchmarking, we perform human evaluations on generated images in
DEsignBench gallery, against the criteria of image-text alignment, visual
aesthetic, and design creativity. Our evaluation also considers other
specialized design capabilities, including text rendering, layout composition,
color harmony, 3D design, and medium style. In addition to human evaluations,
we introduce the first automatic image generation evaluator powered by GPT-4V.
This evaluator provides ratings that align well with human judgments, while
being easily replicable and cost-efficient. A high-resolution version is
available at
https://github.com/design-bench/design-bench.github.io/raw/main/designbench.pdf?download=
Related papers
- PhyBench: A Physical Commonsense Benchmark for Evaluating Text-to-Image Models [50.33699462106502]
We introduce PhyBench, a comprehensive T2I evaluation dataset comprising 700 prompts across 4 primary categories: mechanics, optics, thermodynamics, and material properties.
We assess 6 prominent T2I models, including proprietary models DALLE3 and Gemini, and demonstrate that incorporating physical principles into prompts enhances the models' ability to generate physically accurate images.
Our findings reveal that: (1) even advanced models frequently err in various physical scenarios, except for optics; (2) GPT-4o, with item-specific scoring instructions, effectively evaluates the models' understanding of physical commonsense; and (3) current T2I models are
arXiv Detail & Related papers (2024-06-17T17:49:01Z) - I-Design: Personalized LLM Interior Designer [57.00412237555167]
I-Design is a personalized interior designer that allows users to generate and visualize their design goals through natural language communication.
I-Design starts with a team of large language model agents that engage in dialogues and logical reasoning with one another.
The final design is then constructed in 3D by retrieving and integrating assets from an existing object database.
arXiv Detail & Related papers (2024-04-03T16:17:53Z) - Idea2Img: Iterative Self-Refinement with GPT-4V(ision) for Automatic
Image Design and Generation [121.42924593374127]
We introduce Idea to Image'', a system that enables multimodal iterative self-refinement with GPT-4V(ision) for automatic image design and generation.
We investigate if systems based on large multimodal models (LMMs) can develop analogous multimodal self-refinement abilities.
arXiv Detail & Related papers (2023-10-12T17:34:20Z) - Visual Programming for Text-to-Image Generation and Evaluation [73.12069620086311]
We propose two novel interpretable/explainable visual programming frameworks for text-to-image (T2I) generation and evaluation.
First, we introduce VPGen, an interpretable step-by-step T2I generation framework that decomposes T2I generation into three steps: object/count generation, layout generation, and image generation.
Second, we introduce VPEval, an interpretable and explainable evaluation framework for T2I generation based on visual programming.
arXiv Detail & Related papers (2023-05-24T16:42:17Z) - HRS-Bench: Holistic, Reliable and Scalable Benchmark for Text-to-Image
Models [39.38477117444303]
HRS-Bench is an evaluation benchmark for Text-to-Image (T2I) models.
It measures 13 skills that can be categorized into five major categories: accuracy, robustness, generalization, fairness, and bias.
It covers 50 scenarios, including fashion, animals, transportation, food, and clothes.
arXiv Detail & Related papers (2023-04-11T17:59:13Z) - Evaluation of Sketch-Based and Semantic-Based Modalities for Mockup
Generation [15.838427479984926]
Design mockups are essential instruments for visualizing and testing design ideas.
We present and evaluate two different modalities for generating mockups based on hand-drawn sketches.
Our results show that sketch-based generation was more intuitive and expressive, while semantic-based generative AI obtained better results in terms of quality and fidelity.
arXiv Detail & Related papers (2023-03-22T16:47:36Z) - Hand-Object Interaction Image Generation [135.87707468156057]
This work is dedicated to a new task, i.e., hand-object interaction image generation.
It aims to conditionally generate the hand-object image under the given hand, object and their interaction status.
This task is challenging and research-worthy in many potential application scenarios, such as AR/VR games and online shopping.
arXiv Detail & Related papers (2022-11-28T18:59:57Z) - Convolutional Generation of Textured 3D Meshes [34.20939983046376]
We propose a framework that can generate triangle meshes and associated high-resolution texture maps, using only 2D supervision from single-view natural images.
A key contribution of our work is the encoding of the mesh and texture as 2D representations, which are semantically aligned and can be easily modeled by a 2D convolutional GAN.
We demonstrate the efficacy of our method on Pascal3D+ Cars and CUB, both in an unconditional setting and in settings where the model is conditioned on class labels, attributes, and text.
arXiv Detail & Related papers (2020-06-13T15:23:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.