Gaudí: Conversational Interactions with Deep Representations to
Generate Image Collections
- URL: http://arxiv.org/abs/2112.04404v1
- Date: Sun, 5 Dec 2021 07:02:33 GMT
- Title: Gaudí: Conversational Interactions with Deep Representations to
Generate Image Collections
- Authors: Victor S. Bursztyn, Jennifer Healey, Vishwa Vinay
- Abstract summary: Gaudí was developed to help designers search for inspirational images using natural language.
Ours is the first attempt to represent mood-boards as the stories that designers tell when presenting a creative direction to a client.
- Score: 14.012745542766506
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Based on recent advances in realistic language modeling (GPT-3) and
cross-modal representations (CLIP), Gaudí was developed to help designers
search for inspirational images using natural language. In the early stages of
the design process, with the goal of eliciting a client's preferred creative
direction, designers will typically create thematic collections of
inspirational images called "mood-boards". Creating a mood-board involves
sequential image searches which are currently performed using keywords or
images. Gaudí transforms this process into a conversation where the user is
gradually detailing the mood-board's theme. This representation allows our AI
to generate new search queries from scratch, straight from a project briefing,
following a theme hypothesized by GPT-3. Compared to previous computational
approaches to mood-board creation, to the best of our knowledge, ours is the
first attempt to represent mood-boards as the stories that designers tell when
presenting a creative direction to a client.
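The abstract describes cross-modal retrieval with CLIP: candidate images are ranked by the similarity between a text query embedding and precomputed image embeddings. As a minimal illustrative sketch (not the paper's actual code), the following uses toy 4-dimensional vectors standing in for real CLIP embeddings; the function name and dimensions are assumptions for illustration only.

```python
import numpy as np

def rank_images_by_text(text_emb: np.ndarray, image_embs: np.ndarray) -> np.ndarray:
    """Rank images by cosine similarity to a text embedding,
    in the style of CLIP cross-modal retrieval."""
    # Normalize both sides so dot products become cosine similarities.
    t = text_emb / np.linalg.norm(text_emb)
    imgs = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    scores = imgs @ t
    # Return image indices, best match first.
    return np.argsort(-scores)

# Toy embeddings standing in for real CLIP text/image vectors.
text = np.array([1.0, 0.0, 0.0, 0.0])
images = np.array([
    [0.9, 0.1, 0.0, 0.0],   # close to the query theme
    [0.0, 1.0, 0.0, 0.0],   # unrelated
    [0.5, 0.5, 0.0, 0.0],   # partially related
])
print(rank_images_by_text(text, images))  # prints [0 2 1]
```

In a system like the one described, each query generated during the conversation would be embedded and ranked against the image collection this way, with the conversational layer supplying progressively refined queries.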
Related papers
- Bringing Characters to New Stories: Training-Free Theme-Specific Image Generation via Dynamic Visual Prompting [71.29100512700064]
We present T-Prompter, a training-free method for theme-specific image generation.
T-Prompter integrates reference images into generative models, allowing users to seamlessly specify the target theme.
Our approach enables consistent story generation, character design, realistic character generation, and style-guided image generation.
arXiv Detail & Related papers (2025-01-26T19:01:19Z)
- Surrealistic-like Image Generation with Vision-Language Models [4.66729174362509]
In this paper, we explore the generation of images in the style of paintings in the surrealism movement using vision-language generative models.
Our investigation starts with the generation of images under various image generation settings and different models.
We evaluate the performance of selected models and gain valuable insights into their capabilities in generating such images.
arXiv Detail & Related papers (2024-12-18T22:03:26Z)
- GPTDrawer: Enhancing Visual Synthesis through ChatGPT [4.79996063469789]
GPTDrawer is an innovative pipeline that leverages the generative prowess of GPT-based models to enhance the visual synthesis process.
Our methodology employs a novel algorithm that iteratively refines input prompts using keyword extraction, semantic analysis, and image-text congruence evaluation.
The results demonstrate a marked improvement in the fidelity of images generated in accordance with user-defined prompts.
arXiv Detail & Related papers (2024-12-11T00:42:44Z)
- Influencer: Empowering Everyday Users in Creating Promotional Posts via AI-infused Exploration and Customization [11.9449656506593]
Influencer is an interactive tool to assist novice creators in crafting high-quality promotional post designs.
Within Influencer, we contribute a multi-dimensional recommendation framework that allows users to intuitively generate new ideas.
Influencer implements a holistic promotional post design system that supports context-aware image and caption exploration.
arXiv Detail & Related papers (2024-07-20T16:27:49Z)
- Empowering Visual Creativity: A Vision-Language Assistant to Image Editing Recommendations [109.65267337037842]
We introduce the task of Image Editing Recommendation (IER).
IER aims to automatically generate diverse creative editing instructions from an input image and a simple prompt representing the users' under-specified editing purpose.
We introduce Creativity-Vision Language Assistant(Creativity-VLA), a multimodal framework designed specifically for edit-instruction generation.
arXiv Detail & Related papers (2024-05-31T18:22:29Z)
- DiffChat: Learning to Chat with Text-to-Image Synthesis Models for Interactive Image Creation [40.478839423995296]
We present DiffChat, a novel method to align Large Language Models (LLMs) to "chat" with prompt-as-input Text-to-Image Synthesis (TIS) models for interactive image creation.
Given a raw prompt/image and a user-specified instruction, DiffChat can effectively make appropriate modifications and generate the target prompt.
arXiv Detail & Related papers (2024-03-08T02:24:27Z)
- PALP: Prompt Aligned Personalization of Text-to-Image Models [68.91005384187348]
Existing personalization methods compromise either personalization ability or alignment with complex prompts.
We propose a new approach focusing on personalization methods for a single prompt to address this issue.
Our method excels in improving text alignment, enabling the creation of images with complex and intricate prompts.
arXiv Detail & Related papers (2024-01-11T18:35:33Z)
- SketchDreamer: Interactive Text-Augmented Creative Sketch Ideation [111.2195741547517]
We present a method to generate controlled sketches using a text-conditioned diffusion model trained on pixel representations of images.
Our objective is to empower non-professional users to create sketches and, through a series of optimisation processes, transform a narrative into a storyboard.
arXiv Detail & Related papers (2023-08-27T19:44:44Z)
- IR-GAN: Image Manipulation with Linguistic Instruction by Increment Reasoning [110.7118381246156]
The Increment Reasoning Generative Adversarial Network (IR-GAN) aims to reason about the consistency between the visual increment in images and the semantic increment in instructions.
First, we introduce word-level and instruction-level instruction encoders to learn the user's intention from history-correlated instructions as the semantic increment.
Second, we embed the representation of the semantic increment into that of the source image to generate the target image, where the source image serves as a reference.
arXiv Detail & Related papers (2022-04-02T07:48:39Z)
- Learned Spatial Representations for Few-shot Talking-Head Synthesis [68.3787368024951]
We propose a novel approach for few-shot talking-head synthesis.
We show that this disentangled representation leads to a significant improvement over previous methods.
arXiv Detail & Related papers (2021-04-29T17:59:42Z)
- Words as Art Materials: Generating Paintings with Sequential GANs [8.249180979158815]
We investigate the generation of artistic images on a high-variance dataset.
This dataset includes images that vary, for example, in shape, color, and content.
We propose a sequential Generative Adversarial Network model.
arXiv Detail & Related papers (2020-07-08T19:17:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.