Gaudí: Conversational Interactions with Deep Representations to Generate Image Collections
- URL: http://arxiv.org/abs/2112.04404v1
- Date: Sun, 5 Dec 2021 07:02:33 GMT
- Title: Gaudí: Conversational Interactions with Deep Representations to Generate Image Collections
- Authors: Victor S. Bursztyn, Jennifer Healey, Vishwa Vinay
- Abstract summary: Gaudí was developed to help designers search for inspirational images using natural language.
Ours is the first attempt to represent mood-boards as the stories that designers tell when presenting a creative direction to a client.
- Score: 14.012745542766506
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Based on recent advances in realistic language modeling (GPT-3) and
cross-modal representations (CLIP), Gaudí was developed to help designers
search for inspirational images using natural language. In the early stages of
the design process, with the goal of eliciting a client's preferred creative
direction, designers will typically create thematic collections of
inspirational images called "mood-boards". Creating a mood-board involves
sequential image searches which are currently performed using keywords or
images. Gaudí transforms this process into a conversation where the user is
gradually detailing the mood-board's theme. This representation allows our AI
to generate new search queries from scratch, straight from a project briefing,
following a theme hypothesized by GPT-3. Compared to previous computational
approaches to mood-board creation, to the best of our knowledge, ours is the
first attempt to represent mood-boards as the stories that designers tell when
presenting a creative direction to a client.
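The retrieval step the abstract describes, scoring candidate images against a theme hypothesized by GPT-3 using CLIP's shared embedding space, can be sketched as a cosine-similarity ranking. The toy 4-dimensional vectors and the `rank_images` helper below are illustrative stand-ins (CLIP embeddings are typically 512-dimensional), not Gaudí's actual implementation:

```python
import numpy as np

def rank_images(query_emb, image_embs):
    """Rank image embeddings by cosine similarity to a text query embedding,
    most similar first."""
    q = query_emb / np.linalg.norm(query_emb)
    imgs = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    sims = imgs @ q  # cosine similarities, since both sides are unit-norm
    return np.argsort(-sims)

# Toy embeddings standing in for CLIP vectors of a theme query and 3 images.
query = np.array([1.0, 0.0, 0.0, 0.0])
images = np.array([
    [0.1, 0.9, 0.0, 0.0],  # off-theme
    [0.9, 0.1, 0.0, 0.0],  # on-theme
    [0.5, 0.5, 0.0, 0.0],  # partial match
])
order = rank_images(query, images)
print(order.tolist())  # indices of images, best match first
```

In the conversational loop the abstract describes, each user refinement would update the query embedding before re-ranking.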
Related papers
- Influencer: Empowering Everyday Users in Creating Promotional Posts via AI-infused Exploration and Customization [11.9449656506593]
Influencer is an interactive tool to assist novice creators in crafting high-quality promotional post designs.
Within Influencer, we contribute a multi-dimensional recommendation framework that allows users to intuitively generate new ideas.
Influencer implements a holistic promotional post design system that supports context-aware image and caption exploration.
arXiv Detail & Related papers (2024-07-20T16:27:49Z) - MetaDesigner: Advancing Artistic Typography through AI-Driven, User-Centric, and Multilingual WordArt Synthesis [65.78359025027457]
MetaDesigner revolutionizes artistic typography by leveraging the strengths of Large Language Models (LLMs) to drive a design paradigm centered around user engagement.
A comprehensive feedback mechanism harnesses insights from multimodal models and user evaluations to refine and enhance the design process iteratively.
Empirical validations highlight MetaDesigner's capability to effectively serve diverse WordArt applications, consistently producing aesthetically appealing and context-sensitive results.
arXiv Detail & Related papers (2024-06-28T11:58:26Z) - Empowering Visual Creativity: A Vision-Language Assistant to Image Editing Recommendations [109.65267337037842]
We introduce the task of Image Editing Recommendation (IER).
IER aims to automatically generate diverse creative editing instructions from an input image and a simple prompt representing the users' under-specified editing purpose.
We introduce Creativity-Vision Language Assistant(Creativity-VLA), a multimodal framework designed specifically for edit-instruction generation.
arXiv Detail & Related papers (2024-05-31T18:22:29Z) - DiffChat: Learning to Chat with Text-to-Image Synthesis Models for
Interactive Image Creation [40.478839423995296]
We present DiffChat, a novel method to align Large Language Models (LLMs) to "chat" with prompt-as-input Text-to-Image Synthesis (TIS) models for interactive image creation.
Given a raw prompt/image and a user-specified instruction, DiffChat can effectively make appropriate modifications and generate the target prompt.
arXiv Detail & Related papers (2024-03-08T02:24:27Z) - PALP: Prompt Aligned Personalization of Text-to-Image Models [68.91005384187348]
Existing personalization methods compromise personalization ability or the alignment to complex prompts.
We propose a new approach focusing on personalization methods for a single prompt to address this issue.
Our method excels in improving text alignment, enabling the creation of images with complex and intricate prompts.
arXiv Detail & Related papers (2024-01-11T18:35:33Z) - Teaching Text-to-Image Models to Communicate in Dialog [44.76942024105259]
In this paper, we focus on the innovative dialog-to-image generation task.
To tackle this problem, we design a tailored fine-tuning approach on top of state-of-the-art text-to-image generation models.
Our approach brings consistent and remarkable improvement with 3 state-of-the-art pre-trained text-to-image generation backbones.
arXiv Detail & Related papers (2023-09-27T09:33:16Z) - SketchDreamer: Interactive Text-Augmented Creative Sketch Ideation [111.2195741547517]
We present a method to generate controlled sketches using a text-conditioned diffusion model trained on pixel representations of images.
Our objective is to empower non-professional users to create sketches and, through a series of optimisation processes, transform a narrative into a storyboard.
arXiv Detail & Related papers (2023-08-27T19:44:44Z) - IR-GAN: Image Manipulation with Linguistic Instruction by Increment
Reasoning [110.7118381246156]
Increment Reasoning Generative Adversarial Network (IR-GAN) aims to reason consistency between visual increment in images and semantic increment in instructions.
First, we introduce word-level and instruction-level instruction encoders to learn the user's intention from history-correlated instructions as semantic increment.
Second, we embed the representation of semantic increment into that of source image for generating target image, where source image plays the role of referring auxiliary.
arXiv Detail & Related papers (2022-04-02T07:48:39Z) - Exploring Latent Dimensions of Crowd-sourced Creativity [0.02294014185517203]
We build our work on the largest AI-based creativity platform, Artbreeder.
We explore the latent dimensions of images generated on this platform and present a novel framework for manipulating images to make them more creative.
arXiv Detail & Related papers (2021-12-13T19:24:52Z) - Learned Spatial Representations for Few-shot Talking-Head Synthesis [68.3787368024951]
We propose a novel approach for few-shot talking-head synthesis.
We show that this disentangled representation leads to a significant improvement over previous methods.
arXiv Detail & Related papers (2021-04-29T17:59:42Z) - Words as Art Materials: Generating Paintings with Sequential GANs [8.249180979158815]
We investigate the generation of artistic images on a large-variance dataset.
This dataset includes images with variations, for example, in shape, color, and content.
We propose a sequential Generative Adversarial Network model.
arXiv Detail & Related papers (2020-07-08T19:17:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.