Gaudí: Conversational Interactions with Deep Representations to Generate Image Collections
- URL: http://arxiv.org/abs/2112.04404v1
- Date: Sun, 5 Dec 2021 07:02:33 GMT
- Title: Gaudí: Conversational Interactions with Deep Representations to Generate Image Collections
- Authors: Victor S. Bursztyn, Jennifer Healey, Vishwa Vinay
- Abstract summary: Gaudí was developed to help designers search for inspirational images using natural language.
Ours is the first attempt to represent mood-boards as the stories that designers tell when presenting a creative direction to a client.
- Score: 14.012745542766506
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Based on recent advances in realistic language modeling (GPT-3) and
cross-modal representations (CLIP), Gaudí was developed to help designers
search for inspirational images using natural language. In the early stages of
the design process, with the goal of eliciting a client's preferred creative
direction, designers will typically create thematic collections of
inspirational images called "mood-boards". Creating a mood-board involves
sequential image searches which are currently performed using keywords or
images. Gaudí transforms this process into a conversation where the user is
gradually detailing the mood-board's theme. This representation allows our AI
to generate new search queries from scratch, straight from a project briefing,
following a theme hypothesized by GPT-3. Compared to previous computational
approaches to mood-board creation, to the best of our knowledge, ours is the
first attempt to represent mood-boards as the stories that designers tell when
presenting a creative direction to a client.
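The retrieval step the abstract describes, scoring candidate images against a theme hypothesized by GPT-3 using CLIP's shared embedding space, can be sketched as a cosine-similarity ranking. The toy 4-dimensional vectors and the `rank_images` helper below are illustrative stand-ins (CLIP embeddings are typically 512-dimensional), not Gaudí's actual implementation:

```python
import numpy as np

def rank_images(query_emb, image_embs):
    """Rank image embeddings by cosine similarity to a text query embedding,
    most similar first."""
    q = query_emb / np.linalg.norm(query_emb)
    imgs = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    sims = imgs @ q  # cosine similarities, since both sides are unit-norm
    return np.argsort(-sims)

# Toy embeddings standing in for CLIP vectors of a theme query and 3 images.
query = np.array([1.0, 0.0, 0.0, 0.0])
images = np.array([
    [0.1, 0.9, 0.0, 0.0],  # off-theme
    [0.9, 0.1, 0.0, 0.0],  # on-theme
    [0.5, 0.5, 0.0, 0.0],  # partial match
])
order = rank_images(query, images)
print(order.tolist())  # indices of images, best match first
```

In the conversational loop the abstract describes, each user refinement would update the query embedding before re-ranking.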
Related papers
- Influencer: Empowering Everyday Users in Creating Promotional Posts via AI-infused Exploration and Customization [11.9449656506593]
Influencer is an interactive tool to assist novice creators in crafting high-quality promotional post designs.
Within Influencer, we contribute a multi-dimensional recommendation framework that allows users to intuitively generate new ideas.
Influencer implements a holistic promotional post design system that supports context-aware image and caption exploration.
arXiv Detail & Related papers (2024-07-20T16:27:49Z) - MetaDesigner: Advancing Artistic Typography through AI-Driven, User-Centric, and Multilingual WordArt Synthesis [65.78359025027457]
MetaDesigner revolutionizes artistic typography by leveraging the strengths of Large Language Models (LLMs) to drive a design paradigm centered around user engagement.
A comprehensive feedback mechanism harnesses insights from multimodal models and user evaluations to refine and enhance the design process iteratively.
Empirical validations highlight MetaDesigner's capability to effectively serve diverse WordArt applications, consistently producing aesthetically appealing and context-sensitive results.
arXiv Detail & Related papers (2024-06-28T11:58:26Z) - Empowering Visual Creativity: A Vision-Language Assistant to Image Editing Recommendations [109.65267337037842]
We introduce the task of Image Editing Recommendation (IER).
IER aims to automatically generate diverse creative editing instructions from an input image and a simple prompt representing the users' under-specified editing purpose.
We introduce Creativity-Vision Language Assistant(Creativity-VLA), a multimodal framework designed specifically for edit-instruction generation.
arXiv Detail & Related papers (2024-05-31T18:22:29Z) - DiffChat: Learning to Chat with Text-to-Image Synthesis Models for
Interactive Image Creation [40.478839423995296]
We present DiffChat, a novel method to align Large Language Models (LLMs) to "chat" with prompt-as-input Text-to-Image Synthesis (TIS) models for interactive image creation.
Given a raw prompt/image and a user-specified instruction, DiffChat can effectively make appropriate modifications and generate the target prompt.
arXiv Detail & Related papers (2024-03-08T02:24:27Z) - PALP: Prompt Aligned Personalization of Text-to-Image Models [68.91005384187348]
Existing personalization methods compromise personalization ability or the alignment to complex prompts.
We propose a new approach focusing on personalization methods for a single prompt to address this issue.
Our method excels in improving text alignment, enabling the creation of images with complex and intricate prompts.
arXiv Detail & Related papers (2024-01-11T18:35:33Z) - Teaching Text-to-Image Models to Communicate in Dialog [44.76942024105259]
In this paper, we focus on the innovative dialog-to-image generation task.
To tackle this problem, we design a tailored fine-tuning approach on top of state-of-the-art text-to-image generation models.
Our approach brings consistent and remarkable improvement with 3 state-of-the-art pre-trained text-to-image generation backbones.
arXiv Detail & Related papers (2023-09-27T09:33:16Z) - SketchDreamer: Interactive Text-Augmented Creative Sketch Ideation [111.2195741547517]
We present a method to generate controlled sketches using a text-conditioned diffusion model trained on pixel representations of images.
Our objective is to empower non-professional users to create sketches and, through a series of optimisation processes, transform a narrative into a storyboard.
arXiv Detail & Related papers (2023-08-27T19:44:44Z) - IR-GAN: Image Manipulation with Linguistic Instruction by Increment
Reasoning [110.7118381246156]
Increment Reasoning Generative Adversarial Network (IR-GAN) aims to reason consistency between visual increment in images and semantic increment in instructions.
First, we introduce word-level and instruction-level instruction encoders to learn the user's intention from history-correlated instructions as semantic increment.
Second, we embed the representation of semantic increment into that of source image for generating target image, where source image plays the role of referring auxiliary.
arXiv Detail & Related papers (2022-04-02T07:48:39Z) - Exploring Latent Dimensions of Crowd-sourced Creativity [0.02294014185517203]
We build our work on the largest AI-based creativity platform, Artbreeder.
We explore the latent dimensions of images generated on this platform and present a novel framework for manipulating images to make them more creative.
arXiv Detail & Related papers (2021-12-13T19:24:52Z) - Learned Spatial Representations for Few-shot Talking-Head Synthesis [68.3787368024951]
We propose a novel approach for few-shot talking-head synthesis.
We show that this disentangled representation leads to a significant improvement over previous methods.
arXiv Detail & Related papers (2021-04-29T17:59:42Z) - Words as Art Materials: Generating Paintings with Sequential GANs [8.249180979158815]
We investigate the generation of artistic images on a large-variance dataset.
This dataset includes images with variations, for example, in shape, color, and content.
We propose a sequential Generative Adversarial Network model.
arXiv Detail & Related papers (2020-07-08T19:17:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.