Chaining text-to-image and large language model: A novel approach for generating personalized e-commerce banners
- URL: http://arxiv.org/abs/2403.05578v1
- Date: Wed, 28 Feb 2024 07:56:04 GMT
- Title: Chaining text-to-image and large language model: A novel approach for generating personalized e-commerce banners
- Authors: Shanu Vashishtha, Abhinav Prakash, Lalitesh Morishetti, Kaushiki Nag, Yokila Arora, Sushant Kumar, Kannan Achan
- Abstract summary: We demonstrate the use of text-to-image models for generating personalized web banners for online shoppers.
The novelty in this approach lies in converting users' interaction data to meaningful prompts without human intervention.
Our results show that the proposed approach can create high-quality personalized banners for users.
- Score: 8.508453886143677
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Text-to-image models such as Stable Diffusion have opened a plethora of opportunities for generating art. Recent literature has surveyed the use of text-to-image models for enhancing the work of many creative artists. Many e-commerce platforms employ a manual process to generate banners, which is time-consuming and difficult to scale. In this work, we demonstrate the use of text-to-image models for generating personalized web banners with dynamic content for online shoppers based on their interactions. The novelty of this approach lies in converting users' interaction data into meaningful prompts without human intervention. To this end, we utilize a large language model (LLM) to systematically extract a tuple of attributes from item meta-information. The attributes are then passed to a text-to-image model via prompt engineering to generate images for the banner. Our results show that the proposed approach can create high-quality personalized banners for users.
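The chain described above (an LLM extracts an attribute tuple from item meta-information, and prompt engineering turns that tuple into a text-to-image prompt) can be sketched roughly as follows. This is a minimal illustration under assumptions, not the authors' implementation: the model names, the prompt template, and the extract_attributes / build_banner_prompt helpers are invented for the example.

```python
# Sketch of chaining an LLM (attribute extraction) with a text-to-image model
# (banner generation). Models, prompts, and helper names are illustrative only.
import json

import torch
from diffusers import StableDiffusionPipeline
from openai import OpenAI

llm = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def extract_attributes(item_meta: dict) -> dict:
    """Ask an LLM to turn raw item meta-information into a small attribute tuple."""
    prompt = (
        "Extract a JSON object with keys 'category', 'color', 'style', 'setting' "
        f"from this product metadata:\n{json.dumps(item_meta)}\n"
        "Return only the JSON object."
    )
    resp = llm.chat.completions.create(
        model="gpt-4o-mini",  # any instruction-following LLM would do here
        messages=[{"role": "user", "content": prompt}],
    )
    # A production pipeline would validate the parsed attributes before use.
    return json.loads(resp.choices[0].message.content)


def build_banner_prompt(attrs: dict) -> str:
    """Simple prompt-engineering step: attribute tuple -> text-to-image prompt."""
    return (
        f"e-commerce web banner, {attrs['color']} {attrs['category']}, "
        f"{attrs['style']} style, {attrs['setting']} background, "
        "studio lighting, high quality, no text"
    )


def generate_banner(item_meta: dict, out_path: str = "banner.png") -> None:
    attrs = extract_attributes(item_meta)
    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")
    image = pipe(build_banner_prompt(attrs)).images[0]
    image.save(out_path)


if __name__ == "__main__":
    # In the paper's setting, the item meta-information is derived from the
    # shopper's interaction data; a hard-coded example is used here.
    generate_banner({"title": "Trail running shoes", "color": "forest green",
                     "brand": "ExampleBrand", "description": "lightweight, waterproof"})
```

In practice the item whose metadata is fed to the LLM would be selected from the user's interaction history, which is what makes the resulting banner personalized.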
Related papers
- Conditional Text-to-Image Generation with Reference Guidance [81.99538302576302]
This paper explores conditioning diffusion models on an additional image that provides visual guidance for the particular subjects to be generated.
We develop several small-scale expert plugins that efficiently endow a Stable Diffusion model with the capability to take different references.
Our expert plugins demonstrate superior results to existing methods on all tasks, with each plugin containing only 28.55M trainable parameters.
arXiv Detail & Related papers (2024-11-22T21:38:51Z)
- ARMADA: Attribute-Based Multimodal Data Augmentation [93.05614922383822]
Attribute-based Multimodal Data Augmentation (ARMADA) is a novel multimodal data augmentation method based on knowledge-guided manipulation of visual attributes.
The framework extracts knowledge-grounded attributes from symbolic knowledge bases (KBs) to generate semantically consistent yet distinctive image-text pairs.
This also highlights the need to leverage external knowledge proxies for enhanced interpretability and real-world grounding.
arXiv Detail & Related papers (2024-08-19T15:27:25Z)
- MoMA: Multimodal LLM Adapter for Fast Personalized Image Generation [22.69019130782004]
We present MoMA: an open-vocabulary, training-free personalized image model that boasts flexible zero-shot capabilities.
We train MoMA to serve a dual role as both a feature extractor and a generator.
We introduce a novel self-attention shortcut method that efficiently transfers image features to an image diffusion model.
arXiv Detail & Related papers (2024-04-08T16:55:49Z)
- Prompt Expansion for Adaptive Text-to-Image Generation [51.67811570987088]
This paper proposes a Prompt Expansion framework that helps users generate high-quality, diverse images with less effort.
The Prompt Expansion model takes a text query as input and outputs a set of expanded text prompts.
We conduct a human evaluation study that shows that images generated through Prompt Expansion are more aesthetically pleasing and diverse than those generated by baseline methods.
arXiv Detail & Related papers (2023-12-27T21:12:21Z)
- User-Aware Prefix-Tuning is a Good Learner for Personalized Image Captioning [35.211749514733846]
Traditional image captioning methods often overlook the preferences and characteristics of users.
Most existing methods emphasize the user context fusion process by memory networks or transformers.
We propose a novel personalized image captioning framework that leverages user context to consider personality factors.
arXiv Detail & Related papers (2023-12-08T02:08:00Z)
- Generating Illustrated Instructions [41.613203340244155]
We introduce the new task of generating Illustrated Instructions, i.e., visual instructions customized to a user's needs.
We combine the power of large language models (LLMs) with strong text-to-image diffusion models in a simple approach called StackedDiffusion.
arXiv Detail & Related papers (2023-12-07T18:59:20Z)
- ITI-GEN: Inclusive Text-to-Image Generation [56.72212367905351]
This study investigates inclusive text-to-image generative models that generate images based on human-written prompts.
We show that, for some attributes, images can represent concepts more expressively than text.
We propose a novel approach, ITI-GEN, that leverages readily available reference images for Inclusive Text-to-Image GENeration.
arXiv Detail & Related papers (2023-09-11T15:54:30Z)
- Generating Images with Multimodal Language Models [78.6660334861137]
We propose a method to fuse frozen text-only large language models with pre-trained image encoder and decoder models.
Our model demonstrates a wide suite of multimodal capabilities: image retrieval, novel image generation, and multimodal dialogue.
arXiv Detail & Related papers (2023-05-26T19:22:03Z)
- Grounding Language Models to Images for Multimodal Inputs and Outputs [89.30027812161686]
We propose an efficient method to ground pretrained text-only language models to the visual domain.
We process arbitrarily interleaved image-and-text data, and generate text interleaved with retrieved images.
arXiv Detail & Related papers (2023-01-31T18:33:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.