Personalized and Sequential Text-to-Image Generation
- URL: http://arxiv.org/abs/2412.10419v1
- Date: Tue, 10 Dec 2024 01:47:40 GMT
- Title: Personalized and Sequential Text-to-Image Generation
- Authors: Ofir Nabati, Guy Tennenholtz, ChihWei Hsu, Moonkyung Ryu, Deepak Ramachandran, Yinlam Chow, Xiang Li, Craig Boutilier,
- Abstract summary: We create a novel dataset of sequential preferences, which we leverage, together with large-scale open-source (non-sequential) datasets.
We construct user-preference and user-choice models using an EM strategy and identify varying user preference types.
We then leverage a large multimodal language model (LMM) and a value-based RL approach to suggest a personalized and diverse slate of prompt expansions to the user.
- Score: 24.787970969428976
- License:
- Abstract: We address the problem of personalized, interactive text-to-image (T2I) generation, designing a reinforcement learning (RL) agent which iteratively improves a set of generated images for a user through a sequence of prompt expansions. Using human raters, we create a novel dataset of sequential preferences, which we leverage, together with large-scale open-source (non-sequential) datasets. We construct user-preference and user-choice models using an EM strategy and identify varying user preference types. We then leverage a large multimodal language model (LMM) and a value-based RL approach to suggest a personalized and diverse slate of prompt expansions to the user. Our Personalized And Sequential Text-to-image Agent (PASTA) extends T2I models with personalized multi-turn capabilities, fostering collaborative co-creation and addressing uncertainty or underspecification in a user's intent. We evaluate PASTA using human raters, showing significant improvement compared to baseline methods. We also release our sequential rater dataset and simulated user-rater interactions to support future research in personalized, multi-turn T2I generation.
Related papers
- Personalized Preference Fine-tuning of Diffusion Models [75.22218338096316]
We introduce PPD, a multi-reward optimization objective that aligns diffusion models with personalized preferences.
With PPD, a diffusion model learns the individual preferences of a population of users in a few-shot way.
Our approach achieves an average win rate of 76% over Stable Cascade, generating images that more accurately reflect specific user preferences.
arXiv Detail & Related papers (2025-01-11T22:38:41Z) - Multifaceted User Modeling in Recommendation: A Federated Foundation Models Approach [28.721903315405353]
Multifaceted user modeling aims to uncover fine-grained patterns and learn representations from user data.
Recent studies on foundation model-based recommendation have emphasized the Transformer architecture's remarkable ability to capture complex, non-linear user-item interaction relationships.
We propose a novel Transformer layer designed specifically for recommendation, using the self-attention mechanism to capture sequential user-item interaction patterns.
arXiv Detail & Related papers (2024-12-22T11:00:00Z) - Scalable Ranked Preference Optimization for Text-to-Image Generation [76.16285931871948]
We investigate a scalable approach for collecting large-scale and fully synthetic datasets for DPO training.
The preferences for paired images are generated using a pre-trained reward function, eliminating the need for involving humans in the annotation process.
We introduce RankDPO to enhance DPO-based methods using the ranking feedback.
arXiv Detail & Related papers (2024-10-23T16:42:56Z) - Doing Personal LAPS: LLM-Augmented Dialogue Construction for Personalized Multi-Session Conversational Search [9.243535345193711]
Our method uses large language models to guide a single human worker in generating personalized dialogues.
LAPS can collect large-scale, human-written, multi-session, and multi-domain conversations.
Our results show that responses generated explicitly using extracted preferences better match user's actual preferences.
arXiv Detail & Related papers (2024-05-06T13:53:03Z) - PMG : Personalized Multimodal Generation with Large Language Models [20.778869086174137]
This paper proposes the first method for personalized multimodal generation using large language models (LLMs)
It showcases its applications and validates its performance via an extensive experimental study on two datasets.
PMG has a significant improvement on personalization for up to 8% in terms of LPIPS while retaining the accuracy of generation.
arXiv Detail & Related papers (2024-04-07T03:05:57Z) - Towards Unified Multi-Modal Personalization: Large Vision-Language Models for Generative Recommendation and Beyond [87.1712108247199]
Our goal is to establish a Unified paradigm for Multi-modal Personalization systems (UniMP)
We develop a generic and personalization generative framework, that can handle a wide range of personalized needs.
Our methodology enhances the capabilities of foundational language models for personalized tasks.
arXiv Detail & Related papers (2024-03-15T20:21:31Z) - User-Aware Prefix-Tuning is a Good Learner for Personalized Image Captioning [35.211749514733846]
Traditional image captioning methods often overlook the preferences and characteristics of users.
Most existing methods emphasize the user context fusion process by memory networks or transformers.
We propose a novel personalized image captioning framework that leverages user context to consider personality factors.
arXiv Detail & Related papers (2023-12-08T02:08:00Z) - Contrastive Transformer Learning with Proximity Data Generation for
Text-Based Person Search [60.626459715780605]
Given a descriptive text query, text-based person search aims to retrieve the best-matched target person from an image gallery.
Such a cross-modal retrieval task is quite challenging due to significant modality gap, fine-grained differences and insufficiency of annotated data.
In this paper, we propose a simple yet effective dual Transformer model for text-based person search.
arXiv Detail & Related papers (2023-11-15T16:26:49Z) - Modeling Dynamic User Preference via Dictionary Learning for Sequential
Recommendation [133.8758914874593]
Capturing the dynamics in user preference is crucial to better predict user future behaviors because user preferences often drift over time.
Many existing recommendation algorithms -- including both shallow and deep ones -- often model such dynamics independently.
This paper considers the problem of embedding a user's sequential behavior into the latent space of user preferences.
arXiv Detail & Related papers (2022-04-02T03:23:46Z) - Sequence Adaptation via Reinforcement Learning in Recommender Systems [8.909115457491522]
We propose the SAR model, which learns the sequential patterns and adjusts the sequence length of user-item interactions in a personalized manner.
In addition, we optimize a joint loss function to align the accuracy of the sequential recommendations with the expected cumulative rewards of the critic network.
Our experimental evaluation on four real-world datasets demonstrates the superiority of our proposed model over several baseline approaches.
arXiv Detail & Related papers (2021-07-31T13:56:46Z) - Examination and Extension of Strategies for Improving Personalized
Language Modeling via Interpolation [59.35932511895986]
We demonstrate improvements in offline metrics at the user level by interpolating a global LSTM-based authoring model with a user-personalized n-gram model.
We observe that over 80% of users receive a lift in perplexity, with an average of 5.2% in perplexity lift per user.
arXiv Detail & Related papers (2020-06-09T19:29:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.