Twin Co-Adaptive Dialogue for Progressive Image Generation
- URL: http://arxiv.org/abs/2504.14868v1
- Date: Mon, 21 Apr 2025 05:37:07 GMT
- Title: Twin Co-Adaptive Dialogue for Progressive Image Generation
- Authors: Jianhui Wang, Yangfan He, Yan Zhong, Xinyuan Song, Jiayi Su, Yuheng Feng, Hongyang He, Wenyu Zhu, Xinhang Yuan, Kuan Lu, Menghao Huo, Miao Zhang, Keqin Li, Jiaqi Chen, Tianyu Shi, Xueqian Wang
- Abstract summary: We present Twin-Co, a framework that leverages synchronized, co-adaptive dialogue to progressively refine image generation. Experiments demonstrate that Twin-Co not only enhances user experience by reducing trial-and-error but also improves the quality of the generated images.
- Score: 26.175824150331987
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modern text-to-image generation systems have enabled the creation of remarkably realistic and high-quality visuals, yet they often falter when handling the inherent ambiguities in user prompts. In this work, we present Twin-Co, a framework that leverages synchronized, co-adaptive dialogue to progressively refine image generation. Instead of a static generation process, Twin-Co employs a dynamic, iterative workflow where an intelligent dialogue agent continuously interacts with the user. Initially, a base image is generated from the user's prompt. Then, through a series of synchronized dialogue exchanges, the system adapts and optimizes the image according to evolving user feedback. The co-adaptive process allows the system to progressively narrow down ambiguities and better align with user intent. Experiments demonstrate that Twin-Co not only enhances user experience by reducing trial-and-error iterations but also improves the quality of the generated images, streamlining the creative process across various applications.
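To make the described workflow concrete, the sketch below reduces it to a generate-then-refine loop. This is an illustrative reading of the abstract, not the paper's implementation; `generate_image`, `ask_clarifying_question`, and `get_user_feedback` are hypothetical stand-ins for the text-to-image backend, the dialogue agent, and the user.

```python
# Minimal sketch of a co-adaptive generate-then-refine loop.
# All three helpers are hypothetical placeholders, not Twin-Co's components.

def generate_image(prompt: str) -> str:
    """Stand-in for a text-to-image backend; returns an image id."""
    return f"image_for({prompt!r})"

def ask_clarifying_question(prompt: str, image: str) -> str:
    """Stand-in for the dialogue agent probing a remaining ambiguity."""
    return f"Which aspect of {image} should change?"

def get_user_feedback(question: str) -> str:
    """Stand-in for the user's reply; '' means they are satisfied."""
    return ""  # replace with real user input in an interactive setting

def twin_co_loop(prompt: str, max_rounds: int = 5) -> str:
    image = generate_image(prompt)          # base image from the initial prompt
    for _ in range(max_rounds):
        question = ask_clarifying_question(prompt, image)
        feedback = get_user_feedback(question)
        if not feedback:                    # intent resolved; stop refining
            break
        prompt = f"{prompt}. {feedback}"    # fold feedback into the prompt
        image = generate_image(prompt)      # regenerate with narrowed intent
    return image

print(twin_co_loop("a bird on a branch"))
```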
Related papers
- Optimizing Multi-Round Enhanced Training in Diffusion Models for Improved Preference Understanding [29.191627597682597]
We present a framework incorporating human-in-the-loop feedback, leveraging a well-trained reward model aligned with user preferences.
Our approach consistently surpasses competing models in user satisfaction, especially in multi-turn dialogue scenarios.
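As a rough illustration of how a reward model can drive selection in one dialogue round, the following best-of-N sketch scores candidates and keeps the preferred one; `generate_candidates` and `reward_model` are placeholder stubs, not the paper's trained components.

```python
# Hedged sketch of reward-guided candidate selection in one round.
import random

def generate_candidates(prompt: str, n: int = 4) -> list[str]:
    """Stub: pretend to sample n images for the prompt."""
    return [f"{prompt} (candidate {i})" for i in range(n)]

def reward_model(prompt: str, image: str) -> float:
    """Stub for a reward model aligned with user preferences."""
    return random.random()

def best_of_n_round(prompt: str) -> str:
    # Keep whichever candidate the reward model scores highest.
    return max(generate_candidates(prompt),
               key=lambda im: reward_model(prompt, im))

print(best_of_n_round("a watercolor fox"))
```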
arXiv Detail & Related papers (2025-04-25T09:35:02Z)
- Bringing Characters to New Stories: Training-Free Theme-Specific Image Generation via Dynamic Visual Prompting [71.29100512700064]
We present T-Prompter, a training-free method for theme-specific image generation.
T-Prompter integrates reference images into generative models, allowing users to seamlessly specify the target theme.
Our approach enables consistent story generation, character design, realistic character generation, and style-guided image generation.
arXiv Detail & Related papers (2025-01-26T19:01:19Z)
- Enhancing Intent Understanding for Ambiguous prompt: A Human-Machine Co-Adaption Strategy [28.647935556492957]
We propose a human-machine co-adaption strategy using mutual information between the user's prompts and the pictures under modification.
We find that an improved model can reduce the necessity for multiple rounds of adjustments.
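For intuition on the mutual-information signal, the snippet below computes empirical mutual information between logged prompt choices and the edits they triggered; the data and the pairing are fabricated for illustration, not the paper's estimator.

```python
# Illustrative only: empirical mutual information from (prompt, edit) logs.
import math
from collections import Counter

def mutual_information(pairs: list[tuple[str, str]]) -> float:
    n = len(pairs)
    joint = Counter(pairs)                  # joint counts over (x, y)
    px = Counter(p for p, _ in pairs)       # marginal counts over prompts
    py = Counter(e for _, e in pairs)       # marginal counts over edits
    mi = 0.0
    for (x, y), c in joint.items():
        pxy = c / n
        mi += pxy * math.log2(pxy / ((px[x] / n) * (py[y] / n)))
    return mi

logs = [("bright", "lighten"), ("bright", "lighten"), ("moody", "darken")]
print(mutual_information(logs))  # higher MI = prompts better predict edits
```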
arXiv Detail & Related papers (2025-01-25T10:32:00Z)
- Reflective Human-Machine Co-adaptation for Enhanced Text-to-Image Generation Dialogue System [7.009995656535664]
We propose a reflective human-machine co-adaptation strategy, named RHM-CAS.
Externally, the Agent engages in meaningful language interactions with users to reflect on and refine the generated images.
Internally, the Agent optimizes its policy based on user preferences, ensuring that the final outcomes closely align with them.
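The internal policy optimization can be caricatured as a small bandit over generation styles that drifts toward options the user approves. The style names and update rule below are assumptions for illustration, not RHM-CAS's actual policy.

```python
# Toy sketch: a bandit-style policy over generation styles,
# nudged by binary user feedback. Styles and update rule are assumed.
import random

styles = {"photorealistic": 0.0, "painterly": 0.0, "sketch": 0.0}

def pick_style(eps: float = 0.2) -> str:
    if random.random() < eps:               # explore occasionally
        return random.choice(list(styles))
    return max(styles, key=styles.get)      # otherwise exploit the best so far

def update_policy(style: str, liked: bool, lr: float = 0.5) -> None:
    target = 1.0 if liked else 0.0
    styles[style] += lr * (target - styles[style])  # move toward the feedback

chosen = pick_style()
update_policy(chosen, liked=True)           # user approved this round's image
```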
arXiv Detail & Related papers (2024-08-27T18:08:00Z)
- Prompt Refinement with Image Pivot for Text-to-Image Generation [103.63292948223592]
We introduce Prompt Refinement with Image Pivot (PRIP) for text-to-image generation.
PRIP decomposes the refinement process into two data-rich tasks: inferring representations of user-preferred images from user language, and translating those image representations into system language.
It substantially outperforms a wide range of baselines and effectively transfers to unseen systems in a zero-shot manner.
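A schematic reading of this two-stage decomposition, with both stages reduced to trivial stand-in functions and the "image representation" simplified to a plain feature list:

```python
# Sketch of a two-stage user-language -> image-representation -> system-
# language pipeline; both stages are hypothetical stand-ins, not PRIP's.

def infer_image_representation(user_text: str) -> list[str]:
    """Stage 1: map user language to features of the preferred image."""
    return [w for w in user_text.lower().split() if len(w) > 3]

def to_system_prompt(features: list[str]) -> str:
    """Stage 2: translate the image representation into system language."""
    return "high quality, detailed, " + ", ".join(features)

print(to_system_prompt(infer_image_representation("a cozy cabin in snow")))
```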
arXiv Detail & Related papers (2024-06-28T22:19:24Z)
- Training-Free Consistent Text-to-Image Generation [80.4814768762066]
Text-to-image models struggle to consistently portray the same subject across diverse prompts.
Existing approaches fine-tune the model to teach it new words that describe specific user-provided subjects.
We present ConsiStory, a training-free approach that enables consistent subject generation by sharing the internal activations of the pretrained model.
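A generic sketch of the activation-sharing idea: one image's queries attend over keys and values pooled from the whole batch, letting subject features propagate across generations. The code below is ordinary scaled dot-product attention, not ConsiStory's exact layers.

```python
# Conceptual sketch of cross-batch activation sharing for consistency.
import numpy as np

def shared_attention(q, ks, vs):
    """q: (t, d) queries for one image; ks/vs: per-image (t, d) arrays."""
    k = np.concatenate(ks, axis=0)          # pool keys across the batch
    v = np.concatenate(vs, axis=0)          # pool values across the batch
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v                      # (t, d) attended output

rng = np.random.default_rng(0)
qs = [rng.normal(size=(4, 8)) for _ in range(3)]
out = shared_attention(qs[0], qs, qs)       # image 0 attends to all images
```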
arXiv Detail & Related papers (2024-02-05T18:42:34Z)
- NeuroPrompts: An Adaptive Framework to Optimize Prompts for Text-to-Image Generation [4.21512101973222]
NeuroPrompts is an adaptive framework that enhances a user's prompt to improve the quality of generations produced by text-to-image models.
Our framework utilizes constrained text decoding with a pre-trained language model that has been adapted to generate prompts similar to those produced by human prompt engineers.
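Constrained decoding can be illustrated by a greedy loop in which each step may only append a token from an allowed modifier vocabulary; the scoring stub below stands in for the adapted language model, and the vocabulary is an assumption for illustration.

```python
# Toy sketch of constrained greedy decoding over an allowed vocabulary.
import random

ALLOWED = ["cinematic", "detailed", "4k", "trending", "sharp focus"]

def lm_score(prefix: str, token: str) -> float:
    """Stand-in for an adapted language model's next-token score."""
    random.seed(hash((prefix, token)) % (2**32))
    return random.random()

def refine_prompt(user_prompt: str, steps: int = 3) -> str:
    out, remaining = user_prompt, list(ALLOWED)
    for _ in range(steps):
        # Constrained step: only tokens from the allowed set may follow.
        best = max(remaining, key=lambda t: lm_score(out, t))
        remaining.remove(best)              # avoid repeating a modifier
        out += ", " + best
    return out

print(refine_prompt("a castle at dusk"))
```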
arXiv Detail & Related papers (2023-11-20T22:57:47Z)
- Teaching Text-to-Image Models to Communicate in Dialog [44.76942024105259]
In this paper, we focus on the innovative dialog-to-image generation task.
To tackle this problem, we design a tailored fine-tuning approach on top of state-of-the-art text-to-image generation models.
Our approach brings consistent and substantial improvements across three state-of-the-art pre-trained text-to-image generation backbones.
arXiv Detail & Related papers (2023-09-27T09:33:16Z)
- IR-GAN: Image Manipulation with Linguistic Instruction by Increment Reasoning [110.7118381246156]
The Increment Reasoning Generative Adversarial Network (IR-GAN) aims to reason about the consistency between the visual increment in images and the semantic increment in instructions.
First, we introduce word-level and instruction-level instruction encoders to learn the user's intention from history-correlated instructions as the semantic increment.
Second, we embed the representation of the semantic increment into that of the source image to generate the target image, where the source image serves as a reference.
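The increment idea can be sketched as encoding the instruction into a semantic delta and fusing it into the source image's representation; the encoders below are trivial stand-ins chosen for illustration, not IR-GAN's networks.

```python
# Schematic sketch: instruction -> semantic delta -> fused target repr.
import numpy as np

def encode_instruction(instruction: str, dim: int = 16) -> np.ndarray:
    """Stand-in instruction encoder: hash words into a dense delta."""
    vec = np.zeros(dim)
    for word in instruction.lower().split():
        vec[hash(word) % dim] += 1.0
    return vec / max(1.0, np.linalg.norm(vec))

def apply_increment(source_repr: np.ndarray, delta: np.ndarray) -> np.ndarray:
    """Fuse the semantic increment into the source representation."""
    return source_repr + delta              # the source acts as the reference

src = np.zeros(16)                          # representation of the source image
tgt = apply_increment(src, encode_instruction("add a red hat"))
```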
arXiv Detail & Related papers (2022-04-02T07:48:39Z)
- Improving Generation and Evaluation of Visual Stories via Semantic Consistency [72.00815192668193]
Given a series of natural language captions, an agent must generate a sequence of images that correspond to the captions.
Prior work has introduced recurrent generative models that outperform text-to-image synthesis models on this task.
We present a number of improvements to prior modeling approaches, including the addition of a dual learning framework.
arXiv Detail & Related papers (2021-05-20T20:42:42Z)