Do It Yourself (DIY): Modifying Images for Poems in a Zero-Shot Setting Using Weighted Prompt Manipulation
- URL: http://arxiv.org/abs/2509.11878v1
- Date: Mon, 15 Sep 2025 12:58:38 GMT
- Title: Do It Yourself (DIY): Modifying Images for Poems in a Zero-Shot Setting Using Weighted Prompt Manipulation
- Authors: Sofia Jamil, Kotla Sai Charan, Sriparna Saha, Koustava Goswami, K J Joseph,
- Abstract summary: We introduce a novel Weighted Prompt Manipulation (WPM) technique, which modifies attention weights and text embeddings within diffusion models. By adjusting the importance of specific words, WPM enhances or suppresses their influence in the final generated image, leading to semantically richer and more contextually accurate visualizations. To the authors' knowledge, this is the first attempt at integrating weighted prompt manipulation for enhancing imagery in poetic language.
- Score: 20.357558748582942
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Poetry is an expressive form of art that invites multiple interpretations, as readers often bring their own emotions, experiences, and cultural backgrounds into their understanding of a poem. Recognizing this, we aim to generate images for poems and improve these images in a zero-shot setting, enabling audiences to modify images as per their requirements. To achieve this, we introduce a novel Weighted Prompt Manipulation (WPM) technique, which systematically modifies attention weights and text embeddings within diffusion models. By dynamically adjusting the importance of specific words, WPM enhances or suppresses their influence in the final generated image, leading to semantically richer and more contextually accurate visualizations. Our approach exploits diffusion models and large language models (LLMs) such as GPT in conjunction with existing poetry datasets, ensuring a comprehensive and structured methodology for improved image generation in the literary domain. To the best of our knowledge, this is the first attempt at integrating weighted prompt manipulation for enhancing imagery in poetic language.
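The abstract does not give WPM's exact formulation. As a rough illustration only, the sketch below shows two generic ways a per-word weight can change a diffusion model's behavior, matching the two places the abstract names: scaling a token's text embedding, and rescaling the cross-attention that each image location pays to that token. All function names are hypothetical. The toy uses plain Python lists instead of a real UNet with multi-head attention, and it is not the authors' implementation.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def softmax(xs: Vector) -> Vector:
    # Numerically stable softmax over one row of attention logits.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def weight_embeddings(embeddings: Matrix, weights: Vector) -> Matrix:
    # Hypothetical helper: scale each token's text embedding by its weight
    # (w > 1 emphasizes the word, w < 1 suppresses it).
    return [[w * v for v in emb] for emb, w in zip(embeddings, weights)]


def reweighted_cross_attention(
    queries: Matrix, keys: Matrix, weights: Vector, renormalize: bool = True
) -> Matrix:
    # Hypothetical helper: queries has one row per image location,
    # keys has one row per prompt token. Returns the attention
    # probabilities after per-token reweighting.
    d = len(keys[0])
    out = []
    for q in queries:
        # Scaled dot-product logits between this location and every token.
        logits = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        probs = softmax(logits)
        # Scale the attention paid to token i by weights[i].
        scaled = [p * w for p, w in zip(probs, weights)]
        if renormalize:
            # Assumes at least one positive weight so the sum is non-zero.
            s = sum(scaled)
            scaled = [p / s for p in scaled]
        out.append(scaled)
    return out
```

For a prompt such as "a lonely moon" with weights 1.0, 1.0 and 2.0 and identical keys, the uniform attention of one third per token becomes 0.25, 0.25 and 0.5 after renormalization, so "moon" receives twice the attention of each other word. Whether to renormalize is a design choice: skipping it changes the total attention mass as well as its distribution across words.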
Related papers
- PoemTale Diffusion: Minimising Information Loss in Poem to Image Generation with Multi-Stage Prompt Refinement [18.293592213622183]
PoemTale Diffusion aims to minimise the information that is lost during poetic text-to-image conversion. To support this, we adapt existing state-of-the-art diffusion models by modifying their self-attention mechanisms. To encourage research in the field of poetry, we introduce the P4I dataset, consisting of 1111 poems.
(2025-07-18T07:33:08Z)
- RePrompt: Reasoning-Augmented Reprompting for Text-to-Image Generation via Reinforcement Learning [88.14234949860105]
RePrompt is a novel reprompting framework that introduces explicit reasoning into the prompt enhancement process via reinforcement learning. Our approach enables end-to-end training without human-annotated data.
(2025-05-23T06:44:26Z)
- Poetry in Pixels: Prompt Tuning for Poem Image Generation via Diffusion Models [18.293592213622183]
We propose a PoemToPixel framework designed to generate images that visually represent the inherent meanings of poems. Our approach incorporates the concept of prompt tuning in our image generation framework to ensure that the resulting images closely align with the poetic content. To expand the diversity of the poetry dataset, we introduce MiniPo, a novel multimodal dataset comprising 1001 children's poems and images.
(2025-01-10T10:26:54Z)
- ARTIST: Improving the Generation of Text-rich Images with Disentangled Diffusion Models and Large Language Models [52.23899502520261]
We introduce a novel framework named ARTIST, which incorporates a dedicated textual diffusion model to focus specifically on learning text structures. We fine-tune a visual diffusion model, enabling it to assimilate textual structure information from the pretrained textual model. This disentangled architecture design and training strategy significantly enhance the text rendering ability of the diffusion models for text-rich image generation.
(2024-06-17T19:31:24Z)
- Poetry2Image: An Iterative Correction Framework for Images Generated from Chinese Classical Poetry [7.536700229966157]
Poetry2Image is an iterative correction framework for images generated from Chinese classical poetry.
The proposed method achieves an average element completeness of 70.63%, representing an improvement of 25.56% over direct image generation.
(2024-06-15T19:45:08Z)
- Text Guided Image Editing with Automatic Concept Locating and Forgetting [27.70615803908037]
We propose a novel method called Locate and Forget (LaF) to locate potential target concepts in the image for modification.
Compared to the baselines, our method demonstrates its superiority in text-guided image editing tasks both qualitatively and quantitatively.
(2024-05-30T05:36:32Z)
- Dynamic Prompt Optimizing for Text-to-Image Generation [63.775458908172176]
We introduce the Prompt Auto-Editing (PAE) method to improve text-to-image generative models.
We employ an online reinforcement learning strategy to explore the weights and injection time steps of each word, leading to the dynamic fine-control prompts.
(2024-04-05T13:44:39Z)
- Impressions: Understanding Visual Semiotics and Aesthetic Impact [66.40617566253404]
We present Impressions, a novel dataset through which to investigate the semiotics of images.
We show that existing multimodal image captioning and conditional generation models struggle to simulate plausible human responses to images.
This dataset significantly improves their ability to model impressions and aesthetic evaluations of images through fine-tuning and few-shot adaptation.
(2023-10-27T04:30:18Z)
- SUR-adapter: Enhancing Text-to-Image Pre-trained Diffusion Models with Large Language Models [56.88192537044364]
We propose a simple-yet-effective parameter-efficient fine-tuning approach called the Semantic Understanding and Reasoning adapter (SUR-adapter) for pre-trained diffusion models.
Our approach can make text-to-image diffusion models easier to use, with a better user experience.
(2023-05-09T05:48:38Z)
- Zero-shot Image-to-Image Translation [57.46189236379433]
We propose pix2pix-zero, an image-to-image translation method that can preserve the original image without manual prompting.
We propose cross-attention guidance, which aims to retain the cross-attention maps of the input image throughout the diffusion process.
Our method does not need additional training for these edits and can directly use the existing text-to-image diffusion model.
(2023-02-06T18:59:51Z)
- Language Does More Than Describe: On The Lack Of Figurative Speech in Text-To-Image Models [63.545146807810305]
Text-to-image diffusion models can generate high-quality pictures from textual input prompts.
These models have been trained using text data collected from content-based labelling protocols.
We characterise the sentimentality, objectiveness and degree of abstraction of publicly available text data used to train current text-to-image diffusion models.
(2022-10-19T14:20:05Z)
- Generating Chinese Poetry from Images via Concrete and Abstract Information [23.690384629376005]
We propose an infilling-based Chinese poetry generation model which can explicitly infill the concrete keywords into each line of a poem.
We also use non-parallel data during training and construct separate image datasets and poem datasets to train the different components in our framework.
Both automatic and human evaluation results show that our approach can generate poems that are more consistent with the images without losing quality.
(2020-03-24T11:17:20Z)
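The Dynamic Prompt Optimizing entry describes PAE as exploring a weight and an injection time step for each word. PAE learns these with online reinforcement learning, which is not reproduced here. The hypothetical sketch below only shows how such a per-word schedule could be turned into the weights used at one denoising step. The `WordControl` fields, the half-open step window, and the choice of weight 0.0 outside that window are all assumptions for illustration, not PAE's published interface.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class WordControl:
    # Hypothetical per-word control; field names are illustrative.
    weight: float    # multiplier applied while the word is injected
    start_step: int  # first denoising step (inclusive) at which the word is active
    end_step: int    # denoising step (exclusive) at which the word stops being active


def weights_at_step(tokens: List[str], controls: Dict[str, WordControl], step: int) -> List[float]:
    # Return one weight per token for the given denoising step.
    weights = []
    for tok in tokens:
        control = controls.get(tok)
        if control is None:
            # Words without a control keep their default influence.
            weights.append(1.0)
        elif control.start_step <= step < control.end_step:
            # Inside the injection window: apply the chosen weight.
            weights.append(control.weight)
        else:
            # Outside the window the word is treated as absent (an assumption).
            weights.append(0.0)
    return weights
```

For example, with tokens "a", "castle", "in", "mist" and `{"mist": WordControl(1.5, 10, 40)}`, step 5 yields weights 1.0, 1.0, 1.0, 0.0 and step 20 yields 1.0, 1.0, 1.0, 1.5. The resulting list could feed any per-token reweighting of text embeddings or cross-attention.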