Poetry2Image: An Iterative Correction Framework for Images Generated from Chinese Classical Poetry
- URL: http://arxiv.org/abs/2407.06196v1
- Date: Sat, 15 Jun 2024 19:45:08 GMT
- Title: Poetry2Image: An Iterative Correction Framework for Images Generated from Chinese Classical Poetry
- Authors: Jing Jiang, Yiran Ling, Binzhu Li, Pengxiang Li, Junming Piao, Yu Zhang
- Abstract summary: Poetry2Image is an iterative correction framework for images generated from Chinese classical poetry.
The proposed method achieves an average element completeness of 70.63%, representing an improvement of 25.56% over direct image generation.
- Score: 7.536700229966157
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Text-to-image generation models often struggle with key element loss or semantic confusion in tasks involving Chinese classical poetry. Addressing this issue by fine-tuning models incurs considerable training costs, and manual prompt adjustments for re-diffusion require professional knowledge. To solve this problem, we propose Poetry2Image, an iterative correction framework for images generated from Chinese classical poetry. Utilizing an external poetry dataset, Poetry2Image establishes an automated feedback and correction loop, which enhances the alignment between poetry and image through image generation models and subsequent re-diffusion modifications suggested by large language models (LLMs). Using a test set of 200 sentences of Chinese classical poetry, the proposed method--when integrated with five popular image generation models--achieves an average element completeness of 70.63%, representing an improvement of 25.56% over direct image generation. In tests of semantic correctness, our method attains an average semantic consistency of 80.09%. The study not only promotes the dissemination of ancient poetry culture but also offers a reference for similar non-fine-tuning methods to enhance LLM generation.
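The abstract describes an automated feedback and correction loop: generate an image, check it against the poem's key elements, and re-diffuse with an LLM-suggested prompt revision until the elements are covered. A minimal sketch of that loop is below; `generate_image`, `detect_elements`, and `revise_prompt` are hypothetical stand-ins for the image generation model, the element check, and the LLM suggestion step, none of which are specified in this summary.

```python
def extract_elements(poem: str) -> set[str]:
    # Stand-in for looking up the poem's key elements via an
    # external poetry dataset, as the abstract describes.
    return set(poem.split())

def correction_loop(poem, generate_image, detect_elements, revise_prompt,
                    max_rounds=3):
    """Iteratively regenerate until the image covers the poem's elements."""
    required = extract_elements(poem)
    prompt = poem
    image = generate_image(prompt)
    for _ in range(max_rounds):
        missing = required - detect_elements(image)
        if not missing:
            break  # all key elements are present; stop re-diffusing
        prompt = revise_prompt(prompt, missing)  # LLM-suggested revision
        image = generate_image(prompt)
    # Element completeness: fraction of required elements in the final image.
    completeness = len(required & detect_elements(image)) / len(required)
    return image, completeness

# Toy models to exercise the loop: the "image" is just the set of words
# the generator happened to render.
calls = []
def toy_generate(prompt):
    calls.append(prompt)
    rendered = set(prompt.split())
    if len(calls) == 1:
        rendered.discard("bridge")  # first attempt drops a key element
    return rendered

def toy_detect(image):
    return image

def toy_revise(prompt, missing):
    return prompt + " " + " ".join(sorted(missing))

image, completeness = correction_loop(
    "moon bridge river", toy_generate, toy_detect, toy_revise)
```

In this toy run the first generation misses "bridge", one correction round restores it, and the loop exits with full element completeness after two generator calls.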
Related papers
- Poetry in Pixels: Prompt Tuning for Poem Image Generation via Diffusion Models [18.293592213622183]
We propose a PoemToPixel framework designed to generate images that visually represent the inherent meanings of poems.
Our approach incorporates the concept of prompt tuning in our image generation framework to ensure that the resulting images closely align with the poetic content.
To expand the diversity of the poetry dataset, we introduce MiniPo, a novel multimodal dataset comprising 1001 children's poems and images.
arXiv Detail & Related papers (2025-01-10T10:26:54Z)
- Improving Image Captioning by Mimicking Human Reformulation Feedback at Inference-time [35.71703501731082]
We introduce a novel type of feedback -- caption reformulations -- and train models to mimic reformulation feedback based on human annotations.
Our method does not require training the image captioning model itself, thereby demanding substantially less computational effort.
We find that incorporating reformulation models trained on this data into the inference phase of existing image captioning models results in improved captions.
arXiv Detail & Related papers (2025-01-08T14:00:07Z)
- Unleashing In-context Learning of Autoregressive Models for Few-shot Image Manipulation [70.95783968368124]
We introduce a novel multi-modal autoregressive model, dubbed InstaManip.
We propose an innovative group self-attention mechanism to break down the in-context learning process into two separate stages.
Our method surpasses previous few-shot image manipulation models by a notable margin.
arXiv Detail & Related papers (2024-12-02T01:19:21Z)
- Information Theoretic Text-to-Image Alignment [49.396917351264655]
Mutual Information (MI) is used to guide model alignment.
Our method uses self-supervised fine-tuning and relies on a point-wise (MI) estimation between prompts and images.
Our analysis indicates that our method is superior to the state-of-the-art, yet it only requires the pre-trained denoising network of the T2I model itself to estimate MI.
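This summary says alignment is guided by a point-wise mutual information estimate between prompts and images. As a generic illustration only (the toy probabilities below are made up and have nothing to do with the paper's denoising-network estimator), point-wise MI is:

```python
import math

def pmi(p_xy: float, p_x: float, p_y: float) -> float:
    """Point-wise mutual information: log p(x, y) / (p(x) * p(y)).
    Positive when x and y co-occur more often than independence predicts."""
    return math.log(p_xy / (p_x * p_y))

# A prompt-image pair that co-occurs more than chance scores positive;
# an exactly independent pair scores zero.
aligned = pmi(0.2, 0.4, 0.4)       # joint 0.2 > 0.4 * 0.4 = 0.16
independent = pmi(0.16, 0.4, 0.4)  # joint equals the product
```

A higher PMI for a prompt-image pair is what such a method would treat as better alignment.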
arXiv Detail & Related papers (2024-05-31T12:20:02Z)
- Emu: Enhancing Image Generation Models Using Photogenic Needles in a Haystack [75.00066365801993]
Training text-to-image models with web scale image-text pairs enables the generation of a wide range of visual concepts from text.
These pre-trained models often face challenges when it comes to generating highly aesthetic images.
We propose quality-tuning to guide a pre-trained model to exclusively generate highly visually appealing images.
arXiv Detail & Related papers (2023-09-27T17:30:19Z)
- Prose2Poem: The Blessing of Transformers in Translating Prose to Persian Poetry [2.15242029196761]
We introduce a novel Neural Machine Translation (NMT) approach to translate prose to ancient Persian poetry.
We trained a Transformer model from scratch to obtain initial translations and pretrained different variations of BERT to obtain final translations.
arXiv Detail & Related papers (2021-09-30T09:04:11Z)
- Caption Enriched Samples for Improving Hateful Memes Detection [78.5136090997431]
The hateful meme challenge demonstrates the difficulty of determining whether a meme is hateful or not.
Both unimodal language models and multimodal vision-language models cannot reach the human level of performance.
arXiv Detail & Related papers (2021-09-22T10:57:51Z)
- CCPM: A Chinese Classical Poetry Matching Dataset [50.90794811956129]
We propose a novel task to assess a model's semantic understanding of poetry by poem matching.
This task requires the model to select one line of Chinese classical poetry among four candidates according to the modern Chinese translation of a line of poetry.
To construct this dataset, we first obtain a set of parallel data of Chinese classical poetry and modern Chinese translation.
arXiv Detail & Related papers (2021-06-03T16:49:03Z)
- Generating Chinese Poetry from Images via Concrete and Abstract Information [23.690384629376005]
We propose an infilling-based Chinese poetry generation model which can infill the Concrete keywords into each line of poems in an explicit way.
We also use non-parallel data during training and construct separate image datasets and poem datasets to train the different components in our framework.
Both automatic and human evaluation results show that our approach can generate poems which have better consistency with images without losing the quality.
arXiv Detail & Related papers (2020-03-24T11:17:20Z)
- Generating Major Types of Chinese Classical Poetry in a Uniformed Framework [88.57587722069239]
We propose a GPT-2 based framework for generating major types of Chinese classical poems.
Preliminary results show this enhanced model can generate Chinese classical poems of major types with high quality in both form and content.
arXiv Detail & Related papers (2020-03-13T14:16:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.