Related papers: Semi-supervised Chinese Poem-to-Painting Generation via Cycle-consistent Adversarial Networks

URL: http://arxiv.org/abs/2410.19307v1
Date: Fri, 25 Oct 2024 04:57:44 GMT
Title: Semi-supervised Chinese Poem-to-Painting Generation via Cycle-consistent Adversarial Networks
Authors: Zhengyang Lu, Tianhao Guo, Feng Wang
Abstract summary: We propose a semi-supervised approach using cycle-consistent adversarial networks to leverage the limited paired data. We introduce novel evaluation metrics to assess the quality, diversity, and consistency of the generated poems and paintings. The proposed model outperforms previous methods, showing promise in capturing the symbolic essence of artistic expression.
Score: 2.250406890348191
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Classical Chinese poetry and painting represent the epitome of artistic expression, but the abstract and symbolic nature of their relationship poses a significant challenge for computational translation. Most existing methods rely on large-scale paired datasets, which are scarce in this domain. In this work, we propose a semi-supervised approach using cycle-consistent adversarial networks to leverage the limited paired data and a large unpaired corpus of poems and paintings. The key insight is to learn bidirectional mappings that enforce semantic alignment between the visual and textual modalities. We introduce novel evaluation metrics to assess the quality, diversity, and consistency of the generated poems and paintings. Extensive experiments are conducted on a new Chinese Painting Description Dataset (CPDD). The proposed model outperforms previous methods, showing promise in capturing the symbolic essence of artistic expression. Code is available online at https://github.com/Mnster00/poemtopainting.

Related papers

PoemTale Diffusion: Minimising Information Loss in Poem to Image Generation with Multi-Stage Prompt Refinement [18.293592213622183]
PoemTale Diffusion aims to minimise the information that is lost during poetic text-to-image conversion. To support this, we adapt existing state-of-the-art diffusion models by modifying their self-attention mechanisms. To encourage research in the field of poetry, we introduce the P4I dataset, consisting of 1111 poems.
arXiv Detail & Related papers (2025-07-18T07:33:08Z)
Zero-Shot Chinese Character Recognition with Hierarchical Multi-Granularity Image-Text Aligning [52.92837273570818]
Chinese characters exhibit unique structures and compositional rules, allowing for the use of fine-grained semantic information in representation. We propose a Hierarchical Multi-Granularity Image-Text Aligning (Hi-GITA) framework based on a contrastive paradigm. Our proposed Hi-GITA outperforms existing zero-shot CCR methods.
arXiv Detail & Related papers (2025-05-30T17:39:14Z)
Poetry in Pixels: Prompt Tuning for Poem Image Generation via Diffusion Models [18.293592213622183]
We propose a PoemToPixel framework designed to generate images that visually represent the inherent meanings of poems. Our approach incorporates the concept of prompt tuning in our image generation framework to ensure that the resulting images closely align with the poetic content. To expand the diversity of the poetry dataset, we introduce MiniPo, a novel multimodal dataset comprising 1001 children's poems and images.
arXiv Detail & Related papers (2025-01-10T10:26:54Z)
Compositional Entailment Learning for Hyperbolic Vision-Language Models [54.41927525264365]
We show how to fully leverage the innate hierarchical nature of hyperbolic embeddings by looking beyond individual image-text pairs. We propose Compositional Entailment Learning for hyperbolic vision-language models. Empirical evaluation on a hyperbolic vision-language model trained with millions of image-text pairs shows that the proposed compositional learning approach outperforms conventional Euclidean CLIP learning.
arXiv Detail & Related papers (2024-10-09T14:12:50Z)
KALE: An Artwork Image Captioning System Augmented with Heterogeneous Graph [24.586916324061168]
We present KALE, a Knowledge-Augmented vision-Language model for artwork Elaborations. KALE incorporates the metadata in two ways: firstly as direct textual input, and secondly through a multimodal heterogeneous knowledge graph. Experimental results demonstrate that KALE achieves strong performance over existing state-of-the-art work across several artwork datasets.
arXiv Detail & Related papers (2024-09-17T06:39:18Z)
DLP-GAN: learning to draw modern Chinese landscape photos with generative adversarial network [20.74857981451259]
Chinese landscape painting has a unique and artistic style, and its drawing technique is highly abstract in both the use of color and the realistic representation of objects. Previous methods focus on transferring from modern photos to ancient ink paintings, but little attention has been paid to translating landscape paintings into modern photos.
arXiv Detail & Related papers (2024-03-06T04:46:03Z)
Leveraging Open-Vocabulary Diffusion to Camouflaged Instance Segmentation [59.78520153338878]
Text-to-image diffusion techniques have shown exceptional capability of producing high-quality images from text descriptions. We propose a method built upon a state-of-the-art diffusion model, empowered by open-vocabulary to learn multi-scale textual-visual features for camouflaged object representations.
arXiv Detail & Related papers (2023-12-29T07:59:07Z)
ALADIN-NST: Self-supervised disentangled representation learning of artistic style through Neural Style Transfer [60.6863849241972]
We learn a representation of visual artistic style more strongly disentangled from the semantic content depicted in an image. We show that strongly addressing the disentanglement of style and content leads to large gains in style-specific metrics.
arXiv Detail & Related papers (2023-04-12T10:33:18Z)
Paint4Poem: A Dataset for Artistic Visualization of Classical Chinese Poems [20.72849584295798]
We construct a new dataset called Paint4Poem. Paint4Poem consists of 301 high-quality poem-painting pairs collected manually from an influential modern Chinese artist. We analyze Paint4Poem regarding poem diversity, painting style, and the semantic relevance between poems and paintings.
arXiv Detail & Related papers (2021-09-23T22:57:16Z)
Matching Visual Features to Hierarchical Semantic Topics for Image Paragraph Captioning [50.08729005865331]
This paper develops a plug-and-play hierarchical-topic-guided image paragraph generation framework. To capture the correlations between the image and text at multiple levels of abstraction, we design a variational inference network. To guide the paragraph generation, the learned hierarchical topics and visual features are integrated into the language model.
arXiv Detail & Related papers (2021-05-10T06:55:39Z)
Improving Image Captioning with Better Use of Captions [65.39641077768488]
We present a novel image captioning architecture to better explore semantics available in captions and leverage that to enhance both image representation and caption generation. Our models first construct caption-guided visual relationship graphs that introduce beneficial inductive bias using weakly supervised multi-instance learning. During generation, the model further incorporates visual relationships using multi-task learning for jointly predicting word and object/predicate tag sequences.
arXiv Detail & Related papers (2020-06-21T14:10:47Z)
Improving Disentangled Text Representation Learning with Information-Theoretic Guidance [99.68851329919858]
The discrete nature of natural language makes disentangling of textual representations more challenging. Inspired by information theory, we propose a novel method that effectively manifests disentangled representations of text. Experiments on both conditional text generation and text-style transfer demonstrate the high quality of our disentangled representation.
arXiv Detail & Related papers (2020-06-01T03:36:01Z)
Generating Chinese Poetry from Images via Concrete and Abstract Information [23.690384629376005]
We propose an infilling-based Chinese poetry generation model which can infill the Concrete keywords into each line of poems in an explicit way. We also use non-parallel data during training and construct separate image datasets and poem datasets to train the different components in our framework. Both automatic and human evaluation results show that our approach can generate poems which have better consistency with images without losing the quality.
arXiv Detail & Related papers (2020-03-24T11:17:20Z)

This list is automatically generated from the titles and abstracts of the papers on this site.