Space Narrative: Generating Images and 3D Scenes of Chinese Garden from Text using Deep Learning
- URL: http://arxiv.org/abs/2311.00339v1
- Date: Wed, 1 Nov 2023 07:16:01 GMT
- Title: Space Narrative: Generating Images and 3D Scenes of Chinese Garden from Text using Deep Learning
- Authors: Jiaxi Shi and Hao Hua
- Abstract summary: We propose a method to generate garden paintings from text descriptions using deep learning.
Our image-text pair dataset consists of more than one thousand Ming Dynasty garden paintings and their inscriptions and postscripts.
A latent text-to-image diffusion model learns the mapping from descriptive texts to garden paintings of the Ming Dynasty, and the text description of Jichang Garden then guides the model to generate new garden paintings.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The consistent mapping from poems to paintings is essential for the research and restoration of traditional Chinese gardens, but the lack of firsthand material poses a great challenge to reconstruction work. In this paper, we propose a method to generate garden paintings from text descriptions using deep learning. Our image-text pair dataset consists of more than one thousand Ming Dynasty garden paintings and their inscriptions and postscripts. A latent text-to-image diffusion model learns the mapping from descriptive texts to garden paintings of the Ming Dynasty, and the text description of Jichang Garden then guides the model to generate new garden paintings. The cosine similarity between the guide text and the generated image serves as the evaluation criterion for the generated images. Our dataset is used to fine-tune a pre-trained diffusion model via Low-Rank Adaptation of Large Language Models (LoRA). We also transformed the generated images into a panorama and created a free-roam scene in Unity 3D. Our post-trained model is capable of generating garden images in the style of Ming Dynasty landscape paintings from textual descriptions, and the generated images are compatible with three-dimensional presentation in Unity 3D.
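The abstract is concrete enough to sketch its two core computational steps in code: text-guided generation with a LoRA-adapted latent diffusion model, and evaluation by text-image cosine similarity. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: the base checkpoint, LoRA path, prompt, and the choice of CLIP as the embedding model are all placeholders, since the paper does not publish checkpoints or name its text/image encoder.

```python
import torch
from diffusers import StableDiffusionPipeline
from transformers import CLIPModel, CLIPProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"

# 1. Load a pretrained latent text-to-image diffusion model and attach LoRA
#    weights fine-tuned on Ming Dynasty garden paintings. Both identifiers
#    are illustrative placeholders, not the authors' checkpoints.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16 if device == "cuda" else torch.float32,
).to(device)
pipe.load_lora_weights("path/to/ming-garden-lora")  # hypothetical LoRA checkpoint

# 2. A descriptive text (e.g. drawn from the Jichang Garden description)
#    guides the model to generate a new garden painting.
prompt = "a Ming Dynasty landscape painting of Jichang Garden, pavilions beside a lake"
image = pipe(prompt, num_inference_steps=30).images[0]

# 3. Evaluate: cosine similarity between the guide text and the generated
#    image, computed here with CLIP embeddings (an assumed encoder choice).
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").to(device)
proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
inputs = proc(text=[prompt], images=image, return_tensors="pt", padding=True).to(device)
with torch.no_grad():
    out = clip(**inputs)
text_emb = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
img_emb = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
print("text-image cosine similarity:", float((text_emb @ img_emb.T).item()))
```

The remaining steps the abstract mentions, stitching generated views into a panorama and building a free-roam scene, are engine-side asset work in Unity 3D and are not sketched here.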
Related papers
- CLII: Visual-Text Inpainting via Cross-Modal Predictive Interaction [23.683636588751753]
State-of-the-art inpainting methods are mainly designed for natural images and cannot correctly recover text within scene text images.
We identify the visual-text inpainting task to achieve high-quality scene text image restoration and text completion.
arXiv Detail & Related papers (2024-07-23T06:12:19Z)
- DLP-GAN: learning to draw modern Chinese landscape photos with generative adversarial network [20.74857981451259]
Chinese landscape painting has a unique and artistic style, and its drawing technique is highly abstract in both the use of color and the realistic representation of objects.
Previous methods focus on transferring from modern photos to ancient ink paintings, but little attention has been paid to translating landscape paintings into modern photos.
arXiv Detail & Related papers (2024-03-06T04:46:03Z)
- Brush Your Text: Synthesize Any Scene Text on Images via Diffusion Model [31.819060415422353]
Diff-Text is a training-free scene text generation framework for any language.
Our method outperforms existing methods in both text recognition accuracy and the naturalness of foreground-background blending.
arXiv Detail & Related papers (2023-12-19T15:18:40Z)
- GraphDreamer: Compositional 3D Scene Synthesis from Scene Graphs [74.98581417902201]
We propose a novel framework to generate compositional 3D scenes from scene graphs.
By exploiting node and edge information in scene graphs, our method makes better use of the pretrained text-to-image diffusion model.
We conduct both qualitative and quantitative experiments to validate the effectiveness of GraphDreamer.
arXiv Detail & Related papers (2023-11-30T18:59:58Z)
- Text-Driven Image Editing via Learnable Regions [74.45313434129005]
We introduce a method for region-based image editing driven by textual prompts, without the need for user-provided masks or sketches.
We show that this simple approach enables flexible editing that is compatible with current image generation models.
Experiments demonstrate the competitive performance of our method in manipulating images with high fidelity and realism that correspond to the provided language descriptions.
arXiv Detail & Related papers (2023-11-28T02:27:31Z)
- LayoutLLM-T2I: Eliciting Layout Guidance from LLM for Text-to-Image Generation [121.45667242282721]
We propose a coarse-to-fine paradigm to achieve layout planning and image generation.
Our proposed method outperforms the state-of-the-art models in terms of photorealistic layout and image generation.
arXiv Detail & Related papers (2023-08-09T17:45:04Z)
- Text-Guided Synthesis of Eulerian Cinemagraphs [81.20353774053768]
We introduce Text2Cinemagraph, a fully automated method for creating cinemagraphs from text descriptions.
We focus on cinemagraphs of fluid elements, such as flowing rivers and drifting clouds, which exhibit continuous motion and repetitive textures.
arXiv Detail & Related papers (2023-07-06T17:59:31Z)
- Learning to Imagine: Visually-Augmented Natural Language Generation [73.65760028876943]
We propose a method to make pre-trained language models (PLMs) Learn to Imagine for Visually-augmented natural language gEneration.
We use a diffusion model to synthesize high-quality images conditioned on the input texts.
We conduct synthesis for each sentence rather than generate only one image for an entire paragraph.
arXiv Detail & Related papers (2023-05-26T13:59:45Z)
- Text2NeRF: Text-Driven 3D Scene Generation with Neural Radiance Fields [29.907615852310204]
We present Text2NeRF, which is able to generate a wide range of 3D scenes purely from a text prompt.
Our method requires no additional training data but only a natural language description of the scene as the input.
arXiv Detail & Related papers (2023-05-19T10:58:04Z)
- GlyphDraw: Seamlessly Rendering Text with Intricate Spatial Structures in Text-to-Image Generation [18.396131717250793]
We introduce GlyphDraw, a general learning framework aiming to endow image generation models with the capacity to generate images coherently embedded with text for any specific language.
Our method not only produces accurate language characters as in prompts, but also seamlessly blends the generated text into the background.
arXiv Detail & Related papers (2023-03-31T08:06:33Z)
- Context-Aware Image Inpainting with Learned Semantic Priors [100.99543516733341]
We introduce pretext tasks that are semantically meaningful for estimating the missing contents.
We propose a context-aware image inpainting model, which adaptively integrates global semantics and local features.
arXiv Detail & Related papers (2021-06-14T08:09:43Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.