LPGen: Enhancing High-Fidelity Landscape Painting Generation through Diffusion Model
- URL: http://arxiv.org/abs/2407.17229v2
- Date: Thu, 25 Jul 2024 09:29:21 GMT
- Title: LPGen: Enhancing High-Fidelity Landscape Painting Generation through Diffusion Model
- Authors: Wanggong Yang, Xiaona Wang, Yingrui Qiu, Yifei Zhao
- Abstract summary: This paper presents LPGen, a high-fidelity, controllable model for landscape painting generation.
We introduce a novel multi-modal framework that integrates image prompts into the diffusion model.
We implement a decoupled cross-attention strategy to ensure compatibility between image and text prompts, facilitating multi-modal image generation.
- Score: 1.7966001353008776
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generating landscape paintings expands the possibilities of artistic creativity and imagination. Traditional landscape painting methods involve using ink or colored ink on rice paper, which requires substantial time and effort. These methods are susceptible to errors and inconsistencies and lack precise control over lines and colors. This paper presents LPGen, a high-fidelity, controllable model for landscape painting generation, introducing a novel multi-modal framework that integrates image prompts into the diffusion model. We extract the edges and contours of the target landscape image by computing its Canny edges. These, along with natural language text prompts and drawing style references, are fed into the latent diffusion model as conditions. We implement a decoupled cross-attention strategy to ensure compatibility between image and text prompts, facilitating multi-modal image generation. A decoder generates the final image. Quantitative and qualitative analyses demonstrate that our method outperforms existing approaches to landscape painting generation, exceeding the current state of the art. The LPGen network effectively controls the composition and color of landscape paintings, generates more accurate images, and supports further research in deep learning-based landscape painting generation.
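As a rough illustration of the two mechanisms the abstract names, the sketch below computes a Canny edge map with OpenCV and implements a simplified decoupled cross-attention layer in the spirit of IP-Adapter, attending to text and image-prompt features separately. The module layout and the `scale` weighting are illustrative assumptions, not LPGen's released implementation.

```python
import cv2
import torch
import torch.nn as nn

def extract_canny_condition(image_path: str, low: int = 100, high: int = 200) -> torch.Tensor:
    """Compute a Canny edge map and return it as a 1xHxW float tensor in [0, 1]."""
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    edges = cv2.Canny(gray, low, high)  # uint8 edge map, values 0 or 255
    return torch.from_numpy(edges).float().unsqueeze(0) / 255.0

class DecoupledCrossAttention(nn.Module):
    """Attend to text and image-prompt features separately, then sum.

    Simplified for illustration: real decoupled cross-attention typically
    shares the query projection and only adds new key/value projections
    for the image stream.
    """
    def __init__(self, dim: int, heads: int = 8, scale: float = 1.0):
        super().__init__()
        self.scale = scale
        self.attn_text = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.attn_image = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, latent, text_feats, image_feats):
        out_text, _ = self.attn_text(latent, text_feats, text_feats)
        out_image, _ = self.attn_image(latent, image_feats, image_feats)
        return out_text + self.scale * out_image
```

In this reading, keeping separate attention paths for the two prompt types is what makes image and text conditions compatible: each stream gets its own key/value space, and their contributions are merged only after attention.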
Related papers
- Training-Free Sketch-Guided Diffusion with Latent Optimization [22.94468603089249]
We propose an innovative training-free pipeline that extends existing text-to-image generation models to incorporate a sketch as an additional condition.
We find that a sketch's core features, its layout and structure, can be tracked with the cross-attention maps of diffusion models, enabling the generation of new images that closely resemble the input sketch.
We introduce latent optimization, a method that refines the noisy latent at each intermediate step of the generation process.
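A minimal sketch of that idea, assuming a hypothetical `get_attn_map` hook that exposes the relevant cross-attention map and a simple MSE objective against the sketch layout (the paper's actual objective and schedule are not specified here):

```python
import torch
import torch.nn.functional as F

def latent_optimization_step(z_t, get_attn_map, sketch_mask, lr=0.05, n_iters=5):
    """One round of training-free latent refinement (illustrative sketch).

    z_t          -- noisy latent at the current denoising step
    get_attn_map -- hypothetical differentiable hook: runs the denoiser on z
                    and returns the cross-attention map for the guided tokens
    sketch_mask  -- target layout derived from the input sketch, same shape
                    as the returned attention map
    """
    z = z_t.detach().clone().requires_grad_(True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(n_iters):
        opt.zero_grad()
        attn = get_attn_map(z)                 # where the model attends
        loss = F.mse_loss(attn, sketch_mask)   # pull attention toward the sketch
        loss.backward()
        opt.step()
    return z.detach()
```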
arXiv Detail & Related papers (2024-08-31T00:44:03Z)
- CreativeSynth: Creative Blending and Synthesis of Visual Arts based on Multimodal Diffusion [74.44273919041912]
Large-scale text-to-image generative models have made impressive strides, showcasing their ability to synthesize a vast array of high-quality images.
However, adapting these models for artistic image editing presents two significant challenges.
We build CreativeSynth, a unified framework based on a diffusion model with the ability to coordinate multimodal inputs.
arXiv Detail & Related papers (2024-01-25T10:42:09Z)
- ArtBank: Artistic Style Transfer with Pre-trained Diffusion Model and Implicit Style Prompt Bank [9.99530386586636]
Artistic style transfer aims to repaint the content image with the learned artistic style.
Existing artistic style transfer methods can be divided into two categories: small model-based approaches and pre-trained large-scale model-based approaches.
We propose ArtBank, a novel artistic style transfer framework, to generate highly realistic stylized images.
arXiv Detail & Related papers (2023-12-11T05:53:40Z)
- Collaborative Neural Painting [27.880814775833578]
We introduce a novel task, Collaborative Neural Painting (CNP), to facilitate collaborative art painting generation between humans and machines.
CNP should produce a sequence of strokes supporting the completion of a coherent painting.
We propose a painting representation based on a sequence of parametrized strokes, which facilitates both editing and composition operations.
arXiv Detail & Related papers (2023-12-04T10:45:12Z)
- Image Inpainting via Tractable Steering of Diffusion Models [54.13818673257381]
This paper proposes to exploit the ability of Tractable Probabilistic Models (TPMs) to exactly and efficiently compute the constrained posterior.
Specifically, this paper adopts a class of expressive TPMs termed Probabilistic Circuits (PCs).
We show that our approach can consistently improve the overall quality and semantic coherence of inpainted images with only 10% additional computational overhead.
arXiv Detail & Related papers (2023-11-28T21:14:02Z)
- Advancing Urban Renewal: An Automated Approach to Generating Historical Arcade Facades with Stable Diffusion Models [1.645684081891833]
This study introduces a new methodology for automatically generating images of historical arcade facades.
By classifying and tagging a variety of arcade styles, we have constructed several realistic arcade facade image datasets.
Our approach has demonstrated high levels of precision, authenticity, and diversity in the generated images.
arXiv Detail & Related papers (2023-11-20T08:03:12Z)
- Kandinsky: an Improved Text-to-Image Synthesis with Image Prior and Latent Diffusion [50.59261592343479]
We present Kandinsky, a novel exploration of latent diffusion architecture.
The image prior model is trained separately to map text embeddings to CLIP image embeddings.
We also deployed a user-friendly demo system that supports diverse generative modes such as text-to-image generation, image fusion, text and image fusion, image variations generation, and text-guided inpainting/outpainting.
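As a hedged sketch of that mapping step, the toy prior below shows only the interface: a network that takes a CLIP text embedding and predicts a CLIP image embedding for the diffusion decoder. The MLP body and the 768-dimensional embeddings are assumptions for illustration; Kandinsky's actual prior is a separately trained transformer mapping.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for an image prior: a network trained to map CLIP
# text embeddings to CLIP image embeddings. The MLP body and the 768-dim
# embeddings are illustrative assumptions, not Kandinsky's architecture.
class TextToImagePrior(nn.Module):
    def __init__(self, dim: int = 768):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, 4 * dim),
            nn.GELU(),
            nn.Linear(4 * dim, dim),
        )

    def forward(self, text_emb: torch.Tensor) -> torch.Tensor:
        # Predicted CLIP image embedding, conditioning the latent diffusion decoder.
        return self.net(text_emb)

prior = TextToImagePrior()
text_emb = torch.randn(1, 768)   # placeholder CLIP text embedding
image_emb = prior(text_emb)      # shape (1, 768)
```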
arXiv Detail & Related papers (2023-10-05T12:29:41Z)
- CCLAP: Controllable Chinese Landscape Painting Generation via Latent Diffusion Model [54.74470985388726]
We propose a controllable Chinese landscape painting generation method named CCLAP.
Our method achieves state-of-the-art performance, especially in artful composition and artistic conception.
arXiv Detail & Related papers (2023-04-09T04:16:28Z)
- QuantArt: Quantizing Image Style Transfer Towards High Visual Fidelity [94.5479418998225]
We propose a new style transfer framework called QuantArt for high visual-fidelity stylization.
Our framework achieves significantly higher visual fidelity compared with the existing style transfer methods.
arXiv Detail & Related papers (2022-12-20T17:09:53Z)
- Compositional Transformers for Scene Generation [13.633811200719627]
We introduce the GANformer2 model, an iterative object-oriented transformer, explored for the task of generative modeling.
We show it achieves state-of-the-art performance in terms of visual quality, diversity and consistency.
Further experiments demonstrate the model's disentanglement and provide a deeper insight into its generative process.
arXiv Detail & Related papers (2021-11-17T08:11:42Z)
- Modeling Artistic Workflows for Image Generation and Editing [83.43047077223947]
We propose a generative model that follows a given artistic workflow.
It enables both multi-stage image generation and multi-stage image editing of an existing piece of art.
arXiv Detail & Related papers (2020-07-14T17:54:26Z)