AnyArtisticGlyph: Multilingual Controllable Artistic Glyph Generation
- URL: http://arxiv.org/abs/2504.04743v1
- Date: Mon, 07 Apr 2025 05:37:39 GMT
- Title: AnyArtisticGlyph: Multilingual Controllable Artistic Glyph Generation
- Authors: Xiongbo Lu, Yaxiong Chen, Shengwu Xiong,
- Abstract summary: Artistic Glyph Image Generation (AGIG) transfers the style of a reference image to a source while preserving its content.<n>We introduce AnyArtisticGlyph, a diffusion-based, multilingual controllable artistic glyph generation model.
- Score: 19.2293398270178
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Artistic Glyph Image Generation (AGIG) differs from current creativity-focused generation models by offering finely controllable deterministic generation. It transfers the style of a reference image to a source while preserving its content. Although advanced and promising, current methods may reveal flaws when scrutinizing synthesized image details, often producing blurred or incorrect textures, posing a significant challenge. Hence, we introduce AnyArtisticGlyph, a diffusion-based, multilingual controllable artistic glyph generation model. It includes a font fusion and embedding module, which generates latent features for detailed structure creation, and a vision-text fusion and embedding module that uses the CLIP model to encode references and blends them with transformation caption embeddings for seamless global image generation. Moreover, we incorporate a coarse-grained feature-level loss to enhance generation accuracy. Experiments show that it produces natural, detailed artistic glyph images with state-of-the-art performance. Our project will be open-sourced on https://github.com/jiean001/AnyArtisticGlyph to advance text generation technology.
Related papers
- VitaGlyph: Vitalizing Artistic Typography with Flexible Dual-branch Diffusion Models [53.59400446543756]
We introduce a dual-branch and training-free method, namely VitaGlyph, to enable flexible artistic typography.
VitaGlyph treats input character as a scene composed of Subject and Surrounding, followed by rendering them under varying degrees of geometry transformation.
Experimental results demonstrate that VitaGlyph not only achieves better artistry and readability, but also manages to depict multiple customize concepts.
arXiv Detail & Related papers (2024-10-02T16:48:47Z) - Prompt-Consistency Image Generation (PCIG): A Unified Framework Integrating LLMs, Knowledge Graphs, and Controllable Diffusion Models [20.19571676239579]
We introduce a novel diffusion-based framework to enhance the alignment of generated images with their corresponding descriptions.
Our framework is built upon a comprehensive analysis of inconsistency phenomena, categorizing them based on their manifestation in the image.
We then integrate a state-of-the-art controllable image generation model with a visual text generation module to generate an image that is consistent with the original prompt.
arXiv Detail & Related papers (2024-06-24T06:12:16Z) - LayoutLLM-T2I: Eliciting Layout Guidance from LLM for Text-to-Image
Generation [121.45667242282721]
We propose a coarse-to-fine paradigm to achieve layout planning and image generation.
Our proposed method outperforms the state-of-the-art models in terms of photorealistic layout and image generation.
arXiv Detail & Related papers (2023-08-09T17:45:04Z) - ProSpect: Prompt Spectrum for Attribute-Aware Personalization of
Diffusion Models [77.03361270726944]
Current personalization methods can invert an object or concept into the textual conditioning space and compose new natural sentences for text-to-image diffusion models.
We propose a novel approach that leverages the step-by-step generation process of diffusion models, which generate images from low to high frequency information.
We apply ProSpect in various personalized attribute-aware image generation applications, such as image-guided or text-driven manipulations of materials, style, and layout.
arXiv Detail & Related papers (2023-05-25T16:32:01Z) - GlyphDiffusion: Text Generation as Image Generation [100.98428068214736]
We propose GlyphDiffusion, a novel diffusion approach for text generation via text-guided image generation.
Our key idea is to render the target text as a glyph image containing visual language content.
Our model also makes significant improvements compared to the recent diffusion model.
arXiv Detail & Related papers (2023-04-25T02:14:44Z) - CCLAP: Controllable Chinese Landscape Painting Generation via Latent
Diffusion Model [54.74470985388726]
controllable Chinese landscape painting generation method named CCLAP.
Our method achieves state-of-the-art performance, especially in artfully-composed and artistic conception.
arXiv Detail & Related papers (2023-04-09T04:16:28Z) - GlyphDraw: Seamlessly Rendering Text with Intricate Spatial Structures
in Text-to-Image Generation [18.396131717250793]
We introduce GlyphDraw, a general learning framework aiming to endow image generation models with the capacity to generate images coherently embedded with text for any specific language.
Our method not only produces accurate language characters as in prompts, but also seamlessly blends the generated text into the background.
arXiv Detail & Related papers (2023-03-31T08:06:33Z) - Unified Multi-Modal Latent Diffusion for Joint Subject and Text
Conditional Image Generation [63.061871048769596]
We present a novel Unified Multi-Modal Latent Diffusion (UMM-Diffusion) which takes joint texts and images containing specified subjects as input sequences.
To be more specific, both input texts and images are encoded into one unified multi-modal latent space.
Our method is able to generate high-quality images with complex semantics from both aspects of input texts and images.
arXiv Detail & Related papers (2023-03-16T13:50:20Z) - Plug-and-Play Diffusion Features for Text-Driven Image-to-Image
Translation [10.39028769374367]
We present a new framework that takes text-to-image synthesis to the realm of image-to-image translation.
Our method harnesses the power of a pre-trained text-to-image diffusion model to generate a new image that complies with the target text.
arXiv Detail & Related papers (2022-11-22T20:39:18Z) - GenText: Unsupervised Artistic Text Generation via Decoupled Font and
Texture Manipulation [30.654807125764965]
We propose a novel approach, namely GenText, to achieve general artistic text style transfer.
Specifically, our work incorporates three different stages, stylization, destylization, and font transfer.
Considering the difficult data acquisition of paired artistic text images, our model is designed under the unsupervised setting.
arXiv Detail & Related papers (2022-07-20T04:42:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.