VitaGlyph: Vitalizing Artistic Typography with Flexible Dual-branch Diffusion Models
- URL: http://arxiv.org/abs/2410.01738v3
- Date: Tue, 22 Jul 2025 13:43:53 GMT
- Title: VitaGlyph: Vitalizing Artistic Typography with Flexible Dual-branch Diffusion Models
- Authors: Kailai Feng, Yabo Zhang, Haodong Yu, Zhilong Ji, Jinfeng Bai, Hongzhi Zhang, Wangmeng Zuo
- Abstract summary: Artistic typography is a technique for visualizing the meaning of an input character in an imaginative and readable manner. We introduce a dual-branch, training-free method called VitaGlyph, enabling flexible artistic typography with controllable geometry changes.
- Score: 53.59400446543756
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Artistic typography is a technique for visualizing the meaning of an input character in an imaginative and readable manner. With powerful text-to-image diffusion models, existing methods directly design the overall geometry and texture of the input character, making it challenging to ensure both creativity and legibility. In this paper, we introduce a dual-branch, training-free method called VitaGlyph, enabling flexible artistic typography with controllable geometry changes while maintaining readability. The key insight of VitaGlyph is to treat the input character as a scene composed of a Subject and its Surrounding, which are rendered with varying degrees of geometric transformation. To enhance the visual appeal and creativity of the generated artistic typography, the subject flexibly expresses the essential concept of the input character, while the surrounding enriches the relevant background without altering the shape, thus maintaining overall readability. Specifically, we implement VitaGlyph through a three-phase framework: (i) Knowledge Acquisition leverages large language models to design text descriptions for the subject and surrounding. (ii) Regional Interpretation detects the part that most closely matches the subject description and refines its structure via Semantic Typography. (iii) Attentional Compositional Generation separately renders the textures of the Subject and Surrounding regions and blends them in an attention-based manner. Experimental results demonstrate that VitaGlyph not only achieves better artistry and readability, but also manages to depict multiple customized concepts, facilitating more creative and pleasing artistic typography generation. Our code will be made publicly available.
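To make the three-phase framework concrete, below is a minimal Python sketch of the pipeline's control flow, assuming nothing beyond the abstract. Every helper in it (the prompt template, region matcher, Semantic Typography step, texture renderer, and attention blend) is a hypothetical stand-in for the LLM and diffusion components the paper describes, not the authors' released code.

```python
from dataclasses import dataclass

# Minimal sketch of VitaGlyph's three-phase flow, based only on the abstract.
# Every helper here is a hypothetical stand-in, not the authors' real API.

@dataclass
class Prompts:
    subject: str       # concept the Subject region should express
    surrounding: str   # background description that must not reshape the glyph

def knowledge_acquisition(character: str) -> Prompts:
    # Phase (i): a large language model designs the two text descriptions.
    # Stand-in: a fixed template instead of an actual LLM query.
    return Prompts(
        subject=f"a vivid depiction of the concept behind '{character}'",
        surrounding=f"a background that enriches '{character}' without reshaping it",
    )

def regional_interpretation(character: str, prompts: Prompts) -> str:
    # Phase (ii): detect the part of the glyph that best matches the subject
    # description, then refine its structure via Semantic Typography.
    region = f"region of '{character}' best matching [{prompts.subject}]"
    return f"semantic-typography({region})"

def attentional_compositional_generation(refined_subject: str, prompts: Prompts) -> str:
    # Phase (iii): render Subject and Surrounding textures separately, then
    # blend them in an attention-based manner (string stand-ins here).
    subject_texture = f"texture({refined_subject})"
    surrounding_texture = f"texture({prompts.surrounding})"
    return f"attention-blend({subject_texture}, {surrounding_texture})"

def vitaglyph(character: str) -> str:
    prompts = knowledge_acquisition(character)                     # phase (i)
    refined = regional_interpretation(character, prompts)          # phase (ii)
    return attentional_compositional_generation(refined, prompts)  # phase (iii)

if __name__ == "__main__":
    # Example: vitalize the character for "dragon".
    print(vitaglyph("dragon"))
```

In the actual method, these stand-ins would presumably correspond to an LLM call, a region detector matching the subject description, the Semantic Typography module, and a text-to-image diffusion model whose attention maps drive the final blend.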
Related papers
- WordCraft: Interactive Artistic Typography with Attention Awareness and Noise Blending [12.655120187133779]
Artistic typography aims to stylize input characters with visual effects that are both creative and legible. Traditional approaches rely heavily on manual design, while recent generative models, particularly diffusion-based methods, have enabled automated character stylization. We introduce WordCraft, an interactive artistic typography system that integrates diffusion models to address these limitations.
arXiv Detail & Related papers (2025-07-13T10:49:09Z)
- Calligrapher: Freestyle Text Image Customization [72.71919410487881]
Calligrapher is a novel diffusion-based framework that integrates advanced text customization with artistic typography. By automating high-quality, visually consistent typography, Calligrapher surpasses traditional models.
arXiv Detail & Related papers (2025-06-30T17:59:06Z)
- Beyond Flat Text: Dual Self-inherited Guidance for Visual Text Generation [17.552733309504486]
In real-world images, slanted or curved texts, especially those on cans, banners, or badges, appear as frequently as flat texts due to artistic design or layout constraints.
We introduce a new training-free framework, STGen, which accurately generates visual texts in challenging scenarios.
arXiv Detail & Related papers (2025-01-10T11:44:59Z)
- Intelligent Artistic Typography: A Comprehensive Review of Artistic Text Design and Generation [15.367944842667146]
Artistic text generation aims to amplify the aesthetic qualities of text while maintaining readability.
Artistic text stylization concentrates on the text effect overlaid upon the text, such as shadows, outlines, colors, glows, and textures.
Semantic typography focuses on the deformation of the characters to strengthen their visual representation by mimicking the semantic understanding within the text.
arXiv Detail & Related papers (2024-07-20T06:45:09Z)
- FontStudio: Shape-Adaptive Diffusion Model for Coherent and Consistent Font Effect Generation [38.730628018627975]
This research aims to tackle the generation of text effects for multilingual fonts.
We introduce a novel shape-adaptive diffusion model capable of interpreting the given shape.
We also present a training-free, shape-adaptive effect transfer method for transferring textures from a generated reference letter to others.
arXiv Detail & Related papers (2024-06-12T16:43:47Z)
- Dynamic Typography: Bringing Text to Life via Video Diffusion Prior [73.72522617586593]
We present an automated text animation scheme, termed "Dynamic Typography".
It deforms letters to convey semantic meaning and infuses them with vibrant movements based on user prompts.
Our technique harnesses vector graphics representations and an end-to-end optimization-based framework.
arXiv Detail & Related papers (2024-04-17T17:59:55Z)
- Text-Guided Synthesis of Eulerian Cinemagraphs [81.20353774053768]
We introduce Text2Cinemagraph, a fully automated method for creating cinemagraphs from text descriptions.
We focus on cinemagraphs of fluid elements, such as flowing rivers and drifting clouds, which exhibit continuous motion and repetitive textures.
arXiv Detail & Related papers (2023-07-06T17:59:31Z)
- GlyphDiffusion: Text Generation as Image Generation [100.98428068214736]
We propose GlyphDiffusion, a novel diffusion approach for text generation via text-guided image generation.
Our key idea is to render the target text as a glyph image containing visual language content.
Our model also achieves significant improvements over recent diffusion models.
arXiv Detail & Related papers (2023-04-25T02:14:44Z)
- ALADIN-NST: Self-supervised disentangled representation learning of artistic style through Neural Style Transfer [60.6863849241972]
We learn a representation of visual artistic style more strongly disentangled from the semantic content depicted in an image.
We show that strongly addressing the disentanglement of style and content leads to large gains in style-specific metrics.
arXiv Detail & Related papers (2023-04-12T10:33:18Z)
- ISS: Image as Stepping Stone for Text-Guided 3D Shape Generation [91.37036638939622]
This paper presents a new framework called Image as Stepping Stone (ISS) for text-guided 3D shape generation, introducing the 2D image as a stepping stone to connect the two modalities.
Our key contribution is a two-stage feature-space-alignment approach that maps CLIP features to shapes.
We formulate a text-guided shape stylization module to dress up the output shapes with novel textures.
arXiv Detail & Related papers (2022-09-09T06:54:21Z)
- Toward Understanding WordArt: Corner-Guided Transformer for Scene Text Recognition [63.6608759501803]
We propose to recognize artistic text at three levels.
Firstly, corner points are applied to guide the extraction of local features inside characters, considering the robustness of corner structures to appearance and shape.
Secondly, we design a character contrastive loss to model the character-level feature, improving the feature representation for character classification.
Thirdly, we utilize a Transformer to learn the global feature at the image level and model the global relationship of the corner points.
arXiv Detail & Related papers (2022-07-31T14:11:05Z)
- GenText: Unsupervised Artistic Text Generation via Decoupled Font and Texture Manipulation [30.654807125764965]
We propose a novel approach, namely GenText, to achieve general artistic text style transfer.
Specifically, our work incorporates three different stages: stylization, destylization, and font transfer.
Considering the difficult data acquisition of paired artistic text images, our model is designed under the unsupervised setting.
arXiv Detail & Related papers (2022-07-20T04:42:47Z)
- Integrating Visuospatial, Linguistic and Commonsense Structure into Story Visualization [81.26077816854449]
We first explore the use of constituency parse trees for encoding structured input.
Second, we augment the structured input with commonsense information and study the impact of this external knowledge on visual story generation.
Third, we incorporate visual structure via bounding boxes and dense captioning to provide feedback about the characters/objects in generated images.
arXiv Detail & Related papers (2021-10-21T00:16:02Z)
- Font Completion and Manipulation by Cycling Between Multi-Modality Representations [113.26243126754704]
We explore the generation of font glyphs as 2D graphic objects, with a graph as an intermediate representation.
We formulate a cross-modality cycled image-to-image structure with a graph between an image encoder and an image decoder.
Our model generates better results than both the image-to-image baseline and previous state-of-the-art methods for glyph completion.
arXiv Detail & Related papers (2021-08-30T02:43:29Z)
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.