DS-Fusion: Artistic Typography via Discriminated and Stylized Diffusion
- URL: http://arxiv.org/abs/2303.09604v1
- Date: Thu, 16 Mar 2023 19:12:52 GMT
- Title: DS-Fusion: Artistic Typography via Discriminated and Stylized Diffusion
- Authors: Maham Tanveer, Yizhi Wang, Ali Mahdavi-Amiri, Hao Zhang
- Abstract summary: We introduce a novel method to automatically generate an artistic typography by stylizing one or more letter fonts.
Our approach uses large language models to bridge text and visual imagery for stylization, and builds an unsupervised generative model with a diffusion-model backbone.
- Score: 10.75789076591325
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce a novel method to automatically generate an artistic typography
by stylizing one or more letter fonts to visually convey the semantics of an
input word, while ensuring that the output remains readable. To address an
assortment of challenges with our task at hand including conflicting goals
(artistic stylization vs. legibility), lack of ground truth, and immense search
space, our approach utilizes large language models to bridge texts and visual
images for stylization and build an unsupervised generative model with a
diffusion model backbone. Specifically, we employ the denoising generator in
Latent Diffusion Model (LDM), with the key addition of a CNN-based
discriminator to adapt the input style onto the input text. The discriminator
uses rasterized images of a given letter/word font as real samples and output
of the denoising generator as fake samples. Our model is coined DS-Fusion for
discriminated and stylized diffusion. We showcase the quality and versatility
of our method through numerous examples, qualitative and quantitative
evaluation, as well as ablation studies. User studies comparing to strong
baselines including CLIPDraw and DALL-E 2, as well as artist-crafted
typographies, demonstrate strong performance of DS-Fusion.
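The training recipe described in the abstract pairs a latent-diffusion denoiser with a CNN discriminator: rasterized images of the chosen letter/word font serve as real samples, and the denoising generator's output serves as the fake samples. The following is a minimal sketch of that arrangement, not the authors' implementation: the module definitions, the latent-space comparison, the adversarial loss weight, and the toy shapes in the usage comment are illustrative assumptions, and the real DS-Fusion denoiser is additionally conditioned on text prompts derived from the input word.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyDenoiser(nn.Module):
    """Stand-in for the LDM denoising UNet: predicts the noise added to a latent.

    The real model is text-conditioned; this toy version ignores the timestep
    and any prompt embedding for brevity.
    """

    def __init__(self, channels: int = 4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, 64, 3, padding=1), nn.SiLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.SiLU(),
            nn.Conv2d(64, channels, 3, padding=1),
        )

    def forward(self, z_t, t):
        return self.net(z_t)


class GlyphDiscriminator(nn.Module):
    """Small CNN that scores whether a sample looks like the target glyph/font."""

    def __init__(self, channels: int = 4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 1, 3, padding=1),  # patch-wise logits
        )

    def forward(self, x):
        # Average the patch logits into one real/fake score per sample.
        return self.net(x).mean(dim=(1, 2, 3))


def training_step(denoiser, disc, z0_style, z_glyph, alphas_bar, opt_g, opt_d,
                  adv_weight: float = 0.1):
    """One joint step: diffusion loss on style latents + adversarial loss vs. glyphs.

    z0_style: clean latents of style imagery; z_glyph: latents of rasterized fonts
    (the discriminator's "real" samples). The denoised prediction is the "fake".
    """
    b = z0_style.size(0)
    t = torch.randint(0, alphas_bar.size(0), (b,))
    a = alphas_bar[t].view(b, 1, 1, 1)
    noise = torch.randn_like(z0_style)
    z_t = a.sqrt() * z0_style + (1 - a).sqrt() * noise  # forward diffusion

    # Generator/denoiser update: standard epsilon-prediction loss plus a
    # non-saturating GAN term asking the discriminator to accept the output.
    eps_hat = denoiser(z_t, t)
    z0_hat = (z_t - (1 - a).sqrt() * eps_hat) / a.sqrt()  # predicted clean latent
    loss_diff = F.mse_loss(eps_hat, noise)
    loss_adv = F.softplus(-disc(z0_hat)).mean()
    opt_g.zero_grad()
    (loss_diff + adv_weight * loss_adv).backward()
    opt_g.step()

    # Discriminator update: rasterized-glyph latents are real, denoised outputs fake.
    loss_d = F.softplus(-disc(z_glyph)).mean() + F.softplus(disc(z0_hat.detach())).mean()
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()
    return loss_diff.item(), loss_adv.item(), loss_d.item()


# Hypothetical usage with toy shapes:
# denoiser, disc = TinyDenoiser(), GlyphDiscriminator()
# opt_g = torch.optim.Adam(denoiser.parameters(), lr=1e-4)
# opt_d = torch.optim.Adam(disc.parameters(), lr=1e-4)
# alphas_bar = torch.cumprod(1 - torch.linspace(1e-4, 0.02, 1000), dim=0)
# training_step(denoiser, disc, torch.randn(8, 4, 32, 32),
#               torch.randn(8, 4, 32, 32), alphas_bar, opt_g, opt_d)
```

The point the sketch tries to capture is the tension named in the abstract: the discriminator pulls the denoised result toward the glyph distribution (legibility) while the diffusion loss, text-conditioned in the actual system, pulls it toward the prompt's semantics (stylization); the adversarial weight of 0.1 is a placeholder, not a value from the paper.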
Related papers
- FontStudio: Shape-Adaptive Diffusion Model for Coherent and Consistent Font Effect Generation [38.730628018627975]
This research aims to tackle the generation of text effects for multilingual fonts.
We introduce a novel shape-adaptive diffusion model capable of interpreting the given shape.
We also present a training-free, shape-adaptive effect transfer method for transferring textures from a generated reference letter to others.
arXiv Detail & Related papers (2024-06-12T16:43:47Z)
- UDiffText: A Unified Framework for High-quality Text Synthesis in Arbitrary Images via Character-aware Diffusion Models [25.219960711604728]
This paper proposes a novel approach for text image generation, utilizing a pre-trained diffusion model.
Our approach involves the design and training of a light-weight character-level text encoder, which replaces the original CLIP encoder.
By employing an inference stage refinement process, we achieve a notably high sequence accuracy when synthesizing text in arbitrarily given images.
arXiv Detail & Related papers (2023-12-08T07:47:46Z)
- DiffDis: Empowering Generative Diffusion Model with Cross-Modal Discrimination Capability [75.9781362556431]
We propose DiffDis to unify the cross-modal generative and discriminative pretraining into one single framework under the diffusion process.
We show that DiffDis outperforms single-task models on both the image generation and the image-text discriminative tasks.
arXiv Detail & Related papers (2023-08-18T05:03:48Z)
- Calliffusion: Chinese Calligraphy Generation and Style Transfer with Diffusion Modeling [1.856334276134661]
We propose Calliffusion, a system for generating high-quality Chinese calligraphy using diffusion models.
Our model architecture is based on DDPM (Denoising Diffusion Probabilistic Models).
arXiv Detail & Related papers (2023-05-30T15:34:45Z)
- Discffusion: Discriminative Diffusion Models as Few-shot Vision and Language Learners [88.07317175639226]
We propose a novel approach, Discriminative Stable Diffusion (DSD), which turns pre-trained text-to-image diffusion models into few-shot discriminative learners.
Our approach mainly uses the cross-attention score of a Stable Diffusion model to capture the mutual influence between visual and textual information.
arXiv Detail & Related papers (2023-05-18T05:41:36Z)
- Unified Multi-Modal Latent Diffusion for Joint Subject and Text Conditional Image Generation [63.061871048769596]
We present a novel Unified Multi-Modal Latent Diffusion (UMM-Diffusion) which takes joint texts and images containing specified subjects as input sequences.
To be more specific, both input texts and images are encoded into one unified multi-modal latent space.
Our method is able to generate high-quality images with complex semantics from both aspects of input texts and images.
arXiv Detail & Related papers (2023-03-16T13:50:20Z)
- DGFont++: Robust Deformable Generative Networks for Unsupervised Font Generation [19.473023811252116]
We propose a robust deformable generative network for unsupervised font generation (abbreviated as DGFont++).
To distinguish different styles, we train our model with a multi-task discriminator, which ensures that each style can be discriminated independently.
Experiments demonstrate that our model is able to generate character images of higher quality than state-of-the-art methods.
arXiv Detail & Related papers (2022-12-30T14:35:10Z)
- eDiffi: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers [87.52504764677226]
Large-scale diffusion-based generative models have led to breakthroughs in text-conditioned high-resolution image synthesis.
We train an ensemble of text-to-image diffusion models, each specialized for a different stage of the synthesis process.
Our ensemble of diffusion models, called eDiffi, results in improved text alignment while maintaining the same inference cost.
arXiv Detail & Related papers (2022-11-02T17:43:04Z)
- Generating More Pertinent Captions by Leveraging Semantics and Style on Multi-Source Datasets [56.018551958004814]
This paper addresses the task of generating fluent descriptions by training on a non-uniform combination of data sources.
Large-scale datasets with noisy image-text pairs provide a sub-optimal source of supervision.
We propose to leverage and separate semantics and descriptive style through the incorporation of a style token and keywords extracted through a retrieval component.
arXiv Detail & Related papers (2021-11-24T19:00:05Z)
- Scalable Font Reconstruction with Dual Latent Manifolds [55.29525824849242]
We propose a deep generative model that performs typography analysis and font reconstruction.
Our approach enables us to massively scale up the number of character types we can effectively model.
We evaluate on the task of font reconstruction over various datasets representing character types of many languages.
arXiv Detail & Related papers (2021-09-10T20:37:43Z)
This list is automatically generated from the titles and abstracts of the papers listed on this site.