Handwriting Transformers
- URL: http://arxiv.org/abs/2104.03964v1
- Date: Thu, 8 Apr 2021 17:59:43 GMT
- Title: Handwriting Transformers
- Authors: Ankan Kumar Bhunia, Salman Khan, Hisham Cholakkal, Rao Muhammad Anwer,
Fahad Shahbaz Khan, Mubarak Shah
- Abstract summary: We propose a transformer-based styled handwritten text image generation approach, HWT, that strives to learn both style-content entanglement and global and local writing style patterns.
The proposed HWT captures the long and short range relationships within the style examples through a self-attention mechanism.
Our proposed HWT generates realistic styled handwritten text images and significantly outperforms the state of the art, as demonstrated through extensive qualitative, quantitative and human-based evaluations.
- Score: 98.3964093654716
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a novel transformer-based styled handwritten text image generation
approach, HWT, that strives to learn both style-content entanglement as well as
global and local writing style patterns. The proposed HWT captures the long and
short range relationships within the style examples through a self-attention
mechanism, thereby encoding both global and local style patterns. Further, the
proposed transformer-based HWT comprises an encoder-decoder attention that
enables style-content entanglement by gathering the style representation of
each query character. To the best of our knowledge, we are the first to
introduce a transformer-based generative network for styled handwritten text
generation. Our proposed HWT generates realistic styled handwritten text images
and significantly outperforms the state of the art, as demonstrated through
extensive qualitative, quantitative and human-based evaluations. The proposed
HWT can handle arbitrary length of text and any desired writing style in a
few-shot setting. Further, our HWT generalizes well to the challenging scenario
where both words and writing style are unseen during training, generating
realistic styled handwritten text images.
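The two attention steps named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes a single attention head with no learned projections, positional encodings, or image decoder, and every name in it (`attention`, `encode_text`, `style_feats`, `char_embed`, `d_model`) is hypothetical. Self-attention lets each style feature attend to all the others, and encoder-decoder attention lets each query character gather its own style representation from that memory.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(queries, keys, values):
    # Scaled dot-product attention (single head, no learned projections).
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)
    weights = softmax(scores, axis=-1)
    return weights @ values, weights

rng = np.random.default_rng(0)
d_model = 16  # assumed feature width, chosen only for illustration

# Stand-in for features extracted from a few style example images
# (random values here; a real system would use a learned image encoder).
style_feats = rng.normal(size=(20, d_model))

# Self-attention over the style features: every feature can attend to
# every other one, so long- and short-range relations are both reachable.
style_memory, _ = attention(style_feats, style_feats, style_feats)

# Hypothetical character embedding table for the query text.
alphabet = "abcdefghijklmnopqrstuvwxyz"
char_embed = rng.normal(size=(len(alphabet), d_model))

def encode_text(text):
    # One query vector per character, so any text length is accepted.
    return np.stack([char_embed[alphabet.index(c)] for c in text])

# Encoder-decoder attention: each query character gathers a style
# representation from the style memory.
queries = encode_text("hello")
styled, weights = attention(queries, style_memory, style_memory)

print(styled.shape)   # (5, 16): one style-aware vector per character
print(weights.shape)  # (5, 20): attention of each character over style features
```

Because the output has one row per query character, the same call works for text of any length; only the number of rows in `queries` changes.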
Related papers
- Layout-Agnostic Scene Text Image Synthesis with Diffusion Models [42.37340959594495]
SceneTextGen is a novel diffusion-based model specifically designed to circumvent the need for a predefined layout stage.
The novelty of SceneTextGen lies in its integration of three key components: a character-level encoder for capturing detailed typographic properties, and a character-level instance segmentation model and a word-level spotting model that address unwanted text generation and minor character inaccuracies.
(2024-06-03T07:20:34Z)
- StyleMaster: Towards Flexible Stylized Image Generation with Diffusion Models [42.45078883553856]
Stylized Text-to-Image Generation (STIG) aims to generate images based on text prompts and style reference images.
In this paper, we propose a novel framework, dubbed StyleMaster, for this task by leveraging pretrained Stable Diffusion.
Two objective functions are introduced to optimize the model together with denoising loss, which can further enhance semantic and style consistency.
(2024-05-24T07:19:40Z)
- Style Aligned Image Generation via Shared Attention [61.121465570763085]
We introduce StyleAligned, a technique designed to establish style alignment among a series of generated images.
By employing minimal "attention sharing" during the diffusion process, our method maintains style consistency across images within T2I models.
Our method's evaluation across diverse styles and text prompts demonstrates high quality and fidelity.
(2023-12-04T18:55:35Z)
- StyleCrafter: Enhancing Stylized Text-to-Video Generation with Style Adapter [74.68550659331405]
StyleCrafter is a generic method that enhances pre-trained T2V models with a style control adapter.
To promote content-style disentanglement, we remove style descriptions from the text prompt and extract style information solely from the reference image.
StyleCrafter efficiently generates high-quality stylized videos that align with the content of the texts and resemble the style of the reference images.
(2023-12-01T03:53:21Z)
- ControlStyle: Text-Driven Stylized Image Generation Using Diffusion Priors [105.37795139586075]
We propose a new task for "stylizing" text-to-image models, namely text-driven stylized image generation.
We present a new diffusion model (ControlStyle) via upgrading a pre-trained text-to-image model with a trainable modulation network.
Experiments demonstrate the effectiveness of our ControlStyle in producing more visually pleasing and artistic results.
(2023-11-09T15:50:52Z)
- ParaGuide: Guided Diffusion Paraphrasers for Plug-and-Play Textual Style Transfer [57.6482608202409]
Textual style transfer is the task of transforming stylistic properties of text while preserving meaning.
We introduce a novel diffusion-based framework for general-purpose style transfer that can be flexibly adapted to arbitrary target styles.
We validate the method on the Enron Email Corpus, with both human and automatic evaluations, and find that it outperforms strong baselines on formality, sentiment, and even authorship style transfer.
(2023-08-29T17:36:02Z)
- Handwritten Text Generation from Visual Archetypes [25.951540903019467]
We devise a Transformer-based model for Few-Shot styled handwritten text generation.
We obtain a robust representation of unseen writers' calligraphy by exploiting specific pre-training on a large synthetic dataset.
(2023-03-27T14:58:20Z)
- SLOGAN: Handwriting Style Synthesis for Arbitrary-Length and Out-of-Vocabulary Text [35.83345711291558]
We propose a novel method that can synthesize parameterized and controllable handwriting Styles for arbitrary-Length and Out-of-vocabulary text.
We embed the text content by providing an easily obtainable printed style image, so that the diversity of the content can be flexibly achieved.
Our method can synthesize words that are not included in the training vocabulary and with various new styles.
(2022-02-23T12:13:27Z)
- Fine-grained style control in Transformer-based Text-to-speech Synthesis [78.92428622630861]
We present a novel architecture to realize fine-grained style control on the Transformer-based text-to-speech synthesis (TransformerTTS).
We model the speaking style by extracting a time sequence of local style tokens (LST) from the reference speech.
Experiments show that with fine-grained style control, our system performs better in terms of naturalness, intelligibility, and style transferability.
(2021-10-12T19:50:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.