One-Shot Diffusion Mimicker for Handwritten Text Generation
- URL: http://arxiv.org/abs/2409.04004v2
- Date: Wed, 11 Sep 2024 11:52:48 GMT
- Title: One-Shot Diffusion Mimicker for Handwritten Text Generation
- Authors: Gang Dai, Yifan Zhang, Quhui Ke, Qiangya Guo, Shuangping Huang
- Abstract summary: Existing handwritten text generation methods often require more than ten handwriting samples as style references.
One-shot generation greatly simplifies the process but poses a significant challenge, since a writer's style must be captured accurately from a single sample.
We propose a One-shot Diffusion Mimicker (One-DM) to generate handwritten text that can mimic any calligraphic style with only one reference sample.
- Score: 5.845883883415509
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing handwritten text generation methods often require more than ten handwriting samples as style references. However, in practical applications, users tend to prefer a handwriting generation model that operates with just a single reference sample for its convenience and efficiency. This approach, known as "one-shot generation", greatly simplifies the process but poses a significant challenge due to the difficulty of accurately capturing a writer's style from a single sample, especially when extracting fine details from the characters' edges amidst sparse foreground and undesired background noise. To address this problem, we propose a One-shot Diffusion Mimicker (One-DM) to generate handwritten text that can mimic any calligraphic style with only one reference sample. Inspired by the fact that the high-frequency information of an individual sample often contains distinct style patterns (e.g., character slant and letter joining), we develop a novel style-enhanced module to improve style extraction by incorporating high-frequency components from a single sample. We then fuse the style features with the text content as a merged condition for guiding the diffusion model to produce high-quality handwritten text images. Extensive experiments demonstrate that our method can successfully generate handwriting scripts with just one reference sample in multiple languages, even outperforming previous methods using over ten samples. Our source code is available at https://github.com/dailenson/One-DM.
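As a rough, hedged illustration of the core idea above (not the authors' code; the real implementation lives at the GitHub link in the abstract), the sketch below high-pass filters a single grayscale style sample with a Laplacian kernel and fuses the resulting style features with a content embedding into one condition vector for a diffusion denoiser. All module and variable names are hypothetical.

```python
# Illustrative sketch only: hypothetical names, not One-DM's actual code.
import torch
import torch.nn as nn
import torch.nn.functional as F

# A fixed Laplacian kernel is one simple way to isolate high-frequency
# components (character edges, slant, joins) from a style sample.
LAPLACIAN = torch.tensor([[0., 1., 0.],
                          [1., -4., 1.],
                          [0., 1., 0.]]).view(1, 1, 3, 3)

def high_freq(img: torch.Tensor) -> torch.Tensor:
    """High-frequency map of a grayscale style image of shape (B, 1, H, W)."""
    return F.conv2d(img, LAPLACIAN.to(img.device), padding=1)

class StyleEnhancedCondition(nn.Module):
    """Fuse style features from one reference with text-content features
    into a single condition that guides the diffusion model."""
    def __init__(self, dim: int = 256):
        super().__init__()
        self.style_enc = nn.Sequential(          # shared encoder for both views
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, dim))
        self.fuse = nn.Linear(3 * dim, dim)

    def forward(self, style_img: torch.Tensor, content_emb: torch.Tensor):
        s_full = self.style_enc(style_img)             # overall style cues
        s_high = self.style_enc(high_freq(style_img))  # edge/slant patterns
        merged = torch.cat([s_full, s_high, content_emb], dim=-1)
        return self.fuse(merged)   # merged condition for the denoiser
```

The paper's actual style-enhanced module and fusion are more elaborate; treat this only as a reading aid.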
Related papers
- DiffusionPen: Towards Controlling the Style of Handwritten Text Generation [7.398476020996681]
DiffusionPen (DiffPen) is a five-shot, style-conditioned handwritten text generation approach based on Latent Diffusion Models.
Our approach captures both textual and stylistic characteristics of seen and unseen words and styles, generating realistic handwritten samples.
Our method outperforms existing methods qualitatively and quantitatively, and its additional generated data can improve the performance of Handwriting Text Recognition (HTR) systems.
arXiv Detail & Related papers (2024-09-09T20:58:25Z) - Learning to Generate Text in Arbitrary Writing Styles [6.7308816341849695]
It is desirable for language models to produce text in an author-specific style on the basis of a potentially small writing sample.
We propose to guide a language model to generate text in a target style using contrastively-trained representations that capture stylometric features.
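A minimal sketch of the kind of contrastive objective such stylometric representations are usually trained with; this is an assumed InfoNCE setup, not the paper's actual training code.

```python
# Assumed InfoNCE-style objective: embeddings of two writing samples by the
# same author are pulled together, all other pairings pushed apart.
import torch
import torch.nn.functional as F

def style_contrastive_loss(anchor: torch.Tensor,
                           positive: torch.Tensor,
                           temperature: float = 0.07) -> torch.Tensor:
    """anchor, positive: (B, D) embeddings of same-author sample pairs."""
    a = F.normalize(anchor, dim=-1)
    p = F.normalize(positive, dim=-1)
    logits = a @ p.t() / temperature                    # (B, B) similarities
    labels = torch.arange(a.size(0), device=a.device)   # diagonal = same author
    return F.cross_entropy(logits, labels)
```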
arXiv Detail & Related papers (2023-12-28T18:58:52Z) - Visual Captioning at Will: Describing Images and Videos Guided by a Few Stylized Sentences [49.66987347397398]
Few-Shot Stylized Visual Captioning aims to generate captions in any desired style, using only a few examples as guidance during inference.
We propose a framework called FS-StyleCap for this task, which utilizes a conditional encoder-decoder language model and a visual projection module.
arXiv Detail & Related papers (2023-07-31T04:26:01Z) - Copy Is All You Need [66.00852205068327]
We formulate text generation as progressively copying text segments from an existing text collection.
Our approach achieves better generation quality according to both automatic and human evaluations.
Our approach attains additional performance gains by simply scaling up to larger text collections.
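A toy sketch of the copy-based formulation, assuming a precomputed index of phrase embeddings; the interface is illustrative, not the paper's implementation.

```python
# Toy retrieval loop: instead of sampling single tokens, append whole text
# segments whose embeddings best match the current context.
import numpy as np

def generate_by_copying(context_vec, phrase_vecs, phrases, steps, encode):
    """phrase_vecs: (N, D) embeddings of candidate segments; encode() maps
    text back to a (D,) context vector (both assumed given)."""
    out = []
    for _ in range(steps):
        scores = phrase_vecs @ context_vec       # dot-product retrieval
        best = int(np.argmax(scores))
        out.append(phrases[best])
        context_vec = encode(" ".join(out))      # re-encode extended context
    return " ".join(out)
```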
arXiv Detail & Related papers (2023-07-13T05:03:26Z) - TextDiffuser: Diffusion Models as Text Painters [118.30923824681642]
We introduce TextDiffuser, focusing on generating images with visually appealing text that is coherent with backgrounds.
We contribute MARIO-10M, the first large-scale text-image dataset with OCR annotations, containing 10 million image-text pairs.
We show that TextDiffuser is flexible and controllable, creating high-quality text images from text prompts alone or combined with text template images, and performing text inpainting to reconstruct incomplete images containing text.
arXiv Detail & Related papers (2023-05-18T10:16:19Z) - Stylized Data-to-Text Generation: A Case Study in the E-Commerce Domain [53.22419717434372]
We propose a new task, namely stylized data-to-text generation, whose aim is to generate coherent text according to a specific style.
This task is non-trivial due to three challenges: maintaining the logic of the generated text, handling unstructured style references, and coping with biased training samples.
We propose a novel stylized data-to-text generation model, named StyleD2T, comprising three components: logic planning-enhanced data embedding, mask-based style embedding, and unbiased stylized text generation.
arXiv Detail & Related papers (2023-05-05T03:02:41Z) - Handwritten Text Generation from Visual Archetypes [25.951540903019467]
We devise a Transformer-based model for few-shot styled handwritten text generation.
We obtain a robust representation of unseen writers' calligraphy by exploiting specific pre-training on a large synthetic dataset.
arXiv Detail & Related papers (2023-03-27T14:58:20Z) - Diff-Font: Diffusion Model for Robust One-Shot Font Generation [110.45944936952309]
We propose a novel one-shot font generation method based on a diffusion model, named Diff-Font.
The proposed model aims to generate an entire font library given only one sample as the reference.
The well-trained Diff-Font is not only robust to font gaps and font variations, but also achieves promising performance on difficult character generation.
arXiv Detail & Related papers (2022-12-12T13:51:50Z) - SLOGAN: Handwriting Style Synthesis for Arbitrary-Length and Out-of-Vocabulary Text [35.83345711291558]
We propose a novel method that can synthesize parameterized and controllable handwriting styles for arbitrary-length and out-of-vocabulary text.
We embed the text content by providing an easily obtainable printed-style image of the text, so that content diversity can be achieved flexibly, as sketched below.
Our method can synthesize words that are not included in the training vocabulary, in a variety of new styles.
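A hedged sketch of the printed content-image idea: render the target string in any standard font and let the generator re-style it. The PIL-based rendering below is illustrative, not SLOGAN's code.

```python
# Illustrative content-image rendering (assumed approach, hypothetical sizes).
from PIL import Image, ImageDraw, ImageFont

def render_printed_content(text: str, size=(256, 64)) -> Image.Image:
    """Render the target string in a plain printed font; a style-transfer
    generator can then redraw this content in the desired handwriting."""
    img = Image.new("L", size, color=255)   # white grayscale canvas
    draw = ImageDraw.Draw(img)
    font = ImageFont.load_default()         # any fixed printed font works
    draw.text((4, 20), text, fill=0, font=font)
    return img
```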
arXiv Detail & Related papers (2022-02-23T12:13:27Z) - Letter-level Online Writer Identification [86.13203975836556]
We focus on a novel problem, letter-level online writer identification, which requires only a few trajectories of written letters as identification cues.
A main challenge is that a person often writes a letter in different styles from time to time.
We refer to this problem as the variance of online writing styles (Var-O-Styles).
arXiv Detail & Related papers (2021-12-06T07:21:53Z) - One-shot Compositional Data Generation for Low Resource Handwritten Text Recognition [10.473427493876422]
Low-resource Handwritten Text Recognition is a hard problem due to scarce annotated data and very limited linguistic information.
In this paper, we address this problem through a data generation technique based on Bayesian Program Learning.
Unlike traditional generation approaches, which require huge amounts of annotated images, our method can generate human-like handwriting using only one sample of each symbol from the desired alphabet.
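As a toy illustration of one-shot compositional generation (assumed stroke-perturbation mechanics, far simpler than the paper's Bayesian Program Learning model), new variants of a symbol can be composed by jittering the strokes of a single exemplar:

```python
# Toy variant generator: perturb the strokes of one exemplar to mimic
# natural handwriting variation (assumed mechanics, not the paper's model).
import numpy as np

def generate_variant(strokes, jitter=0.02, rng=None):
    """strokes: list of (N_i, 2) arrays of 2-D pen points from one exemplar."""
    rng = rng or np.random.default_rng()
    variant = []
    for s in strokes:
        shift = rng.normal(0.0, jitter, size=(1, 2))        # per-stroke offset
        wobble = rng.normal(0.0, jitter / 2, size=s.shape)  # point-level noise
        variant.append(s + shift + wobble)
    return variant
```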
arXiv Detail & Related papers (2021-05-11T18:53:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.