FW-GAN: Frequency-Driven Handwriting Synthesis with Wave-Modulated MLP Generator
- URL: http://arxiv.org/abs/2508.21040v1
- Date: Thu, 28 Aug 2025 17:44:52 GMT
- Title: FW-GAN: Frequency-Driven Handwriting Synthesis with Wave-Modulated MLP Generator
- Authors: Huynh Tong Dang Khoa, Dang Hoai Nam, Vo Nguyen Le Duy,
- Abstract summary: We propose FW-GAN, a one-shot handwriting synthesis framework that generates realistic, writer-consistent text from a single example.<n>Our generator integrates a phase-aware Wave-MLP to better capture spatial relationships while preserving subtle stylistic cues.<n> Experiments on Vietnamese and English handwriting datasets demonstrate that FW-GAN generates high-quality, style-consistent handwriting.
- Score: 4.94997283141601
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Labeled handwriting data is often scarce, limiting the effectiveness of recognition systems that require diverse, style-consistent training samples. Handwriting synthesis offers a promising solution by generating artificial data to augment training. However, current methods face two major limitations. First, most are built on conventional convolutional architectures, which struggle to model long-range dependencies and complex stroke patterns. Second, they largely ignore the crucial role of frequency information, which is essential for capturing fine-grained stylistic and structural details in handwriting. To address these challenges, we propose FW-GAN, a one-shot handwriting synthesis framework that generates realistic, writer-consistent text from a single example. Our generator integrates a phase-aware Wave-MLP to better capture spatial relationships while preserving subtle stylistic cues. We further introduce a frequency-guided discriminator that leverages high-frequency components to enhance the authenticity detection of generated samples. Additionally, we introduce a novel Frequency Distribution Loss that aligns the frequency characteristics of synthetic and real handwriting, thereby enhancing visual fidelity. Experiments on Vietnamese and English handwriting datasets demonstrate that FW-GAN generates high-quality, style-consistent handwriting, making it a valuable tool for augmenting data in low-resource handwriting recognition (HTR) pipelines. Official implementation is available at https://github.com/DAIR-Group/FW-GAN
Related papers
- DNA: Dual-branch Network with Adaptation for Open-Set Online Handwriting Generation [28.985690380954765]
We introduce our method for online handwriting generation, where the writer's style and the characters generated during testing are unseen during training.<n>We propose a Dual-branch Network with Adaptation (DNA), which comprises an adaptive style branch and an adaptive content branch.<n>Our DNA model is well-suited for the unseen OHG setting, achieving state-of-the-art performance.
arXiv Detail & Related papers (2025-11-27T03:30:22Z) - ScriptViT: Vision Transformer-Based Personalized Handwriting Generation [0.0]
styled handwriting generation aims to synthesize handwritten text that looks both realistic and aligned with a specific writer's style.<n>We introduce a Vision Transformer-based style encoder that learns global stylistic patterns from multiple reference images.<n>We then integrate these style cues with the target text using a cross-attention mechanism, enabling the system to produce handwritten images that more faithfully reflect the intended style.
arXiv Detail & Related papers (2025-11-23T06:38:23Z) - Beyond Frequency: Scoring-Driven Debiasing for Object Detection via Blueprint-Prompted Image Synthesis [97.37770785712475]
We present a generation-based debiasing framework for object detection.<n>Our method significantly narrows the performance gap for underrepresented object groups.
arXiv Detail & Related papers (2025-10-21T02:19:12Z) - DiffInk: Glyph- and Style-Aware Latent Diffusion Transformer for Text to Online Handwriting Generation [41.08176249345279]
DiffInk is the first latent diffusion Transformer framework for full-line handwriting generation.<n>We first introduce InkVAE, a novel sequential variational autoencoder enhanced with two complementary latent-space regularization losses.<n>We then introduce InkDiT, a novel latent diffusion Transformer that integrates target text and reference styles to generate coherent pen trajectories.
arXiv Detail & Related papers (2025-09-28T03:58:15Z) - Detecting Document-level Paraphrased Machine Generated Content: Mimicking Human Writing Style and Involving Discourse Features [57.34477506004105]
Machine-generated content poses challenges such as academic plagiarism and the spread of misinformation.<n>We introduce novel methodologies and datasets to overcome these challenges.<n>We propose MhBART, an encoder-decoder model designed to emulate human writing style.<n>We also propose DTransformer, a model that integrates discourse analysis through PDTB preprocessing to encode structural features.
arXiv Detail & Related papers (2024-12-17T08:47:41Z) - Online Writer Retrieval with Chinese Handwritten Phrases: A Synergistic Temporal-Frequency Representation Learning Approach [53.189911918976655]
We propose DOLPHIN, a novel retrieval model designed to enhance handwriting representations through synergistic temporal-frequency analysis.<n>We introduce OLIWER, a large-scale online writer retrieval dataset encompassing over 670,000 Chinese handwritten phrases from 1,731 individuals.<n>Our findings emphasize the significance of point sampling frequency and pressure features in improving handwriting representation quality.
arXiv Detail & Related papers (2024-12-16T11:19:22Z) - DiffusionPen: Towards Controlling the Style of Handwritten Text Generation [7.398476020996681]
DiffusionPen (DiffPen) is a 5-shot style handwritten text generation approach based on Latent Diffusion Models.
Our approach captures both textual and stylistic characteristics of seen and unseen words and styles, generating realistic handwritten samples.
Our method outperforms existing methods qualitatively and quantitatively, and its additional generated data can improve the performance of Handwriting Text Recognition (HTR) systems.
arXiv Detail & Related papers (2024-09-09T20:58:25Z) - Text2Data: Low-Resource Data Generation with Textual Control [100.5970757736845]
Text2Data is a novel approach that utilizes unlabeled data to understand the underlying data distribution.<n>It undergoes finetuning via a novel constraint optimization-based learning objective that ensures controllability and effectively counteracts catastrophic forgetting.
arXiv Detail & Related papers (2024-02-08T03:41:39Z) - Diversified in-domain synthesis with efficient fine-tuning for few-shot
classification [64.86872227580866]
Few-shot image classification aims to learn an image classifier using only a small set of labeled examples per class.
We propose DISEF, a novel approach which addresses the generalization challenge in few-shot learning using synthetic data.
We validate our method in ten different benchmarks, consistently outperforming baselines and establishing a new state-of-the-art for few-shot classification.
arXiv Detail & Related papers (2023-12-05T17:18:09Z) - Contrastive Transformer Learning with Proximity Data Generation for
Text-Based Person Search [60.626459715780605]
Given a descriptive text query, text-based person search aims to retrieve the best-matched target person from an image gallery.
Such a cross-modal retrieval task is quite challenging due to significant modality gap, fine-grained differences and insufficiency of annotated data.
In this paper, we propose a simple yet effective dual Transformer model for text-based person search.
arXiv Detail & Related papers (2023-11-15T16:26:49Z) - Offline Detection of Misspelled Handwritten Words by Convolving
Recognition Model Features with Text Labels [0.0]
We introduce the task of comparing a handwriting image to text.
Our model's classification head is trained entirely on synthetic data created using a state-of-the-art generative adversarial network.
Such massive performance gains can lead to significant productivity increases in applications utilizing human-in-the-loop automation.
arXiv Detail & Related papers (2023-09-18T21:13:42Z) - StyleGAN-T: Unlocking the Power of GANs for Fast Large-Scale
Text-to-Image Synthesis [54.39789900854696]
StyleGAN-T addresses the specific requirements of large-scale text-to-image synthesis.
It significantly improves over previous GANs and outperforms distilled diffusion models in terms of sample quality and speed.
arXiv Detail & Related papers (2023-01-23T16:05:45Z) - One-shot Compositional Data Generation for Low Resource Handwritten Text
Recognition [10.473427493876422]
Low resource Handwritten Text Recognition is a hard problem due to the scarce annotated data and the very limited linguistic information.
In this paper we address this problem through a data generation technique based on Bayesian Program Learning.
Contrary to traditional generation approaches, which require a huge amount of annotated images, our method is able to generate human-like handwriting using only one sample of each symbol from the desired alphabet.
arXiv Detail & Related papers (2021-05-11T18:53:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.