SynthTIGER: Synthetic Text Image GEneratoR Towards Better Text
Recognition Models
- URL: http://arxiv.org/abs/2107.09313v1
- Date: Tue, 20 Jul 2021 08:03:45 GMT
- Title: SynthTIGER: Synthetic Text Image GEneratoR Towards Better Text
Recognition Models
- Authors: Moonbin Yim, Yoonsik Kim, Han-Cheol Cho and Sungrae Park
- Abstract summary: We introduce a new synthetic text image generator, SynthTIGER, by analyzing techniques used for text image synthesis and integrating effective ones under a single algorithm.
In our experiments, SynthTIGER achieves better STR performance than the combination of the synthetic datasets MJSynth (MJ) and SynthText (ST).
- Score: 9.934446907923725
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: For successful scene text recognition (STR) models, synthetic text image
generators have alleviated the lack of annotated text images from the real
world. Specifically, they generate multiple text images with diverse
backgrounds, font styles, and text shapes and enable STR models to learn visual
patterns that might not be accessible from manually annotated data. In this
paper, we introduce a new synthetic text image generator, SynthTIGER, by
analyzing techniques used for text image synthesis and integrating effective
ones under a single algorithm. Moreover, we propose two techniques that
alleviate the long-tail problem in length and character distributions of
training data. In our experiments, SynthTIGER achieves better STR performance
than the combination of synthetic datasets, MJSynth (MJ) and SynthText (ST).
Our ablation study demonstrates the benefits of SynthTIGER's sub-components
and provides a guideline on generating synthetic text images for STR
models. Our implementation is publicly available at
https://github.com/clovaai/synthtiger.
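As a rough illustration of the long-tail idea described in the abstract (a hypothetical sketch, not the paper's actual algorithm; `sample_balanced_word` and its parameters are invented for exposition, and the real implementation lives in the linked repository), one could flatten the length and character distributions of sampled training words like this:

    import random
    from collections import Counter

    def sample_balanced_word(corpus, charset, char_counts, max_len=10, swap_prob=0.1):
        # Illustrative only: (1) sample a target length uniformly and crop a
        # (possibly concatenated) corpus word to it, (2) occasionally swap in
        # characters that have been under-represented so far.
        target_len = random.randint(1, max_len)      # uniform over lengths
        word = random.choice(corpus)
        while len(word) < target_len:                # pad by concatenation
            word += random.choice(corpus)
        word = word[:target_len]                     # crop to target length
        chars = list(word)
        # The 10% least frequent characters seen so far count as "rare".
        rare = sorted(charset, key=lambda c: char_counts[c])[: max(1, len(charset) // 10)]
        for i in range(len(chars)):
            if random.random() < swap_prob:
                chars[i] = random.choice(rare)
        word = "".join(chars)
        char_counts.update(word)                     # track character frequencies
        return word

    corpus = ["synthetic", "text", "image", "generator", "tiger"]
    charset = sorted(set("".join(corpus)))
    counts = Counter({c: 0 for c in charset})
    print([sample_balanced_word(corpus, charset, counts) for _ in range(5)])

Sampling target lengths uniformly counteracts the corpus's natural length skew, while the rare-character swap boosts the frequency of under-represented characters.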
Related papers
- CtrlSynth: Controllable Image Text Synthesis for Data-Efficient Multimodal Learning [23.63386159778117]
We design a controllable image-text synthesis pipeline, CtrlSynth, for data-efficient and robust learning.
CtrlSynth allows users to control data synthesis in a fine-grained manner by defining customized control policies.
We show that CtrlSynth substantially improves zero-shot classification, image-text retrieval, and compositional reasoning performance of CLIP models.
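Purely to make "control policies" concrete (the names here are invented, not CtrlSynth's actual API), a minimal policy object might look like:

    import random
    from dataclasses import dataclass, field

    @dataclass
    class ControlPolicy:
        # Hypothetical policy: decide which visual tags to keep, drop,
        # or replace before prompting a text/image generator.
        keep_prob: float = 0.7
        replacements: dict = field(default_factory=dict)

        def apply(self, tags):
            out = []
            for t in tags:
                if random.random() > self.keep_prob:
                    continue                              # drop tag
                out.append(self.replacements.get(t, t))   # optionally replace
            return out

    policy = ControlPolicy(keep_prob=0.8, replacements={"dog": "golden retriever"})
    print(policy.apply(["dog", "park", "frisbee"]))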
arXiv Detail & Related papers (2024-10-15T18:06:41Z)
- Learning Vision from Models Rivals Learning Vision from Data [54.43596959598465]
We introduce SynCLR, a novel approach for learning visual representations exclusively from synthetic images and synthetic captions.
We synthesize a large dataset of image captions using LLMs, then use an off-the-shelf text-to-image model to generate multiple images corresponding to each synthetic caption.
We perform visual representation learning on these synthetic images via contrastive learning, treating images sharing the same caption as positive pairs.
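A minimal sketch of the multi-positive contrastive objective implied here, assuming each image embedding carries the id of the caption it was generated from (the function and tensor names are ours, not SynCLR's code):

    import torch
    import torch.nn.functional as F

    def multi_positive_contrastive(z, caption_ids, tau=0.1):
        # Embeddings sharing a caption id are treated as positive pairs.
        z = F.normalize(z, dim=1)                    # (N, D) embeddings
        logits = z @ z.t() / tau                     # pairwise similarities
        mask = caption_ids.unsqueeze(0) == caption_ids.unsqueeze(1)
        mask.fill_diagonal_(False)                   # exclude self-pairs
        logits.fill_diagonal_(float("-inf"))
        log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
        # Average log-likelihood over each sample's positives.
        pos_counts = mask.sum(dim=1).clamp(min=1)
        loss = -(log_prob * mask).sum(dim=1) / pos_counts
        return loss.mean()

    z = torch.randn(8, 32)                           # fake embeddings
    ids = torch.tensor([0, 0, 1, 1, 2, 2, 3, 3])     # 2 images per caption
    print(multi_positive_contrastive(z, ids).item())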
arXiv Detail & Related papers (2023-12-28T18:59:55Z)
- Enhancing Scene Text Detectors with Realistic Text Image Synthesis Using Diffusion Models [63.99110667987318]
We present DiffText, a pipeline that seamlessly blends foreground text with the background's intrinsic features.
With fewer text instances, our produced text images consistently surpass other synthetic data in aiding text detectors.
arXiv Detail & Related papers (2023-11-28T06:51:28Z)
- Style Generation: Image Synthesis based on Coarsely Matched Texts [10.939482612568433]
We introduce a novel task called text-based style generation and propose a two-stage generative adversarial network.
The first stage generates the overall image style with a sentence feature, and the second stage refines the generated style with a synthetic feature.
The practical potential of our work is demonstrated by various applications such as text-image alignment and story visualization.
arXiv Detail & Related papers (2023-09-08T21:51:11Z)
- StyleGAN-T: Unlocking the Power of GANs for Fast Large-Scale Text-to-Image Synthesis [54.39789900854696]
StyleGAN-T addresses the specific requirements of large-scale text-to-image synthesis.
It significantly improves over previous GANs and outperforms distilled diffusion models in terms of sample quality and speed.
arXiv Detail & Related papers (2023-01-23T16:05:45Z)
- Recurrent Affine Transformation for Text-to-image Synthesis [5.256132101498471]
Existing methods usually adaptively fuse suitable text information into the synthesis process with isolated fusion blocks.
We propose a Recurrent Affine Transformation (RAT) for Generative Adversarial Networks that connects all the fusion blocks with a recurrent neural network to model their long-term dependency.
Being aware of matching image regions, text descriptions supervise the generator to synthesize more relevant image contents.
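A rough sketch of this mechanism, assuming a GRU carries the text-conditioning state across generator blocks (the module and parameter names are illustrative, not the paper's code):

    import torch
    import torch.nn as nn

    class RecurrentAffine(nn.Module):
        # Each call advances a shared GRU state and emits affine parameters
        # (gamma, beta) that modulate the current block's feature maps.
        def __init__(self, text_dim, hidden_dim, channels):
            super().__init__()
            self.gru = nn.GRUCell(text_dim, hidden_dim)
            self.to_gamma = nn.Linear(hidden_dim, channels)
            self.to_beta = nn.Linear(hidden_dim, channels)

        def forward(self, feat, text_emb, h):
            h = self.gru(text_emb, h)                   # state shared across blocks
            gamma = self.to_gamma(h)[:, :, None, None]  # (B, C, 1, 1)
            beta = self.to_beta(h)[:, :, None, None]
            return feat * (1 + gamma) + beta, h

    rat = RecurrentAffine(text_dim=16, hidden_dim=32, channels=8)
    feat = torch.randn(2, 8, 4, 4)
    text = torch.randn(2, 16)
    h = torch.zeros(2, 32)
    out, h = rat(feat, text, h)                         # repeat once per block
    print(out.shape)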
arXiv Detail & Related papers (2022-04-22T03:49:47Z)
- StyleT2I: Toward Compositional and High-Fidelity Text-to-Image Synthesis [52.341186561026724]
A lack of compositionality in text-to-image models could have severe implications for robustness and fairness.
We introduce a new framework, StyleT2I, to improve the compositionality of text-to-image synthesis.
Results show that StyleT2I outperforms previous approaches in terms of consistency between the input text and synthesized images.
arXiv Detail & Related papers (2022-03-29T17:59:50Z)
- Multi-Attributed and Structured Text-to-Face Synthesis [1.3381749415517017]
Generative Adversarial Networks (GANs) have revolutionized image synthesis through many applications like face generation, photograph editing, and image super-resolution.
This paper empirically proves that increasing the number of facial attributes in each textual description helps GANs generate more diverse and real-looking faces.
arXiv Detail & Related papers (2021-08-25T07:52:21Z)
- Cycle-Consistent Inverse GAN for Text-to-Image Synthesis [101.97397967958722]
We propose a novel unified framework of Cycle-consistent Inverse GAN for both text-to-image generation and text-guided image manipulation tasks.
We learn a GAN inversion model to convert the images back to the GAN latent space and obtain the inverted latent codes for each image.
In the text-guided optimization module, we generate images with the desired semantic attributes by optimizing the inverted latent codes.
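A simplified sketch of such a text-guided optimization step, with `generator` and `text_score` as stand-ins for the paper's pretrained components (not its actual modules):

    import torch

    def optimize_latent(z_init, generator, text_score, steps=100, lr=0.05, lam=1.0):
        # Starting from an inverted latent code, ascend a text-image matching
        # score while staying close to the original code.
        z = z_init.clone().requires_grad_(True)
        opt = torch.optim.Adam([z], lr=lr)
        for _ in range(steps):
            img = generator(z)
            loss = -text_score(img) + lam * (z - z_init).pow(2).sum()
            opt.zero_grad()
            loss.backward()
            opt.step()
        return z.detach()

    # Toy stand-ins so the sketch runs end to end.
    z0 = torch.randn(1, 64)
    gen = torch.nn.Linear(64, 3 * 8 * 8)               # pretend "generator"
    score = lambda img: img.mean()                     # pretend CLIP-style score
    z_star = optimize_latent(z0, gen, score, steps=10)
    print((z_star - z0).norm().item())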
arXiv Detail & Related papers (2021-08-03T08:38:16Z)
- DF-GAN: A Simple and Effective Baseline for Text-to-Image Synthesis [80.54273334640285]
We propose a novel one-stage text-to-image backbone that directly synthesizes high-resolution images without entanglements between different generators.
We also propose a novel Target-Aware Discriminator composed of Matching-Aware Gradient Penalty and One-Way Output.
Compared with current state-of-the-art methods, our proposed DF-GAN is simpler yet more efficient at synthesizing realistic and text-matching images.
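A hedged sketch of a matching-aware gradient penalty in this spirit, penalizing the discriminator's gradients on real-image/matching-text pairs (the discriminator interface and the constants k and p are our assumptions, not taken from the paper's code):

    import torch

    def matching_aware_gp(disc, img, sent, k=2.0, p=6.0):
        # Penalize the discriminator's gradient w.r.t. both the real image
        # and its matching sentence embedding.
        img = img.detach().requires_grad_(True)
        sent = sent.detach().requires_grad_(True)
        out = disc(img, sent).sum()
        g_img, g_sent = torch.autograd.grad(out, (img, sent), create_graph=True)
        grad_norm = torch.sqrt(g_img.pow(2).sum() + g_sent.pow(2).sum())
        return k * grad_norm.pow(p)

    # Toy discriminator so the sketch runs.
    disc = lambda x, s: x.flatten(1).mean(1) + s.mean(1)
    img = torch.randn(4, 3, 16, 16)
    sent = torch.randn(4, 32)
    print(matching_aware_gp(disc, img, sent).item())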
arXiv Detail & Related papers (2020-08-13T12:51:17Z)
- Efficient Neural Architecture for Text-to-Image Synthesis [6.166295570030645]
We show that an effective neural architecture can achieve state-of-the-art performance using a single stage training with a single generator and a single discriminator.
Our work points to a new direction for text-to-image research, which has not recently experimented with novel neural architectures.
arXiv Detail & Related papers (2020-04-23T19:33:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the accuracy of the information presented and is not responsible for any consequences arising from its use.