Separating Content from Style Using Adversarial Learning for Recognizing
Text in the Wild
- URL: http://arxiv.org/abs/2001.04189v3
- Date: Sat, 12 Dec 2020 08:11:22 GMT
- Authors: Canjie Luo, Qingxiang Lin, Yuliang Liu, Lianwen Jin, Chunhua Shen
- Abstract summary: We propose an adversarial learning framework for the generation and recognition of multiple characters in an image.
Our framework can be integrated into recent recognition methods to achieve new state-of-the-art recognition accuracy.
- Score: 103.51604161298512
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose to improve text recognition from a new perspective by separating
the text content from complex backgrounds. As vanilla GANs are not sufficiently
robust to generate sequence-like characters in natural images, we propose an
adversarial learning framework for the generation and recognition of multiple
characters in an image. The proposed framework consists of an attention-based
recognizer and a generative adversarial architecture. Furthermore, to tackle
the issue of lacking paired training samples, we design an interactive joint
training scheme, which shares attention masks from the recognizer to the
discriminator, and enables the discriminator to extract the features of each
character for further adversarial training. Benefiting from the character-level
adversarial training, our framework requires only unpaired simple data for
style supervision. Each target style sample containing only one randomly chosen
character can be simply synthesized online during the training. This is
significant as the training does not require costly paired samples or
character-level annotations. Thus, only the input images and corresponding text
labels are needed. In addition to the style normalization of the backgrounds,
we refine character patterns to ease the recognition task. A feedback mechanism
is proposed to bridge the gap between the discriminator and the recognizer.
Therefore, the discriminator can guide the generator according to the confusion
of the recognizer, so that the generated patterns are clearer for recognition.
Experiments on various benchmarks, including both regular and irregular text,
demonstrate that our method significantly reduces the difficulty of
recognition. Our framework can be integrated into recent recognition methods to
achieve new state-of-the-art recognition accuracy.
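The mask-sharing idea from the abstract — the recognizer's attention masks let the discriminator pool out per-character features for character-level adversarial training — can be sketched roughly as follows. This is a minimal NumPy illustration with hypothetical shapes and helper names (`pool_character_features` is not from the paper); the authors' actual architecture is not specified in the abstract.

```python
import numpy as np

def pool_character_features(feature_map, attention_masks):
    """Hypothetical helper: pool a discriminator feature map into one
    vector per decoded character, weighted by the recognizer's masks.

    feature_map:     (C, H, W) array of discriminator features
    attention_masks: (T, H, W) array, one spatial mask per character
    returns:         (T, C) array of per-character feature vectors
    """
    C, H, W = feature_map.shape
    T = attention_masks.shape[0]
    feats = np.zeros((T, C))
    for t in range(T):
        mask = attention_masks[t]
        mask = mask / (mask.sum() + 1e-8)      # normalize to a spatial distribution
        # weighted sum over spatial positions -> one C-dim vector per character
        feats[t] = (feature_map * mask[None]).reshape(C, -1).sum(axis=1)
    return feats

# toy example: 4 feature channels, an 8x8 map, 3 decoded characters
rng = np.random.default_rng(0)
fmap = rng.random((4, 8, 8))
masks = rng.random((3, 8, 8))
per_char = pool_character_features(fmap, masks)
print(per_char.shape)  # (3, 4)
```

Each output row is a mask-weighted average of the feature map, so the discriminator can judge one character region at a time — which is what allows style supervision from single-character samples.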
Related papers
- Boosting Semi-Supervised Scene Text Recognition via Viewing and Summarizing [71.29488677105127]
Existing scene text recognition (STR) methods struggle to recognize challenging texts, especially artistic and severely distorted characters.
We propose a contrastive learning-based STR framework by leveraging synthetic and real unlabeled data without any human cost.
Our method achieves SOTA performance (94.7% and 70.9% average accuracy on common benchmarks and Union14M-Benchmark, respectively).
arXiv Detail & Related papers (2024-11-23T15:24:47Z) - Instruction-Guided Scene Text Recognition [51.853730414264625]
We propose a novel instruction-guided scene text recognition (IGTR) paradigm that formulates STR as an instruction learning problem.
We develop a lightweight instruction encoder, a cross-modal feature fusion module, and a multi-task answer head, which guide nuanced text image understanding.
IGTR outperforms existing models by significant margins, while maintaining a small model size and efficient inference speed.
arXiv Detail & Related papers (2024-01-31T14:13:01Z) - CAPro: Webly Supervised Learning with Cross-Modality Aligned Prototypes [93.71909293023663]
Cross-modality Aligned Prototypes (CAPro) is a unified contrastive learning framework to learn visual representations with correct semantics.
CAPro achieves new state-of-the-art performance and exhibits robustness to open-set recognition.
arXiv Detail & Related papers (2023-10-15T07:20:22Z) - DualCoOp++: Fast and Effective Adaptation to Multi-Label Recognition
with Limited Annotations [79.433122872973]
Multi-label image recognition in the low-label regime is a task of great challenge and practical significance.
We leverage the powerful alignment between textual and visual features pretrained with millions of auxiliary image-text pairs.
We introduce an efficient and effective framework called Evidence-guided Dual Context Optimization (DualCoOp++)
arXiv Detail & Related papers (2023-08-03T17:33:20Z) - CoReFace: Sample-Guided Contrastive Regularization for Deep Face
Recognition [3.1677775852317085]
We propose Contrastive Regularization for Face recognition (CoReFace) to apply image-level regularization in feature representation learning.
Specifically, we employ sample-guided contrastive learning to regularize the training with the image-image relationship directly.
To integrate contrastive learning into face recognition, we augment embeddings instead of images to avoid the image quality degradation.
arXiv Detail & Related papers (2023-04-23T14:33:24Z) - Reading and Writing: Discriminative and Generative Modeling for
Self-Supervised Text Recognition [101.60244147302197]
We introduce contrastive learning and masked image modeling to learn discrimination and generation of text images.
Our method outperforms previous self-supervised text recognition methods by 10.2%-20.2% on irregular scene text recognition datasets.
Our proposed text recognizer exceeds previous state-of-the-art text recognition methods by an average of 5.3% on 11 benchmarks, with similar model size.
arXiv Detail & Related papers (2022-07-01T03:50:26Z) - Towards Open-Set Text Recognition via Label-to-Prototype Learning [18.06730376866086]
We propose a label-to-prototype learning framework to handle novel characters without retraining the model.
Extensive experiments show that our method achieves promising performance on a variety of zero-shot, close-set, and open-set text recognition datasets.
arXiv Detail & Related papers (2022-03-10T06:22:51Z) - Pay Attention to What You Read: Non-recurrent Handwritten Text-Line
Recognition [4.301658883577544]
We introduce a non-recurrent approach to recognize handwritten text by the use of transformer models.
We are able to tackle character recognition as well as to learn language-related dependencies of the character sequences to be decoded.
arXiv Detail & Related papers (2020-05-26T21:15:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.