Scene Text Recognition With Finer Grid Rectification
- URL: http://arxiv.org/abs/2001.09389v1
- Date: Sun, 26 Jan 2020 02:40:11 GMT
- Title: Scene Text Recognition With Finer Grid Rectification
- Authors: Gang Wang
- Abstract summary: This paper proposes an end-to-end trainable model consisting of a finer rectification module and a bidirectional attentional recognition network (Firbarn).
The results of extensive evaluation on the standard benchmarks show that Firbarn outperforms previous works, especially on irregular datasets.
- Score: 6.598317412802175
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Scene Text Recognition is a challenging problem because of irregular styles
and various distortions. This paper proposes an end-to-end trainable model
consisting of a finer rectification module and a bidirectional attentional
recognition network (Firbarn). The rectification module adopts a finer grid to
rectify the distorted input image, and the bidirectional decoder contains only
one decoding layer instead of two separate ones. Firbarn can be trained in a
weakly supervised way, requiring only the scene text images and the corresponding
word labels. With the flexible rectification and the novel bidirectional
decoder, the results of extensive evaluation on the standard benchmarks show
that Firbarn outperforms previous works, especially on irregular datasets.
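The rectification side of this architecture can be made concrete with a short, hypothetical PyTorch sketch. The module names, grid size, and layer widths below are illustrative assumptions, not the authors' implementation: a small localization network predicts offsets for a dense control grid, which is upsampled and passed to grid_sample to warp the distorted image before recognition.

```python
# Minimal sketch (not the paper's code) of grid-based rectification: a small
# localization CNN predicts per-cell offsets for a dense control grid, and
# F.grid_sample warps the distorted text image accordingly.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FinerGridRectifier(nn.Module):
    def __init__(self, grid_h=8, grid_w=16):
        super().__init__()
        self.grid_h, self.grid_w = grid_h, grid_w
        self.localizer = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, grid_h * grid_w * 2), nn.Tanh(),
        )

    def forward(self, x):
        n = x.size(0)
        # Base regular grid in [-1, 1]; coarse control grid of size grid_h x grid_w.
        ys = torch.linspace(-1, 1, self.grid_h, device=x.device)
        xs = torch.linspace(-1, 1, self.grid_w, device=x.device)
        base = torch.stack(torch.meshgrid(ys, xs, indexing="ij"), dim=-1)  # (H, W, 2) as (y, x)
        base = base.flip(-1)  # grid_sample expects (x, y) order
        # Predicted offsets deform the regular grid; scaled down to keep the warp mild.
        offsets = self.localizer(x).view(n, self.grid_h, self.grid_w, 2) * 0.1
        grid = base.unsqueeze(0) + offsets
        # Upsample the coarse control grid to the full output resolution, then warp.
        grid = F.interpolate(grid.permute(0, 3, 1, 2), size=x.shape[-2:],
                             mode="bilinear", align_corners=True).permute(0, 2, 3, 1)
        return F.grid_sample(x, grid, align_corners=True)

rectifier = FinerGridRectifier()
rectified = rectifier(torch.randn(2, 3, 32, 100))  # (2, 3, 32, 100)
```

In the actual model, the rectified output would be fed to the bidirectional attentional recognition network, with the whole pipeline trained end-to-end from word labels only, as described in the abstract.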
Related papers
- Leveraging Structure Knowledge and Deep Models for the Detection of Abnormal Handwritten Text [19.05500901000957]
We propose a two-stage detection algorithm that combines structure knowledge and deep models for handwritten text.
A shape regression network trained with a novel semi-supervised contrast training strategy is introduced, and the positional relationship between the characters is fully employed.
Experiments on two handwritten text datasets show that the proposed method can greatly improve the detection performance.
arXiv Detail & Related papers (2024-10-15T14:57:10Z)
- Is it an i or an l: Test-time Adaptation of Text Line Recognition Models [9.149602257966917]
We introduce the problem of adapting text line recognition models during test time.
We propose an iterative self-training approach that uses feedback from the language model to update the optical model.
Experimental results show that the proposed adaptation method offers an absolute improvement of up to 8% in character error rate.
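As a rough sketch of how such a test-time self-training loop could look (the transcribe/correct/loss interfaces below are assumptions made for illustration, not the paper's API):

```python
# Hypothetical sketch of test-time self-training with language-model feedback
# (not the paper's code): the optical model transcribes each line, a language
# model corrects the hypothesis, and confident corrections become pseudo-labels
# used to fine-tune the optical model for a few iterations.
import torch

def adapt_at_test_time(optical_model, language_model, lines, steps=3,
                       confidence_threshold=0.9, lr=1e-4):
    optimizer = torch.optim.SGD(optical_model.parameters(), lr=lr)
    for _ in range(steps):
        pseudo_labels = []
        with torch.no_grad():
            for img in lines:
                hypothesis, conf = optical_model.transcribe(img)  # assumed interface
                corrected = language_model.correct(hypothesis)    # assumed interface
                if conf >= confidence_threshold:
                    pseudo_labels.append((img, corrected))
        for img, target in pseudo_labels:
            optimizer.zero_grad()
            loss = optical_model.loss(img, target)                # assumed interface
            loss.backward()
            optimizer.step()
    return optical_model
```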
arXiv Detail & Related papers (2023-08-29T05:44:00Z)
- RealignDiff: Boosting Text-to-Image Diffusion Model with Coarse-to-fine Semantic Re-alignment [112.45442468794658]
We propose a two-stage coarse-to-fine semantic re-alignment method, named RealignDiff.
In the coarse semantic re-alignment phase, a novel caption reward is proposed to evaluate the semantic discrepancy between the generated image caption and the given text prompt.
The fine semantic re-alignment stage employs a local dense caption generation module and a re-weighting attention modulation module to refine the previously generated images from a local semantic view.
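To make the coarse-stage idea concrete, here is a deliberately simple stand-in for a caption reward that scores token overlap between the generated caption and the prompt; the actual method would rely on a learned captioner and a stronger similarity measure.

```python
# Illustrative stand-in (not RealignDiff's actual reward): score the semantic
# overlap between a caption generated from the image and the original prompt
# with a token-level F1. This is only to make the idea of a caption reward concrete.
def caption_reward(generated_caption: str, prompt: str) -> float:
    cap_tokens = set(generated_caption.lower().split())
    prompt_tokens = set(prompt.lower().split())
    overlap = len(cap_tokens & prompt_tokens)
    if overlap == 0:
        return 0.0
    precision = overlap / len(cap_tokens)
    recall = overlap / len(prompt_tokens)
    return 2 * precision * recall / (precision + recall)

print(caption_reward("a red car parked on a street", "a red sports car on the street"))
```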
arXiv Detail & Related papers (2023-05-31T06:59:21Z)
- Exploring Stroke-Level Modifications for Scene Text Editing [86.33216648792964]
Scene text editing (STE) aims to replace the text in an image with desired content while preserving the background and style of the original text.
Previous methods that edit the whole image have to learn different translation rules for background and text regions simultaneously.
We propose MOSTEL, a novel network that MOdifies Scene Text images at the strokE Level.
arXiv Detail & Related papers (2022-12-05T02:10:59Z)
- SceneComposer: Any-Level Semantic Image Synthesis [80.55876413285587]
We propose a new framework for conditional image synthesis from semantic layouts of any precision levels.
The framework naturally reduces to text-to-image (T2I) at the lowest level with no shape information, and it becomes segmentation-to-image (S2I) at the highest level.
We introduce several novel techniques to address the challenges coming with this new setup.
arXiv Detail & Related papers (2022-11-21T18:59:05Z)
- Portmanteauing Features for Scene Text Recognition [15.961450585164144]
State-of-the-art methods rely on a rectification network, which is connected to the text recognition network.
A portmanteau feature, inspired by the portmanteau word, is a feature containing information from both the original text image and the rectified image.
The proposed method is examined on 6 benchmarks and compared with 13 state-of-the-art methods.
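A loose sketch of the portmanteau idea, assuming a shared convolutional backbone (sizes and fusion by concatenation are illustrative assumptions, not the paper's exact design): encode both views and fuse the feature maps so the recognizer can draw on either.

```python
# Sketch only: build a "portmanteau" feature by encoding both the original and
# the rectified text image and concatenating the two feature maps channel-wise.
import torch
import torch.nn as nn

backbone = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(64, 128, 3, padding=1), nn.ReLU())

original = torch.randn(2, 3, 32, 100)    # distorted input image
rectified = torch.randn(2, 3, 32, 100)   # output of a rectification module

feat_original = backbone(original)       # (2, 128, 32, 100)
feat_rectified = backbone(rectified)     # (2, 128, 32, 100)
portmanteau = torch.cat([feat_original, feat_rectified], dim=1)  # (2, 256, 32, 100)
```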
arXiv Detail & Related papers (2022-11-09T17:14:14Z)
- Cross Modification Attention Based Deliberation Model for Image Captioning [11.897899189552318]
We propose a universal two-pass decoding framework for image captioning.
A single-pass decoding based model first generates a draft caption according to an input image.
A Deliberation Model then performs the polishing process to refine the draft caption to a better image description.
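A hypothetical sketch of the two-pass decoding structure (module names, dimensions, and the use of Transformer decoders are assumptions, not the paper's implementation): a first pass drafts a caption from the image features, then a second pass re-reads both the features and the draft to produce the polished caption.

```python
# Sketch of two-pass ("deliberation") decoding; causal masks omitted for brevity.
import torch
import torch.nn as nn

class TwoPassCaptioner(nn.Module):
    def __init__(self, feat_dim=512, vocab_size=10000, embed_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        layer = nn.TransformerDecoderLayer(embed_dim, nhead=8, batch_first=True)
        self.draft_decoder = nn.TransformerDecoder(layer, num_layers=2)
        self.deliberation_decoder = nn.TransformerDecoder(layer, num_layers=2)
        self.out = nn.Linear(embed_dim, vocab_size)

    def forward(self, image_feats, draft_tokens, refine_tokens):
        # Pass 1: draft caption conditioned on image features only.
        draft_states = self.draft_decoder(self.embed(draft_tokens), image_feats)
        # Pass 2: deliberation conditioned on image features AND the draft states.
        memory = torch.cat([image_feats, draft_states], dim=1)
        refined = self.deliberation_decoder(self.embed(refine_tokens), memory)
        return self.out(refined)

model = TwoPassCaptioner()
logits = model(torch.randn(2, 49, 512),
               torch.randint(0, 10000, (2, 12)),
               torch.randint(0, 10000, (2, 12)))  # (2, 12, 10000)
```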
arXiv Detail & Related papers (2021-09-17T08:38:08Z)
- I2C2W: Image-to-Character-to-Word Transformers for Accurate Scene Text Recognition [68.95544645458882]
This paper presents I2C2W, a novel scene text recognizer that is accurate and tolerant to various noises in scenes.
I2C2W consists of an image-to-character module (I2C) and a character-to-word module (C2W) which are complementary and can be trained end-to-end.
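A loose structural sketch of the I2C/C2W split under assumed shapes and layers (not the authors' implementation): I2C attends over image features to produce per-character slots with class and position predictions, and C2W reasons over those slots alone to produce the final word.

```python
import torch
import torch.nn as nn

class I2C(nn.Module):
    def __init__(self, feat_dim=512, num_chars=37, max_slots=25):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(max_slots, feat_dim))
        self.attn = nn.MultiheadAttention(feat_dim, num_heads=8, batch_first=True)
        self.cls = nn.Linear(feat_dim, num_chars)  # character class per slot
        self.pos = nn.Linear(feat_dim, 1)          # relative position per slot

    def forward(self, image_feats):                # (B, N, feat_dim)
        q = self.queries.unsqueeze(0).expand(image_feats.size(0), -1, -1)
        slots, _ = self.attn(q, image_feats, image_feats)
        return self.cls(slots), self.pos(slots).squeeze(-1), slots

class C2W(nn.Module):
    def __init__(self, feat_dim=512, num_chars=37):
        super().__init__()
        layer = nn.TransformerEncoderLayer(feat_dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.out = nn.Linear(feat_dim, num_chars)

    def forward(self, char_slots):                 # refine character slots into a word
        return self.out(self.encoder(char_slots))

i2c, c2w = I2C(), C2W()
cls_logits, positions, slots = i2c(torch.randn(2, 64, 512))
word_logits = c2w(slots)                           # (2, 25, 37)
```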
arXiv Detail & Related papers (2021-05-18T09:20:58Z)
- Rethinking Text Line Recognition Models [57.47147190119394]
We consider two decoder families (Connectionist Temporal Classification and Transformer) and three encoder modules (Bidirectional LSTMs, Self-Attention, and GRCLs).
We compare their accuracy and performance on widely used public datasets of scene and handwritten text.
Unlike the more common Transformer-based models, this architecture can handle inputs of arbitrary length.
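For context, a minimal sketch of one encoder/decoder combination from this design space, a BiLSTM encoder with a CTC decoder (layer sizes are illustrative assumptions, not the paper's configuration):

```python
# Sketch of a line recognizer: CNN backbone -> BiLSTM encoder -> per-timestep
# class logits trained with CTC. Width acts as the time axis after pooling height.
import torch
import torch.nn as nn

class BiLSTMCTCRecognizer(nn.Module):
    def __init__(self, num_classes=97, hidden=256):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, None)),    # collapse height, keep width as time
        )
        self.encoder = nn.LSTM(128, hidden, num_layers=2,
                               bidirectional=True, batch_first=True)
        self.head = nn.Linear(2 * hidden, num_classes)  # includes the CTC blank

    def forward(self, images):                  # (B, 1, H, W)
        feats = self.backbone(images).squeeze(2).permute(0, 2, 1)  # (B, W', 128)
        feats, _ = self.encoder(feats)
        return self.head(feats).log_softmax(-1)  # per-timestep class log-probs

model = BiLSTMCTCRecognizer()
log_probs = model(torch.randn(2, 1, 40, 320))   # (2, 160, 97)
ctc_loss = nn.CTCLoss(blank=0)                  # training criterion; decoding is greedy/beam
```

Because per-timestep predictions are independent of any fixed output length, this kind of encoder/CTC pairing naturally handles inputs of arbitrary width, in line with the remark above.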
arXiv Detail & Related papers (2021-04-15T21:43:13Z)
- SOLD2: Self-supervised Occlusion-aware Line Description and Detection [95.8719432775724]
We introduce the first joint detection and description of line segments in a single deep network.
Our method does not require any annotated line labels and can therefore generalize to any dataset.
We evaluate our approach against previous line detection and description methods on several multi-view datasets.
arXiv Detail & Related papers (2021-04-07T19:27:17Z)