TPS++: Attention-Enhanced Thin-Plate Spline for Scene Text Recognition
- URL: http://arxiv.org/abs/2305.05322v1
- Date: Tue, 9 May 2023 10:16:43 GMT
- Title: TPS++: Attention-Enhanced Thin-Plate Spline for Scene Text Recognition
- Authors: Tianlun Zheng, Zhineng Chen, Jinfeng Bai, Hongtao Xie, Yu-Gang Jiang
- Abstract summary: Text irregularities pose significant challenges to scene text recognizers.
TPS++ is an attention-enhanced TPS transformation that incorporates the attention mechanism into text rectification.
It consistently improves recognition and achieves state-of-the-art accuracy.
- Score: 78.67283660198403
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text irregularities pose significant challenges to scene text recognizers.
Thin-Plate Spline (TPS)-based rectification is widely regarded as an effective
means to deal with them. Currently, the calculation of TPS transformation
parameters purely depends on the quality of regressed text borders. It ignores
the text content and often leads to unsatisfactory rectified results for
severely distorted text. In this work, we introduce TPS++, an
attention-enhanced TPS transformation that incorporates the attention mechanism
into text rectification for the first time. TPS++ formulates the parameter
calculation as a joint process of foreground control point regression and
content-based attention score estimation, which is computed by a dedicatedly
designed gated-attention block. TPS++ builds a more flexible content-aware
rectifier, generating a natural text correction that is easier to read by the
subsequent recognizer. Moreover, TPS++ shares the feature backbone with the
recognizer in part and implements the rectification at feature-level rather
than image-level, incurring only a small overhead in terms of parameters and
inference time. Experiments on public benchmarks show that TPS++ consistently
improves recognition and achieves state-of-the-art accuracy. Meanwhile, it
generalizes well on different backbones and recognizers. Code is at
https://github.com/simplify23/TPS_PP.
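To make the joint parameterization above concrete, here is a minimal PyTorch sketch of a gated-attention block that produces both control-point offsets and content-based attention scores. It is reconstructed from the abstract, not taken from the linked repository; the module name, dimensions, and exact gating rule are illustrative assumptions.

```python
import torch
import torch.nn as nn

class GatedAttentionBlock(nn.Module):
    """Jointly predicts control-point offsets and content-based attention."""

    def __init__(self, feat_dim=256, num_ctrl_points=20):
        super().__init__()
        # One learnable query per foreground control point.
        self.queries = nn.Parameter(torch.randn(num_ctrl_points, feat_dim))
        self.key_proj = nn.Conv2d(feat_dim, feat_dim, kernel_size=1)
        # The gate decides how much attended content to mix into each query.
        self.gate = nn.Linear(feat_dim, 1)
        # Regress an (x, y) offset for each control point.
        self.offset_head = nn.Linear(feat_dim, 2)

    def forward(self, feats):                        # feats: (B, C, H, W)
        B, C, H, W = feats.shape
        keys = self.key_proj(feats).flatten(2)       # (B, C, H*W)
        # Content-based attention scores for every control point.
        scores = torch.einsum('nc,bcl->bnl', self.queries, keys) / C ** 0.5
        attn = scores.softmax(dim=-1)                # (B, N, H*W)
        ctx = torch.einsum('bnl,bcl->bnc', attn, keys)
        gate = torch.sigmoid(self.gate(ctx))         # (B, N, 1)
        fused = gate * ctx + (1 - gate) * self.queries
        offsets = torch.tanh(self.offset_head(fused))  # (B, N, 2) in [-1, 1]
        return offsets, attn
```

In this reading, the offsets would parameterize the TPS sampling grid, and because TPS++ rectifies at feature level, the grid would be applied to the shared backbone features via torch.nn.functional.grid_sample rather than to the input image.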
Related papers
- TexIm FAST: Text-to-Image Representation for Semantic Similarity Evaluation using Transformers [2.7651063843287718]
TexIm FAST is a methodology for generating fixed-length text representations through a self-supervised Variational Auto-Encoder (VAE) for semantic evaluation with transformers.
The pictorial representations allow oblivious inference while retaining the linguistic intricacies, and are potent in cross-modal applications.
The efficacy of TexIm FAST has been extensively analyzed for the task of Semantic Textual Similarity (STS) on the MSRPC, CNN/Daily Mail, and XSum datasets.
arXiv Detail & Related papers (2024-06-06T18:28:50Z)
- Semi-Supervised Coupled Thin-Plate Spline Model for Rotation Correction and Beyond [84.56978780892783]
We propose CoupledTPS, which iteratively couples multiple TPS transformations with limited control points into a more flexible and powerful transformation.
In light of the laborious annotation cost, we develop a semi-supervised learning scheme to improve warping quality by exploiting unlabeled data.
Experiments demonstrate the superiority and universality of CoupledTPS over the existing state-of-the-art solutions for rotation correction.
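A rough sketch of the iterative-coupling idea, under the assumption that each TPS stage outputs a sampling grid: chaining the grids composes several limited-control-point warps into one more expressive warp. The function and its interface are hypothetical, not the authors' code.

```python
import torch
import torch.nn.functional as F

def compose_warps(image, grids):
    """image: (B, C, H, W); grids: list of (B, H, W, 2) sampling grids in
    [-1, 1], each produced by one TPS stage with limited control points."""
    composed = grids[0]
    for g in grids[1:]:
        # Resample the running grid through the next warp so the stages
        # compose into a single mapping, avoiding repeated image resampling.
        composed = F.grid_sample(composed.permute(0, 3, 1, 2), g,
                                 align_corners=True).permute(0, 2, 3, 1)
    return F.grid_sample(image, composed, align_corners=True)
```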
arXiv Detail & Related papers (2024-01-24T13:03:28Z)
- Self-supervised Character-to-Character Distillation for Text Recognition [54.12490492265583]
We propose a novel self-supervised Character-to-Character Distillation method, CCD, which enables versatile augmentations to facilitate text representation learning.
CCD achieves state-of-the-art results, with average performance gains of 1.38% in text recognition, 1.7% in text segmentation, 0.24 dB (PSNR) and 0.0321 (SSIM) in text super-resolution.
arXiv Detail & Related papers (2022-11-01T05:48:18Z)
- DPText-DETR: Towards Better Scene Text Detection with Dynamic Points in Transformer [94.35116535588332]
Transformer-based methods, which predict polygon points or Bezier curve control points to localize texts, are quite popular in scene text detection.
However, the point label form used implies the reading order of humans, which affects the robustness of the Transformer model.
We propose DPText-DETR, which directly uses point coordinates as queries and dynamically updates them between decoder layers.
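The following is a hypothetical PyTorch sketch of the dynamic point-query update described above: explicit point coordinates serve as queries and each decoder layer refines them with a predicted offset. Module names, sizes, and the offset-update rule are assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class DynamicPointDecoder(nn.Module):
    def __init__(self, dim=256, num_layers=6, num_points=16):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.TransformerDecoderLayer(dim, nhead=8, batch_first=True)
            for _ in range(num_layers))
        self.point_embed = nn.Linear(2, dim)   # lift (x, y) into query space
        self.offset_head = nn.Linear(dim, 2)   # per-layer coordinate delta

    def forward(self, points, memory):
        # points: (B, N, 2) in [0, 1]; memory: (B, L, dim) encoder features.
        for layer in self.layers:
            queries = self.point_embed(points)          # coords -> queries
            feats = layer(queries, memory)
            points = (points + self.offset_head(feats)).clamp(0, 1)
        return points                                   # refined polygon points
```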
arXiv Detail & Related papers (2022-07-10T15:45:16Z)
- TextDCT: Arbitrary-Shaped Text Detection via Discrete Cosine Transform Mask [19.269070203448187]
Arbitrary-shaped scene text detection is a challenging task due to the wide variation of text in font, size, color, and orientation.
We propose a novel lightweight anchor-free text detection framework called TextDCT, which adopts the discrete cosine transform (DCT) to encode text masks as compact vectors.
TextDCT achieves F-measure of 85.1 at 17.2 frames per second (FPS) and F-measure of 84.9 at 15.1 FPS for CTW1500 and Total-Text datasets, respectively.
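A minimal sketch of the DCT mask-encoding idea, assumed from the abstract rather than taken from the authors' code: a binary text mask is compressed into a short vector of low-frequency DCT coefficients and approximately recovered with the inverse transform. The helper names and the choice of k are illustrative.

```python
import numpy as np
from scipy.fft import dctn, idctn

def encode_mask(mask, k=8):
    """Keep the top-left k x k low-frequency DCT-II coefficients."""
    coeffs = dctn(mask.astype(np.float64), norm='ortho')
    return coeffs[:k, :k].flatten()            # compact vector, length k*k

def decode_mask(vector, shape, k=8):
    """Zero-pad the coefficients and invert the DCT to approximate the mask."""
    coeffs = np.zeros(shape)
    coeffs[:k, :k] = vector.reshape(k, k)
    return idctn(coeffs, norm='ortho') > 0.5   # re-binarize

mask = np.zeros((64, 64))
mask[20:40, 10:50] = 1                          # toy rectangular text mask
vec = encode_mask(mask)                         # 64 numbers instead of 4096
approx = decode_mask(vec, mask.shape)
print(f"IoU: {(mask.astype(bool) & approx).sum() / (mask.astype(bool) | approx).sum():.2f}")
```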
arXiv Detail & Related papers (2022-06-27T15:42:25Z)
- A Text Attention Network for Spatial Deformation Robust Scene Text Image Super-resolution [13.934846626570286]
Scene text image super-resolution aims to increase the resolution and readability of the text in low-resolution images.
It remains difficult to reconstruct high-resolution images for spatially deformed texts, especially rotated and curve-shaped ones.
We propose a CNN-based Text ATTention network (TATT) to address this problem.
arXiv Detail & Related papers (2022-03-17T15:28:29Z)
- TPSNet: Thin-Plate-Spline Representation for Arbitrary Shape Scene Text Detection [4.8345307057837354]
Thin-Plate-Spline (TPS) transformation has achieved great success in scene text recognition.
The TPS representation is compact, complete, and integral.
Two novel losses including the boundary set loss and the shape alignment loss are proposed.
arXiv Detail & Related papers (2021-10-25T11:47:17Z)
- CentripetalText: An Efficient Text Instance Representation for Scene Text Detection [19.69057252363207]
We propose an efficient text instance representation named CentripetalText (CT).
CT decomposes text instances into the combination of text kernels and centripetal shifts.
For the task of scene text detection, our approach achieves superior or competitive performance compared to other existing methods.
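A hedged sketch of the decoding rule implied by the kernel-plus-shift decomposition above (details assumed): every foreground pixel predicts a centripetal shift, and it joins the text instance whose kernel its shifted position lands in. The function and its interface are invented for illustration.

```python
import numpy as np

def assemble_instances(kernel_labels, shifts, fg_mask):
    """kernel_labels: (H, W) int map, 0 = background, i = kernel of instance i.
    shifts: (H, W, 2) predicted (dy, dx) centripetal shifts.
    fg_mask: (H, W) bool map of text pixels."""
    H, W = kernel_labels.shape
    ys, xs = np.nonzero(fg_mask)
    # Move each text pixel along its shift and clamp to the image bounds.
    ty = np.clip((ys + shifts[ys, xs, 0]).round().astype(int), 0, H - 1)
    tx = np.clip((xs + shifts[ys, xs, 1]).round().astype(int), 0, W - 1)
    labels = np.zeros((H, W), dtype=int)
    labels[ys, xs] = kernel_labels[ty, tx]      # inherit the kernel's id
    return labels
```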
arXiv Detail & Related papers (2021-07-13T09:34:18Z) - PAN++: Towards Efficient and Accurate End-to-End Spotting of
Arbitrarily-Shaped Text [85.7020597476857]
We propose an end-to-end text spotting framework, termed PAN++, which can efficiently detect and recognize text of arbitrary shapes in natural scenes.
PAN++ is based on the kernel representation that reformulates a text line as a text kernel (central region) surrounded by peripheral pixels.
As a pixel-based representation, the kernel representation can be predicted by a single fully convolutional network, which is well suited to real-time applications.
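A loose sketch of the kernel-based reconstruction described above: predicted kernels seed a region-growing pass that expands each text instance outward through the surrounding text pixels. PAN++ actually learns its pixel-aggregation rule; a plain flood fill stands in for it here as an assumption.

```python
from collections import deque
import numpy as np

def grow_kernels(kernel_labels, text_mask):
    """kernel_labels: (H, W) int, i > 0 marks the kernel of instance i.
    text_mask: (H, W) bool map of all text pixels."""
    H, W = kernel_labels.shape
    labels = kernel_labels.copy()
    queue = deque(zip(*np.nonzero(labels)))
    while queue:                                # multi-source BFS from kernels
        y, x = queue.popleft()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < H and 0 <= nx < W and text_mask[ny, nx] \
                    and labels[ny, nx] == 0:
                labels[ny, nx] = labels[y, x]   # inherit the instance id
                queue.append((ny, nx))
    return labels
```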
arXiv Detail & Related papers (2021-05-02T07:04:30Z) - PGNet: Real-time Arbitrarily-Shaped Text Spotting with Point Gathering
Network [54.03560668182197]
We propose a novel fully convolutional Point Gathering Network (PGNet) for reading arbitrarily-shaped text in real-time.
With a PG-CTC decoder, we gather high-level character classification vectors from two-dimensional space and decode them into text symbols without NMS and RoI operations.
Experiments show that the proposed method achieves competitive accuracy while significantly improving running speed.
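An illustrative sketch of a PG-CTC-style greedy decode, assumed from the description above: character logits are gathered at predicted center-line points and collapsed with the standard CTC rule, so no NMS or RoI operations are needed. The charset and function signature are hypothetical.

```python
import numpy as np

CHARSET = '-abcdefghijklmnopqrstuvwxyz'        # index 0 is the CTC blank

def pg_ctc_greedy_decode(char_logits, center_points):
    """char_logits: (H, W, num_classes) dense classification map.
    center_points: (N, 2) int array of ordered (y, x) center-line points."""
    ys, xs = center_points[:, 0], center_points[:, 1]
    seq = char_logits[ys, xs].argmax(axis=-1)   # gather + per-point argmax
    decoded, prev = [], -1
    for c in seq:                               # collapse repeats, drop blanks
        if c != prev and c != 0:
            decoded.append(CHARSET[c])
        prev = c
    return ''.join(decoded)
```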
arXiv Detail & Related papers (2021-04-12T13:27:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.