Inverse-like Antagonistic Scene Text Spotting via Reading-Order
Estimation and Dynamic Sampling
- URL: http://arxiv.org/abs/2401.03637v1
- Date: Mon, 8 Jan 2024 02:47:47 GMT
- Title: Inverse-like Antagonistic Scene Text Spotting via Reading-Order
Estimation and Dynamic Sampling
- Authors: Shi-Xue Zhang, Chun Yang, Xiaobin Zhu, Hongyang Zhou, Hongfa Wang,
Xu-Cheng Yin
- Abstract summary: We propose a unified end-to-end trainable inverse-like antagonistic text spotting framework dubbed IATS.
Specifically, we propose an innovative reading-order estimation module (REM) that extracts reading-order information from the initial text boundary.
We show that our method achieves superior performance both on irregular and inverse-like text spotting.
- Score: 26.420235903805782
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Scene text spotting is a challenging task, especially for inverse-like scene
text, which has complex layouts, e.g., mirrored, symmetrical, or retro-flexed.
In this paper, we propose a unified end-to-end trainable inverse-like
antagonistic text spotting framework dubbed IATS, which can effectively spot
inverse-like scene texts without sacrificing general ones. Specifically, we
propose an innovative reading-order estimation module (REM) that extracts
reading-order information from the initial text boundary generated by an
initial boundary module (IBM). To optimize and train REM, we propose a joint
reading-order estimation loss consisting of a classification loss, an
orthogonality loss, and a distribution loss. With the help of IBM, we can
divide the initial text boundary into two symmetric control points and
iteratively refine the new text boundary using a lightweight boundary
refinement module (BRM) for adapting to various shapes and scales. To alleviate
the incompatibility between text detection and recognition, we propose a
dynamic sampling module (DSM) with a thin-plate spline that can dynamically
sample appropriate features for recognition in the detected text region.
Without extra supervision, the DSM can proactively learn to sample appropriate
features for text recognition through the gradient returned by the recognition
module. Extensive experiments on both challenging scene text and inverse-like
scene text datasets demonstrate that our method achieves superior performance
both on irregular and inverse-like text spotting.
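The thin-plate-spline resampling idea behind the DSM can be illustrated with a minimal sketch (NumPy only; the function name `tps_warp` and the toy control points below are hypothetical illustrations, not from the paper). A TPS mapping is fitted from the control points of a canonical rectangle to the detected boundary's control points; a regular sampling grid warped through this mapping then tells the recognizer where to bilinearly sample features inside the curved text region.

```python
import numpy as np

def tps_warp(src_pts, dst_pts, query):
    """Fit a 2-D thin-plate spline mapping src_pts -> dst_pts,
    then apply it to the query points.

    src_pts, dst_pts: (n, 2) corresponding control points.
    query: (m, 2) points to transform (e.g. a regular sampling grid).
    """
    def U(r2):
        # TPS radial basis r^2 * log(r^2); the epsilon keeps log finite,
        # and the r2 factor makes the value 0 at r = 0.
        return r2 * np.log(r2 + 1e-12)

    n = src_pts.shape[0]
    # Pairwise squared distances between control points.
    d2 = ((src_pts[:, None, :] - src_pts[None, :, :]) ** 2).sum(-1)
    K = U(d2)
    P = np.hstack([np.ones((n, 1)), src_pts])  # affine part [1, x, y]

    # Standard TPS linear system: [K P; P^T 0] [w; a] = [dst; 0].
    L = np.zeros((n + 3, n + 3))
    L[:n, :n] = K
    L[:n, n:] = P
    L[n:, :n] = P.T
    Y = np.zeros((n + 3, 2))
    Y[:n] = dst_pts
    params = np.linalg.solve(L, Y)  # (n+3, 2): n warp weights + 3 affine terms

    # Evaluate the fitted spline at the query points.
    q2 = ((query[:, None, :] - src_pts[None, :, :]) ** 2).sum(-1)
    Pq = np.hstack([np.ones((query.shape[0], 1)), query])
    return np.hstack([U(q2), Pq]) @ params
```

In an end-to-end spotter the warped grid coordinates would feed a differentiable bilinear sampler, which is what lets recognition gradients flow back and adjust the sampling, as the abstract describes; this sketch only shows the coordinate mapping itself.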
Related papers
- Arbitrary Reading Order Scene Text Spotter with Local Semantics Guidance [6.93632116687419]
Local semantic knowledge not only includes text content but also spatial information in the right reading order.
We propose the Local Semantics Guided scene text Spotter (LSGSpotter), which auto-regressively decodes the position and content of characters guided by the local semantics.
LSGSpotter achieves the arbitrary reading order spotting task without the limitation of sophisticated detection.
arXiv Detail & Related papers (2024-12-13T14:20:43Z)
- Seeing Text in the Dark: Algorithm and Benchmark [28.865779563872977]
In this work, we propose an efficient and effective single-stage approach for localizing text in the dark.
We introduce a constrained learning module as an auxiliary mechanism during the training stage of the text detector.
We present a comprehensive low-light dataset for arbitrary-shaped text, encompassing diverse scenes and languages.
arXiv Detail & Related papers (2024-04-13T11:07:10Z)
- SwinTextSpotter v2: Towards Better Synergy for Scene Text Spotting [126.01629300244001]
We propose a new end-to-end scene text spotting framework termed SwinTextSpotter v2.
We enhance the relationship between two tasks using novel Recognition Conversion and Recognition Alignment modules.
SwinTextSpotter v2 achieved state-of-the-art performance on various multilingual (English, Chinese, and Vietnamese) benchmarks.
arXiv Detail & Related papers (2024-01-15T12:33:00Z)
- TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision [61.186488081379]
We propose TextFormer, a query-based end-to-end text spotter with Transformer architecture.
TextFormer builds upon an image encoder and a text decoder to learn a joint semantic understanding for multi-task modeling.
It allows for mutual training and optimization of classification, segmentation, and recognition branches, resulting in deeper feature sharing.
arXiv Detail & Related papers (2023-06-06T03:37:41Z)
- Self-supervised Character-to-Character Distillation for Text Recognition [54.12490492265583]
We propose a novel self-supervised Character-to-Character Distillation method, CCD, which enables versatile augmentations to facilitate text representation learning.
CCD achieves state-of-the-art results, with average performance gains of 1.38% in text recognition, 1.7% in text segmentation, 0.24 dB (PSNR) and 0.0321 (SSIM) in text super-resolution.
arXiv Detail & Related papers (2022-11-01T05:48:18Z)
- TextDCT: Arbitrary-Shaped Text Detection via Discrete Cosine Transform Mask [19.269070203448187]
Arbitrary-shaped scene text detection is a challenging task due to the variety of text changes in font, size, color, and orientation.
We propose a novel lightweight anchor-free text detection framework called TextDCT, which adopts the discrete cosine transform (DCT) to encode the text masks as compact vectors.
TextDCT achieves F-measure of 85.1 at 17.2 frames per second (FPS) and F-measure of 84.9 at 15.1 FPS for CTW1500 and Total-Text datasets, respectively.
arXiv Detail & Related papers (2022-06-27T15:42:25Z)
- Few Could Be Better Than All: Feature Sampling and Grouping for Scene Text Detection [47.820683360286786]
We present a transformer-based architecture for scene text detection.
We first select a few representative features at all scales that are highly relevant to foreground text.
As each feature group corresponds to a text instance, its bounding box can be easily obtained without any post-processing operation.
arXiv Detail & Related papers (2022-03-29T04:02:31Z)
- MOST: A Multi-Oriented Scene Text Detector with Localization Refinement [67.35280008722255]
We propose a new algorithm for scene text detection, which puts forward a set of strategies to significantly improve the quality of text localization.
Specifically, a Text Feature Alignment Module (TFAM) is proposed to dynamically adjust the receptive fields of features.
A Position-Aware Non-Maximum Suppression (PA-NMS) module is devised to exclude unreliable detections.
arXiv Detail & Related papers (2021-04-02T14:34:41Z)
- Text Perceptron: Towards End-to-End Arbitrary-Shaped Text Spotting [49.768327669098674]
We propose an end-to-end trainable text spotting approach named Text Perceptron.
It first employs an efficient segmentation-based text detector that learns the latent text reading order and boundary information.
Then a novel Shape Transform Module (abbr. STM) is designed to transform the detected feature regions into regular morphologies.
arXiv Detail & Related papers (2020-02-17T08:07:19Z)
- A New Perspective for Flexible Feature Gathering in Scene Text Recognition Via Character Anchor Pooling [32.82620509088932]
We propose a pair of coupled modules, termed the Character Anchoring Module (CAM) and the Anchor Pooling Module (APM).
CAM localizes text in a shape-insensitive way by anchoring characters individually. APM then flexibly interpolates and gathers features along the character anchors, enabling sequence learning.
arXiv Detail & Related papers (2020-02-10T03:01:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.