Contextual Text Block Detection towards Scene Text Understanding
- URL: http://arxiv.org/abs/2207.12955v1
- Date: Tue, 26 Jul 2022 14:59:25 GMT
- Title: Contextual Text Block Detection towards Scene Text Understanding
- Authors: Chuhui Xue, Jiaxing Huang, Shijian Lu, Changhu Wang, Song Bai
- Abstract summary: This paper presents contextual text detection, a new setup that detects contextual text blocks (CTBs) for better understanding of texts in scenes.
We formulate the new setup as a dual detection task that first detects integral text units and then groups them into a CTB.
To this end, we design a novel scene text clustering technique that treats integral text units as tokens and groups them (belonging to the same CTB) into an ordered token sequence.
- Score: 85.40898487745272
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most existing scene text detectors focus on detecting characters or words
that only capture partial text messages due to missing contextual information.
For a better understanding of text in scenes, it is more desirable to detect
contextual text blocks (CTBs), which consist of one or more integral text
units (e.g., characters, words, or phrases) in natural reading order and
together convey a complete text message. This paper presents contextual text
detection, a new setup that detects CTBs for better understanding of texts in
scenes. We formulate the new setup as a dual detection task that first detects
integral text units and then groups them into a CTB. To this end, we design a
novel scene text clustering technique that treats integral text units as tokens
and groups them (belonging to the same CTB) into an ordered token sequence. In
addition, we create two datasets SCUT-CTW-Context and ReCTS-Context to
facilitate future research, where each CTB is well annotated by an ordered
sequence of integral text units. Further, we introduce three metrics that
measure contextual text detection in local accuracy, continuity, and global
accuracy. Extensive experiments show that our method accurately detects CTBs
which effectively facilitates downstream tasks such as text classification and
translation. The project is available at
https://sg-vilab.github.io/publication/xue2022contextual/.
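As an illustration of the setup described above, the minimal sketch below treats already-detected integral text units as tokens and groups them into ordered sequences (CTBs). The names (TextUnit, group_into_ctbs, y_tol) and the greedy geometric grouping heuristic are hypothetical and only illustrate the input/output structure; the paper itself learns the grouping with a scene text clustering network rather than hand-crafted rules.

```python
# Hypothetical sketch: group detected integral text units into ordered
# contextual text blocks (CTBs). Not the authors' method; a simple
# geometric heuristic used only to illustrate the task structure.
from dataclasses import dataclass
from typing import List

@dataclass
class TextUnit:
    text: str   # recognized content of the integral unit (word/phrase)
    x: float    # left coordinate of its bounding box
    y: float    # top coordinate of its bounding box
    w: float    # box width
    h: float    # box height

def group_into_ctbs(units: List[TextUnit], y_tol: float = 0.6) -> List[List[TextUnit]]:
    """Group integral text units into CTBs (ordered token sequences).

    Units whose vertical centers lie within y_tol * box height of each other
    are assumed to belong to the same block; each block is then sorted
    left-to-right to approximate natural reading order.
    """
    blocks: List[List[TextUnit]] = []
    for unit in sorted(units, key=lambda u: (u.y, u.x)):
        placed = False
        for block in blocks:
            ref = block[-1]
            # Same block if vertical centers are close relative to box height.
            if abs((unit.y + unit.h / 2) - (ref.y + ref.h / 2)) <= y_tol * max(unit.h, ref.h):
                block.append(unit)
                placed = True
                break
        if not placed:
            blocks.append([unit])
    # Order tokens within each CTB in reading order (left to right here).
    return [sorted(block, key=lambda u: u.x) for block in blocks]

if __name__ == "__main__":
    units = [
        TextUnit("SALE", 10, 10, 40, 20),
        TextUnit("50%", 55, 12, 30, 20),
        TextUnit("OFF", 90, 11, 30, 20),
        TextUnit("Exit", 10, 80, 30, 15),
    ]
    for ctb in group_into_ctbs(units):
        print(" ".join(u.text for u in ctb))  # prints "SALE 50% OFF" then "Exit"
```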
Related papers
- Text Grouping Adapter: Adapting Pre-trained Text Detector for Layout Analysis [52.34110239735265]
We present Text Grouping Adapter (TGA), a module that can enable the utilization of various pre-trained text detectors to learn layout analysis.
Our comprehensive experiments demonstrate that, even with frozen pre-trained models, incorporating our TGA into various pre-trained text detectors and text spotters can achieve superior layout analysis performance.
arXiv Detail & Related papers (2024-05-13T05:48:35Z)
- TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision [61.186488081379]
We propose TextFormer, a query-based end-to-end text spotter with Transformer architecture.
TextFormer builds upon an image encoder and a text decoder to learn a joint semantic understanding for multi-task modeling.
It allows for mutual training and optimization of classification, segmentation, and recognition branches, resulting in deeper feature sharing.
arXiv Detail & Related papers (2023-06-06T03:37:41Z)
- Towards End-to-End Unified Scene Text Detection and Layout Analysis [60.68100769639923]
We introduce the task of unified scene text detection and layout analysis.
The first hierarchical scene text dataset is introduced to enable this novel research task.
We also propose a novel method that is able to simultaneously detect scene text and form text clusters in a unified way.
arXiv Detail & Related papers (2022-03-28T23:35:45Z)
- SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition [73.61592015908353]
We propose a new end-to-end scene text spotting framework termed SwinTextSpotter.
Using a transformer with dynamic head as the detector, we unify the two tasks with a novel Recognition Conversion mechanism.
The design results in a concise framework that requires neither additional rectification module nor character-level annotation.
arXiv Detail & Related papers (2022-03-19T01:14:42Z)
- CORE-Text: Improving Scene Text Detection with Contrastive Relational Reasoning [65.57338873921168]
Localizing text instances in natural scenes is regarded as a fundamental challenge in computer vision.
In this work, we quantitatively analyze the sub-text problem and present a simple yet effective design, COntrastive RElation (CORE) module.
We integrate the CORE module into a two-stage text detector of Mask R-CNN and devise our text detector CORE-Text.
arXiv Detail & Related papers (2021-12-14T16:22:25Z)
- Video Text Tracking With a Spatio-Temporal Complementary Model [46.99051486905713]
Text tracking aims to track multiple texts in a video and construct a trajectory for each text.
Existing methods handle this task by utilizing the tracking-by-detection framework.
We argue that the tracking accuracy of this paradigm is severely limited in more complex scenarios.
arXiv Detail & Related papers (2021-11-09T08:23:06Z)