CORE-Text: Improving Scene Text Detection with Contrastive Relational
Reasoning
- URL: http://arxiv.org/abs/2112.07513v1
- Date: Tue, 14 Dec 2021 16:22:25 GMT
- Title: CORE-Text: Improving Scene Text Detection with Contrastive Relational
Reasoning
- Authors: Jingyang Lin and Yingwei Pan and Rongfeng Lai and Xuehang Yang and
Hongyang Chao and Ting Yao
- Abstract summary: Localizing text instances in natural scenes is regarded as a fundamental challenge in computer vision.
In this work, we quantitatively analyze the sub-text problem and present a simple yet effective design, COntrastive RElation (CORE) module.
We integrate the CORE module into a two-stage text detector of Mask R-CNN and devise our text detector CORE-Text.
- Score: 65.57338873921168
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Localizing text instances in natural scenes is regarded as a fundamental
challenge in computer vision. Nevertheless, owing to the extremely varied
aspect ratios and scales of text instances in real scenes, most conventional
text detectors suffer from the sub-text problem of localizing only
fragments of a text instance (i.e., sub-texts). In this work, we quantitatively
analyze the sub-text problem and present a simple yet effective design,
COntrastive RElation (CORE) module, to mitigate that issue. CORE first
leverages a vanilla relation block to model the relations among all text
proposals (sub-texts of multiple text instances) and further enhances
relational reasoning via instance-level sub-text discrimination in a
contrastive manner. This design naturally learns instance-aware representations of
text proposals and thus facilitates scene text detection. We integrate the CORE
module into a two-stage text detector of Mask R-CNN and devise our text
detector CORE-Text. Extensive experiments on four benchmarks demonstrate the
superiority of CORE-Text. Code is available:
\url{https://github.com/jylins/CORE-Text}.
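The instance-level sub-text discrimination described in the abstract can be viewed as a supervised contrastive objective over text-proposal embeddings: proposals (sub-texts) of the same text instance are pulled together, proposals of different instances are pushed apart. The sketch below is a minimal NumPy illustration of that idea, not the authors' implementation; the function name, temperature value, and input shapes are assumptions.

```python
import numpy as np

def instance_contrastive_loss(features, instance_ids, temperature=0.1):
    """Contrastive discrimination of text proposals (illustrative sketch).

    Proposals belonging to the same text instance act as positives for one
    another; all other proposals act as negatives.
    features: (N, D) proposal embeddings; instance_ids: (N,) instance labels.
    """
    # Cosine similarities between L2-normalized embeddings
    z = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = z @ z.T / temperature

    n = len(z)
    eye = np.eye(n, dtype=bool)
    pos = (instance_ids[:, None] == instance_ids[None, :]) & ~eye

    # Row-wise log-softmax, excluding each proposal's self-similarity
    sim[eye] = -np.inf
    m = sim.max(axis=1, keepdims=True)
    log_prob = sim - (m + np.log(np.exp(sim - m).sum(axis=1, keepdims=True)))

    # Average negative log-likelihood of positives per anchor
    pos_counts = pos.sum(axis=1)
    valid = pos_counts > 0
    per_anchor = -np.where(pos, log_prob, 0.0).sum(axis=1)[valid] / pos_counts[valid]
    return per_anchor.mean()
```

The loss is lowest when embeddings of sub-texts from the same instance cluster tightly, which is what makes the resulting proposal representations instance-aware.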
Related papers
- Enhancing Scene Text Detectors with Realistic Text Image Synthesis Using
Diffusion Models [63.99110667987318]
We present DiffText, a pipeline that seamlessly blends foreground text with the background's intrinsic features.
With fewer text instances, our produced text images consistently surpass other synthetic data in aiding text detectors.
arXiv Detail & Related papers (2023-11-28T06:51:28Z)
- TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision [61.186488081379]
We propose TextFormer, a query-based end-to-end text spotter with Transformer architecture.
TextFormer builds upon an image encoder and a text decoder to learn a joint semantic understanding for multi-task modeling.
It allows for mutual training and optimization of classification, segmentation, and recognition branches, resulting in deeper feature sharing.
arXiv Detail & Related papers (2023-06-06T03:37:41Z)
- Contextual Text Block Detection towards Scene Text Understanding [85.40898487745272]
This paper presents contextual text detection, a new setup that detects contextual text blocks (CTBs) for better understanding of texts in scenes.
We formulate the new setup by a dual detection task which first detects integral text units and then groups them into a CTB.
To this end, we design a novel scene text clustering technique that treats integral text units as tokens and groups them (belonging to the same CTB) into an ordered token sequence.
arXiv Detail & Related papers (2022-07-26T14:59:25Z)
- SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition [73.61592015908353]
We propose a new end-to-end scene text spotting framework termed SwinTextSpotter.
Using a transformer with dynamic head as the detector, we unify the two tasks with a novel Recognition Conversion mechanism.
The design results in a concise framework that requires neither additional rectification module nor character-level annotation.
arXiv Detail & Related papers (2022-03-19T01:14:42Z)
- Attention-based Feature Decomposition-Reconstruction Network for Scene Text Detection [20.85468268945721]
We propose attention-based feature decomposition-reconstruction network for scene text detection.
We use contextual information and low-level feature to enhance the performance of segmentation-based text detector.
Experiments have been conducted on two public benchmark datasets and results show that our proposed method achieves state-of-the-art performance.
arXiv Detail & Related papers (2021-11-29T06:15:25Z)
- Video Text Tracking With a Spatio-Temporal Complementary Model [46.99051486905713]
Text tracking aims to track multiple texts in a video and construct a trajectory for each text.
Existing methods tackle this task by utilizing the tracking-by-detection framework.
We argue that the tracking accuracy of this paradigm is severely limited in more complex scenarios.
arXiv Detail & Related papers (2021-11-09T08:23:06Z)
- MOST: A Multi-Oriented Scene Text Detector with Localization Refinement [67.35280008722255]
We propose a new algorithm for scene text detection, which puts forward a set of strategies to significantly improve the quality of text localization.
Specifically, a Text Feature Alignment Module (TFAM) is proposed to dynamically adjust the receptive fields of features.
A Position-Aware Non-Maximum Suppression (PA-NMS) module is devised to exclude unreliable detections.
arXiv Detail & Related papers (2021-04-02T14:34:41Z)
- Rethinking Text Segmentation: A Novel Dataset and A Text-Specific Refinement Approach [34.63444886780274]
Text segmentation is a prerequisite in real-world text-related tasks.
We introduce Text Refinement Network (TexRNet), a novel text segmentation approach.
TexRNet consistently improves text segmentation performance by nearly 2% compared to other state-of-the-art segmentation methods.
arXiv Detail & Related papers (2020-11-27T22:50:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.