Comprehensive Studies for Arbitrary-shape Scene Text Detection
- URL: http://arxiv.org/abs/2107.11800v1
- Date: Sun, 25 Jul 2021 13:18:55 GMT
- Title: Comprehensive Studies for Arbitrary-shape Scene Text Detection
- Authors: Pengwen Dai, Xiaochun Cao
- Abstract summary: We propose a unified framework for bottom-up scene text detection methods.
Under the unified framework, we ensure consistent settings for non-core modules.
Comprehensive investigations and detailed analyses reveal the advantages and disadvantages of previous models.
- Score: 78.50639779134944
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Numerous scene text detection methods have been proposed in recent years.
Most of them claim state-of-the-art performance. However, the performance
comparison is unfair because of many inconsistent settings (e.g., training data,
backbone network, multi-scale feature fusion, evaluation protocols). These
varying settings obscure the pros and cons of the proposed core techniques. In
this paper, we carefully examine and analyze the inconsistent settings, and
propose a unified framework for bottom-up scene text detection methods. Under
the unified framework, we ensure consistent settings for non-core modules, and
mainly investigate the representations used to describe arbitrary-shape scene
texts, e.g., regressing points on text contours, clustering pixels with
predicted auxiliary information, and grouping connected components with learned
linkages. These comprehensive investigations and detailed analyses not only
remove obstacles to understanding the performance differences between existing
methods but also reveal the advantages and disadvantages of previous models
under fair comparisons.
Related papers
- Text Grouping Adapter: Adapting Pre-trained Text Detector for Layout Analysis [52.34110239735265]
We present Text Grouping Adapter (TGA), a module that can enable the utilization of various pre-trained text detectors to learn layout analysis.
Our comprehensive experiments demonstrate that, even with frozen pre-trained models, incorporating our TGA into various pre-trained text detectors and text spotters can achieve superior layout analysis performance.
arXiv Detail & Related papers (2024-05-13T05:48:35Z)
- End-to-End Evaluation for Low-Latency Simultaneous Speech Translation [55.525125193856084]
We propose the first framework to perform and evaluate the various aspects of low-latency speech translation under realistic conditions.
This includes the segmentation of the audio as well as the run-time of the different components.
We also compare different approaches to low-latency speech translation using this framework.
arXiv Detail & Related papers (2023-08-07T09:06:20Z)
- TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision [61.186488081379]
We propose TextFormer, a query-based end-to-end text spotter with Transformer architecture.
TextFormer builds upon an image encoder and a text decoder to learn a joint semantic understanding for multi-task modeling.
It allows for mutual training and optimization of classification, segmentation, and recognition branches, resulting in deeper feature sharing.
arXiv Detail & Related papers (2023-06-06T03:37:41Z)
- Few Could Be Better Than All: Feature Sampling and Grouping for Scene Text Detection [47.820683360286786]
We present a transformer-based architecture for scene text detection.
We first select a few representative features at all scales that are highly relevant to foreground text.
As each feature group corresponds to a text instance, its bounding box can be easily obtained without any post-processing operation.
arXiv Detail & Related papers (2022-03-29T04:02:31Z)
- Attention-based Feature Decomposition-Reconstruction Network for Scene Text Detection [20.85468268945721]
We propose an attention-based feature decomposition-reconstruction network for scene text detection.
We use contextual information and low-level features to enhance the performance of segmentation-based text detectors.
Experiments have been conducted on two public benchmark datasets and results show that our proposed method achieves state-of-the-art performance.
arXiv Detail & Related papers (2021-11-29T06:15:25Z)
- On Exploring and Improving Robustness of Scene Text Detection Models [20.15225372544634]
We evaluate scene text detection models on ICDAR2015-C (IC15-C) and CTW1500-C (CTW-C).
We perform a robustness analysis of six key components: pre-training data, backbone, feature fusion module, multi-scale predictions, representation of text instances and loss function.
We present a simple yet effective data-based method to destroy the smoothness of text regions by merging background and foreground.
arXiv Detail & Related papers (2021-10-12T02:36:48Z)
- Which and Where to Focus: A Simple yet Accurate Framework for Arbitrary-Shaped Nearby Text Detection in Scene Images [8.180563824325086]
We propose a simple yet effective method for accurate arbitrary-shaped nearby scene text detection.
A One-to-Many Training Scheme (OMTS) is designed to eliminate confusion and enable the proposals to learn more appropriate groundtruths.
We also propose a Proposal Feature Attention Module (PFAM) to exploit more effective features for each proposal.
arXiv Detail & Related papers (2021-09-08T06:25:37Z)
- Weakly-Supervised Aspect-Based Sentiment Analysis via Joint Aspect-Sentiment Topic Embedding [71.2260967797055]
We propose a weakly-supervised approach for aspect-based sentiment analysis.
We learn <sentiment, aspect> joint topic embeddings in the word embedding space.
We then use neural models to generalize the word-level discriminative information.
arXiv Detail & Related papers (2020-10-13T21:33:24Z)