Towards End-to-End Unified Scene Text Detection and Layout Analysis
- URL: http://arxiv.org/abs/2203.15143v1
- Date: Mon, 28 Mar 2022 23:35:45 GMT
- Title: Towards End-to-End Unified Scene Text Detection and Layout Analysis
- Authors: Shangbang Long, Siyang Qin, Dmitry Panteleev, Alessandro Bissacco,
Yasuhisa Fujii, Michalis Raptis
- Abstract summary: We introduce the task of unified scene text detection and layout analysis.
The first hierarchical scene text dataset is introduced to enable this novel research task.
We also propose a novel method that is able to simultaneously detect scene text and form text clusters in a unified way.
- Score: 60.68100769639923
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Scene text detection and document layout analysis have long been treated as
two separate tasks in different image domains. In this paper, we bring them
together and introduce the task of unified scene text detection and layout
analysis. The first hierarchical scene text dataset is introduced to enable
this novel research task. We also propose a novel method that is able to
simultaneously detect scene text and form text clusters in a unified way.
Comprehensive experiments show that our unified model achieves better
performance than multiple well-designed baseline methods. Additionally, this
model achieves state-of-the-art results on multiple scene text detection
datasets without the need of complex post-processing. Dataset and code:
https://github.com/google-research-datasets/hiertext.
Related papers
- Towards Unified Multi-granularity Text Detection with Interactive Attention [56.79437272168507]
"Detect Any Text" is an advanced paradigm that unifies scene text detection, layout analysis, and document page detection into a cohesive, end-to-end model.
A pivotal innovation in DAT is the across-granularity interactive attention module, which significantly enhances the representation learning of text instances.
Tests demonstrate that DAT achieves state-of-the-art performances across a variety of text-related benchmarks.
arXiv Detail & Related papers (2024-05-30T07:25:23Z) - Text Grouping Adapter: Adapting Pre-trained Text Detector for Layout Analysis [52.34110239735265]
We present Text Grouping Adapter (TGA), a module that can enable the utilization of various pre-trained text detectors to learn layout analysis.
Our comprehensive experiments demonstrate that, even with frozen pre-trained models, incorporating our TGA into various pre-trained text detectors and text spotters can achieve superior layout analysis performance.
arXiv Detail & Related papers (2024-05-13T05:48:35Z) - Enhancing Scene Text Detectors with Realistic Text Image Synthesis Using
Diffusion Models [63.99110667987318]
We present DiffText, a pipeline that seamlessly blends foreground text with the background's intrinsic features.
With fewer text instances, our produced text images consistently surpass other synthetic data in aiding text detectors.
arXiv Detail & Related papers (2023-11-28T06:51:28Z) - TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision [61.186488081379]
We propose TextFormer, a query-based end-to-end text spotter with Transformer architecture.
TextFormer builds upon an image encoder and a text decoder to learn a joint semantic understanding for multi-task modeling.
It allows for mutual training and optimization of classification, segmentation, and recognition branches, resulting in deeper feature sharing.
arXiv Detail & Related papers (2023-06-06T03:37:41Z) - On Exploring and Improving Robustness of Scene Text Detection Models [20.15225372544634]
We evaluate scene text detection models ICDAR2015-C (IC15-C) and CTW1500-C (CTW-C)
We perform a robustness analysis of six key components: pre-training data, backbone, feature fusion module, multi-scale predictions, representation of text instances and loss function.
We present a simple yet effective data-based method to destroy the smoothness of text regions by merging background and foreground.
arXiv Detail & Related papers (2021-10-12T02:36:48Z) - Scene Text Retrieval via Joint Text Detection and Similarity Learning [68.24531728554892]
Scene text retrieval aims to localize and search all text instances from an image gallery, which are the same or similar to a given query text.
We address this problem by directly learning a cross-modal similarity between a query text and each text instance from natural images.
In this way, scene text retrieval can be simply performed by ranking the detected text instances with the learned similarity.
arXiv Detail & Related papers (2021-04-04T07:18:38Z) - Rethinking Text Segmentation: A Novel Dataset and A Text-Specific
Refinement Approach [34.63444886780274]
Text segmentation is a prerequisite in real-world text-related tasks.
We introduce Text Refinement Network (TexRNet), a novel text segmentation approach.
TexRNet consistently improves text segmentation performance by nearly 2% compared to other state-of-the-art segmentation methods.
arXiv Detail & Related papers (2020-11-27T22:50:09Z) - Scene text removal via cascaded text stroke detection and erasing [19.306751704904705]
Recent learning-based approaches show promising performance improvement for scene text removal task.
We propose a novel "end-to-end" framework based on accurate text stroke detection.
arXiv Detail & Related papers (2020-11-19T11:05:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.