Hierarchical Text Spotter for Joint Text Spotting and Layout Analysis
- URL: http://arxiv.org/abs/2310.17674v1
- Date: Wed, 25 Oct 2023 22:23:54 GMT
- Title: Hierarchical Text Spotter for Joint Text Spotting and Layout Analysis
- Authors: Shangbang Long, Siyang Qin, Yasuhisa Fujii, Alessandro Bissacco,
Michalis Raptis
- Abstract summary: HTS can recognize text in an image and identify its 4-level hierarchical structure: characters, words, lines, and paragraphs.
HTS achieves state-of-the-art results on multiple word-level text spotting benchmark datasets as well as geometric layout analysis tasks.
- Score: 52.01356859448068
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose Hierarchical Text Spotter (HTS), a novel method for the joint task
of word-level text spotting and geometric layout analysis. HTS can recognize
text in an image and identify its 4-level hierarchical structure: characters,
words, lines, and paragraphs. The proposed HTS is characterized by two novel
components: (1) a Unified-Detector-Polygon (UDP) that produces Bezier Curve
polygons of text lines and an affinity matrix for paragraph grouping between
detected lines; (2) a Line-to-Character-to-Word (L2C2W) recognizer that splits
lines into characters and further merges them back into words. HTS achieves
state-of-the-art results on multiple word-level text spotting benchmark
datasets as well as geometric layout analysis tasks.
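The post-processing logic implied by the two components can be illustrated with a small sketch. Assuming UDP outputs a symmetric line-to-line affinity matrix, paragraphs can be recovered by thresholding it and taking connected components; in the spirit of L2C2W, characters decoded for a line (including spaces) can then be merged back into words. The function names and the threshold are illustrative assumptions, not the paper's actual API, and the learned detection/recognition models are elided entirely:

```python
def group_lines_into_paragraphs(affinity: list[list[float]],
                                threshold: float = 0.5) -> list[int]:
    """Group detected text lines into paragraphs given a symmetric
    line-to-line affinity matrix: threshold the affinities and take
    connected components with an iterative DFS."""
    n = len(affinity)
    labels = [-1] * n
    next_label = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        stack = [start]
        labels[start] = next_label
        while stack:
            i = stack.pop()
            for j in range(n):
                if affinity[i][j] >= threshold and labels[j] == -1:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1
    return labels


def merge_chars_into_words(chars: list[str]) -> list[str]:
    """L2C2W-style merge: characters decoded for a line are joined
    back into words at space boundaries."""
    words, buf = [], []
    for c in chars + [" "]:  # sentinel space flushes the last word
        if c == " ":
            if buf:
                words.append("".join(buf))
            buf = []
        else:
            buf.append(c)
    return words
```

For example, an affinity matrix where lines 0-1 and lines 2-3 attract each other yields two paragraph labels, and `merge_chars_into_words(list("HTS spots text"))` yields three words.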
Related papers
- Out of Length Text Recognition with Sub-String Matching [54.63761108308825]
In this paper, we term this task Out of Length (OOL) text recognition.
We propose a novel method called OOL Text Recognition with sub-String Matching (SMTR).
SMTR comprises two cross-attention-based modules: one encodes a sub-string containing multiple characters into next and previous queries, and the other employs the queries to attend to the image features.
arXiv Detail & Related papers (2024-07-17T05:02:17Z)
- Contextual Text Block Detection towards Scene Text Understanding [85.40898487745272]
This paper presents contextual text detection, a new setup that detects contextual text blocks (CTBs) for better understanding of texts in scenes.
We formulate the new setup by a dual detection task which first detects integral text units and then groups them into a CTB.
To this end, we design a novel scene text clustering technique that treats integral text units as tokens and groups them (belonging to the same CTB) into an ordered token sequence.
arXiv Detail & Related papers (2022-07-26T14:59:25Z)
- Towards End-to-End Unified Scene Text Detection and Layout Analysis [60.68100769639923]
We introduce the task of unified scene text detection and layout analysis.
The first hierarchical scene text dataset is introduced to enable this novel research task.
We also propose a novel method that is able to simultaneously detect scene text and form text clusters in a unified way.
arXiv Detail & Related papers (2022-03-28T23:35:45Z)
- SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition [73.61592015908353]
We propose a new end-to-end scene text spotting framework termed SwinTextSpotter.
Using a transformer with dynamic head as the detector, we unify the two tasks with a novel Recognition Conversion mechanism.
The design results in a concise framework that requires neither additional rectification module nor character-level annotation.
arXiv Detail & Related papers (2022-03-19T01:14:42Z)
- Unified Line and Paragraph Detection by Graph Convolutional Networks [5.298581058536571]
We formulate the task of detecting lines and paragraphs in a document into a unified two-level clustering problem.
We use a graph convolutional network to predict the relations between text detection boxes and then build both levels of clusters from these predictions.
Experimentally, we demonstrate that the unified approach can be highly efficient while still achieving state-of-the-art quality for detecting paragraphs in public benchmarks and real-world images.
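The two-level clustering above can be sketched with union-find, under the assumption that the graph convolutional network emits a score per pair of detection boxes ("same line") and, in a second pass, per pair of resulting lines ("same paragraph"). The function name and threshold are illustrative, not the paper's API; any pair scoring above the threshold is merged into one cluster:

```python
class UnionFind:
    """Disjoint-set structure with path compression."""
    def __init__(self, n: int):
        self.parent = list(range(n))

    def find(self, x: int) -> int:
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a: int, b: int) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def cluster_by_relations(n: int,
                         pair_scores: dict[tuple[int, int], float],
                         threshold: float = 0.5) -> list[int]:
    """Cluster n items given predicted pairwise relation scores:
    merge every pair above threshold, then assign dense labels."""
    uf = UnionFind(n)
    for (a, b), score in pair_scores.items():
        if score >= threshold:
            uf.union(a, b)
    roots: dict[int, int] = {}
    return [roots.setdefault(uf.find(i), len(roots)) for i in range(n)]
```

Running this once on detection boxes would yield line clusters; running it again on those lines with paragraph-level scores would yield the second level of the hierarchy.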
arXiv Detail & Related papers (2022-03-17T22:27:12Z)
- All You Need is a Second Look: Towards Arbitrary-Shaped Text Detection [39.17648241471479]
In this paper, we propose a two-stage segmentation-based detector, termed NASK (Need A Second looK), for arbitrary-shaped text detection.
arXiv Detail & Related papers (2021-06-24T01:44:10Z)
- Detection and Rectification of Arbitrary Shaped Scene Texts by using Text Keypoints and Links [38.71967078941593]
A mask-guided multi-task network detects and rectifies scene texts of arbitrary shapes reliably.
Three types of keypoints are detected, which specify the centre line and hence the shape of each text instance accurately.
Scene texts can be located and rectified by linking up the associated landmark points.
arXiv Detail & Related papers (2021-03-01T06:13:51Z)
- SCATTER: Selective Context Attentional Scene Text Recognizer [16.311256552979835]
Scene Text Recognition (STR) is the task of recognizing text against complex image backgrounds.
Current state-of-the-art (SOTA) methods still struggle to recognize text written in arbitrary shapes.
We introduce a novel architecture for STR, named Selective Context ATtentional Text Recognizer (SCATTER).
arXiv Detail & Related papers (2020-03-25T09:20:28Z)
- Text Perceptron: Towards End-to-End Arbitrary-Shaped Text Spotting [49.768327669098674]
We propose an end-to-end trainable text spotting approach named Text Perceptron.
It first employs an efficient segmentation-based text detector that learns the latent text reading order and boundary information.
Then a novel Shape Transform Module (abbr. STM) is designed to transform the detected feature regions into regular morphologies.
arXiv Detail & Related papers (2020-02-17T08:07:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information and is not responsible for any consequences.