LineCounter: Learning Handwritten Text Line Segmentation by Counting
- URL: http://arxiv.org/abs/2105.11307v1
- Date: Mon, 24 May 2021 14:42:54 GMT
- Title: LineCounter: Learning Handwritten Text Line Segmentation by Counting
- Authors: Deng Li, Yue Wu, and Yicong Zhou
- Abstract summary: Handwritten Text Line Segmentation (HTLS) is a low-level but important task for document processing.
We propose a novel Line Counting formulation for HTLS -- that involves counting the number of text lines from the top at every pixel location.
This formulation helps learn an end-to-end HTLS solution that directly predicts per-pixel line number for a given document image.
- Score: 37.06878615666929
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Handwritten Text Line Segmentation (HTLS) is a low-level but important task
for many higher-level document processing tasks like handwritten text
recognition. It is often formulated in terms of semantic segmentation or object
detection in deep learning. However, both formulations have serious
shortcomings. The former requires heavy post-processing of splitting/merging
adjacent segments, while the latter may fail on dense or curved texts. In this
paper, we propose a novel Line Counting formulation for HTLS -- that involves
counting the number of text lines from the top at every pixel location. This
formulation helps learn an end-to-end HTLS solution that directly predicts
per-pixel line number for a given document image. Furthermore, we propose a
deep neural network (DNN) model LineCounter to perform HTLS through the Line
Counting formulation. Our extensive experiments on the three public datasets
(ICDAR2013-HSC, HIT-MW, and VML-AHTE) demonstrate that LineCounter outperforms
state-of-the-art HTLS approaches. Source code is available at
https://github.com/Leedeng/Line-Counter.
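To make the Line Counting formulation concrete, here is a minimal sketch of how such a per-pixel line-number target could be constructed from ordered line masks. This is an illustrative assumption, not the authors' implementation (see their repository for that); the helper `line_count_labels` and the mask-based construction are hypothetical.

```python
import numpy as np

def line_count_labels(line_masks):
    """Toy construction of a per-pixel line-count target.

    line_masks: (L, H, W) binary array, one mask per text line,
    ordered top to bottom. For each pixel, we count how many text
    lines have started at or above it within its column, so the
    count increases by one each time a column enters a new line.
    """
    L, H, W = line_masks.shape
    counts = np.zeros((H, W), dtype=np.int32)
    rows = np.arange(H)[:, None]                      # (H, 1) row indices
    for mask in line_masks:                           # scan lines top to bottom
        # first row of this line in each column (H if the column misses it)
        top = np.where(mask.any(axis=0), mask.argmax(axis=0), H)
        counts += (rows >= top[None, :]).astype(np.int32)
    return counts
```

Under this toy target, a pixel above all text has count 0, pixels on or below the k-th line (but above the (k+1)-th) have count k, so text pixels with equal count belong to the same line and segmentation reduces to grouping by predicted count.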
Related papers
- LESS: Label-Efficient and Single-Stage Referring 3D Segmentation [55.06002976797879]
Referring 3D segmentation is a vision-language task that segments all points of an object specified by a natural-language query from a 3D point cloud.
We propose a novel Referring 3D pipeline, Label-Efficient and Single-Stage, dubbed LESS, which is supervised only by efficient binary masks.
We achieve state-of-the-art performance on the ScanRefer dataset, surpassing previous methods by about 3.7% mIoU using only binary labels.
arXiv Detail & Related papers (2024-10-17T07:47:41Z) - The Power of Summary-Source Alignments [62.76959473193149]
Multi-document summarization (MDS) is a challenging task, often decomposed to subtasks of salience and redundancy detection.
Alignment of corresponding sentences between a reference summary and its source documents has been leveraged to generate training data.
This paper proposes extending the summary-source alignment framework by applying it at the more fine-grained proposition span level.
arXiv Detail & Related papers (2024-06-02T19:35:19Z) - DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text Spotting [129.73247700864385]
DeepSolo is a simple detection transformer baseline in which a single decoder with explicit points handles text detection and recognition simultaneously.
We introduce a text-matching criterion to deliver more accurate supervisory signals, thus enabling more efficient training.
arXiv Detail & Related papers (2022-11-19T19:06:22Z) - BN-HTRd: A Benchmark Dataset for Document Level Offline Bangla Handwritten Text Recognition (HTR) and Line Segmentation [0.0]
We introduce a new dataset for offline Handwritten Text Recognition (HTR) from images of Bangla scripts comprising words, lines, and document-level annotations.
The BN-HTRd dataset is based on the BBC Bangla News corpus, whose articles serve as the ground-truth texts.
Our dataset includes 788 images of handwritten pages produced by approximately 150 different writers.
arXiv Detail & Related papers (2022-05-29T22:56:26Z) - SOLD2: Self-supervised Occlusion-aware Line Description and Detection [95.8719432775724]
We introduce the first joint detection and description of line segments in a single deep network.
Our method does not require any annotated line labels and can therefore generalize to any dataset.
We evaluate our approach against previous line detection and description methods on several multi-view datasets.
arXiv Detail & Related papers (2021-04-07T19:27:17Z) - Text Line Segmentation for Challenging Handwritten Document Images Using Fully Convolutional Network [0.0]
This paper presents a method for text line segmentation of challenging historical manuscript images.
We rely on line masks that connect the components on the same text line.
Fully convolutional networks (FCNs) have been successfully used for text line segmentation of regular handwritten document images.
arXiv Detail & Related papers (2021-01-20T19:51:26Z) - Text line extraction using fully convolutional network and energy minimization [0.0]
This paper proposes a fully convolutional network for text line detection, combined with energy minimization.
We evaluate the proposed method on VML-AHTE, VML-MOC, and Diva-HisDB datasets.
arXiv Detail & Related papers (2021-01-18T23:23:03Z) - OrigamiNet: Weakly-Supervised, Segmentation-Free, One-Step, Full Page Text Recognition by learning to unfold [6.09170287691728]
We take a step from segmentation-free single line recognition towards segmentation-free multi-line / full page recognition.
We propose a novel and simple neural network module, termed OrigamiNet, that can augment any CTC-trained, fully convolutional single-line text recognizer.
We achieve state-of-the-art character error rate on both IAM & ICDAR 2017 HTR benchmarks for handwriting recognition, surpassing all other methods in the literature.
arXiv Detail & Related papers (2020-06-12T22:18:02Z) - Unsupervised deep learning for text line segmentation [0.0]
A common method is to train a deep learning network to embed the document image into an image of blob lines that trace the text lines.
This paper presents an unsupervised embedding of document image patches without a need for annotations.
We show that the outliers do not harm the convergence and the network learns to discriminate the text lines from the spaces between text lines.
arXiv Detail & Related papers (2020-03-19T08:57:53Z) - TextScanner: Reading Characters in Order for Robust Scene Text Recognition [60.04267660533966]
TextScanner is an alternative approach for scene text recognition.
It generates pixel-wise, multi-channel segmentation maps for character class, position and order.
It also adopts an RNN for context modeling and performs parallel prediction of character position and class.
arXiv Detail & Related papers (2019-12-28T07:52:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.