Related papers: Character Queries: A Transformer-based Approach to On-Line Handwritten Character Segmentation

URL: http://arxiv.org/abs/2309.03072v1
Date: Wed, 6 Sep 2023 15:19:04 GMT
Title: Character Queries: A Transformer-based Approach to On-Line Handwritten Character Segmentation
Authors: Michael Jungo, Beat Wolf, Andrii Maksai, Claudiu Musat and Andreas Fischer
Abstract summary: We focus on the scenario where the transcription is known beforehand, in which case the character segmentation becomes an assignment problem. Inspired by the $k$-means clustering algorithm, we view it from the perspective of cluster assignment and present a Transformer-based architecture. In order to assess the quality of our approach, we create character segmentation ground truths for two popular on-line handwriting datasets.
Score: 4.128716153761773
License: http://creativecommons.org/licenses/by/4.0/
Abstract: On-line handwritten character segmentation is often associated with handwriting recognition and even though recognition models include mechanisms to locate relevant positions during the recognition process, it is typically insufficient to produce a precise segmentation. Decoupling the segmentation from the recognition unlocks the potential to further utilize the result of the recognition. We specifically focus on the scenario where the transcription is known beforehand, in which case the character segmentation becomes an assignment problem between sampling points of the stylus trajectory and characters in the text. Inspired by the $k$-means clustering algorithm, we view it from the perspective of cluster assignment and present a Transformer-based architecture where each cluster is formed based on a learned character query in the Transformer decoder block. In order to assess the quality of our approach, we create character segmentation ground truths for two popular on-line handwriting datasets, IAM-OnDB and HANDS-VNOnDB, and evaluate multiple methods on them, demonstrating that our approach achieves the overall best results.
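Illustration: the abstract frames segmentation with a known transcription as assigning each sampling point of the stylus trajectory to one character. The toy sketch below shows only a k-means-style nearest-center assignment step. It is not the paper's Transformer architecture: the fixed 2-D `centers` are a hypothetical stand-in for the learned character queries, and the function name and sample coordinates are invented for the example.

```python
from math import dist  # Euclidean distance, Python 3.8+


def assign_points(points, centers):
    """Return, for each point, the index of its nearest center.

    `points` are stylus sampling points; `centers` hold one entry per
    character of the known transcription (hypothetical stand-ins for
    learned character queries).
    """
    return [
        min(range(len(centers)), key=lambda k: dist(p, centers[k]))
        for p in points
    ]


# Hypothetical samples for a two-character word written left to right.
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (1.0, 0.0), (1.2, 0.2), (1.1, 0.4)]
centers = [(0.1, 0.1), (1.1, 0.2)]  # one center per character

labels = assign_points(points, centers)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Each label is the index of the character that the corresponding sampling point is assigned to, so the label list is a character segmentation of the trajectory.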

Related papers

General Detection-based Text Line Recognition [15.761142324480165]
We introduce a general detection-based approach to text line recognition, be it printed (OCR) or handwritten (HTR). Our approach builds on a completely different paradigm than state-of-the-art HTR methods, which rely on autoregressive decoding. We improve state-of-the-art performance for Chinese script recognition on the CASIA v2 dataset, and for cipher recognition on the Borg and Copiale datasets.
arXiv Detail & Related papers (2024-09-25T17:05:55Z)
Attention based End to end network for Offline Writer Identification on Word level data [3.5829161769306244]
We propose a writer identification system based on an attention-driven Convolutional Neural Network (CNN). The system is trained on image segments, known as fragments, extracted from word images using a pyramid-based strategy. The efficacy of the proposed algorithm is evaluated on three benchmark databases.
arXiv Detail & Related papers (2024-04-11T09:41:14Z)
Uncovering Prototypical Knowledge for Weakly Open-Vocabulary Semantic Segmentation [59.37587762543934]
This paper studies the problem of weakly open-vocabulary semantic segmentation (WOVSS). Existing methods suffer from a granularity inconsistency regarding the usage of group tokens. We propose the prototypical guidance network (PGSeg) that incorporates multi-modal regularization.
arXiv Detail & Related papers (2023-10-29T13:18:00Z)
Multiview Identifiers Enhanced Generative Retrieval [78.38443356800848]
Generative retrieval generates identifier strings of passages as the retrieval target. We propose a new type of identifier, synthetic identifiers, that are generated based on the content of a passage. Our proposed approach performs the best in generative retrieval, demonstrating its effectiveness and robustness.
arXiv Detail & Related papers (2023-05-26T06:50:21Z)
Attributable and Scalable Opinion Summarization [79.87892048285819]
We generate abstractive summaries by decoding frequent encodings, and extractive summaries by selecting the sentences assigned to the same frequent encodings. Our method is attributable, because the model identifies sentences used to generate the summary as part of the summarization process. It scales easily to many hundreds of input reviews, because aggregation is performed in the latent space rather than over long sequences of tokens.
arXiv Detail & Related papers (2023-05-19T11:30:37Z)
Rethinking Text Line Recognition Models [57.47147190119394]
We consider two decoder families (Connectionist Temporal Classification and Transformer) and three encoder modules (Bidirectional LSTMs, Self-Attention, and GRCLs). We compare their accuracy and performance on widely used public datasets of scene and handwritten text. Unlike the more common Transformer-based models, this architecture can handle inputs of arbitrary length.
arXiv Detail & Related papers (2021-04-15T21:43:13Z)
Cross-domain Speech Recognition with Unsupervised Character-level Distribution Matching [60.8427677151492]
We propose CMatch, a Character-level distribution matching method to perform fine-grained adaptation between each character in two domains. Experiments on the Libri-Adapt dataset show that our proposed approach achieves 14.39% and 16.50% relative Word Error Rate (WER) reduction on both cross-device and cross-environment ASR.
arXiv Detail & Related papers (2021-04-15T14:36:54Z)
A Skip-connected Multi-column Network for Isolated Handwritten Bangla Character and Digit recognition [12.551285203114723]
We have proposed a non-explicit feature extraction method using a multi-scale multi-column skip convolutional neural network. Our method is evaluated on four publicly available datasets of isolated handwritten Bangla characters and digits.
arXiv Detail & Related papers (2020-04-27T13:18:58Z)
TextScanner: Reading Characters in Order for Robust Scene Text Recognition [60.04267660533966]
TextScanner is an alternative approach for scene text recognition. It generates pixel-wise, multi-channel segmentation maps for character class, position and order. It also adopts RNN for context modeling and performs paralleled prediction for character position and class.
arXiv Detail & Related papers (2019-12-28T07:52:00Z)

This list is automatically generated from the titles and abstracts of the papers on this site.