Automated Transcription for Pre-Modern Japanese Kuzushiji Documents by
Random Lines Erasure and Curriculum Learning
- URL: http://arxiv.org/abs/2005.02669v1
- Date: Wed, 6 May 2020 09:17:28 GMT
- Title: Automated Transcription for Pre-Modern Japanese Kuzushiji Documents by
Random Lines Erasure and Curriculum Learning
- Authors: Anh Duc Le
- Abstract summary: Most of the previous methods divided the recognition process into character segmentation and recognition.
In this paper, we extend our previous human-inspired recognition system from multiple lines to full pages of Kuzushiji documents.
To address the lack of training data, we propose a random text line erasure approach that randomly erases text lines and distorts documents.
- Score: 6.700873164609009
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recognizing full pages of Japanese historical documents is a challenging
problem due to the complex layout/background and difficult writing styles,
such as cursive and connected characters. Most of the previous methods divided
the recognition process into character segmentation and recognition. However,
those methods provide only character bounding boxes and classes without text
transcription. In this paper, we extend our previous human-inspired recognition
system from multiple lines to full pages of Kuzushiji documents. The
human-inspired recognition system simulates human eye movement during the
reading process. To address the lack of training data, we propose a random text line
erasure approach that randomly erases text lines and distorts documents. To address
the convergence problem of the recognition system on full-page documents, we
employ curriculum learning that trains the recognition system step by step from
the easy level (several text lines of documents) to the difficult level
(full-page documents). We tested the step training approach and random text
line erasure approach on the dataset of the Kuzushiji recognition competition
on Kaggle. The results of the experiments demonstrate the effectiveness of our
proposed approaches. These results are competitive with those of other participants in
the Kuzushiji recognition competition.
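
The two training ideas named in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: the abstract gives no algorithmic details, so the page representation (a grid of grayscale values), the bounding-box format, and the helper names `erase_random_lines` and `curriculum_stages` are all assumptions made for illustration, and the document distortion step is left out.

```python
import random
from typing import List, Tuple

# Hypothetical box format: (left, top, right, bottom), right/bottom exclusive.
Box = Tuple[int, int, int, int]
Page = List[List[int]]  # grayscale pixel rows


def erase_random_lines(
    page: Page,
    line_boxes: List[Box],
    line_texts: List[str],
    keep: int,
    rng: random.Random,
    background: int = 255,
) -> Tuple[Page, List[str]]:
    """Return a copy of `page` in which only `keep` random text lines survive.

    Erased lines are painted with `background`. The returned transcription
    lists the surviving lines in their original reading order.
    """
    kept = set(rng.sample(range(len(line_boxes)), keep))
    out = [row[:] for row in page]  # leave the input page untouched
    for i, (left, top, right, bottom) in enumerate(line_boxes):
        if i in kept:
            continue
        for y in range(top, bottom):
            for x in range(left, right):
                out[y][x] = background
    texts = [t for i, t in enumerate(line_texts) if i in kept]
    return out, texts


def curriculum_stages(max_lines: int, step: int) -> List[int]:
    """Line counts per training stage, from easy to hard.

    The last stage always equals `max_lines`, i.e. the full page.
    """
    stages = list(range(step, max_lines, step))
    stages.append(max_lines)
    return stages


if __name__ == "__main__":
    # Toy page with three vertical text lines, each two pixels wide.
    page = [[0] * 6 for _ in range(4)]
    boxes = [(0, 0, 2, 4), (2, 0, 4, 4), (4, 0, 6, 4)]
    texts = ["line-1", "line-2", "line-3"]
    rng = random.Random(0)
    for n in curriculum_stages(max_lines=3, step=1):
        _, kept_texts = erase_random_lines(page, boxes, texts, keep=n, rng=rng)
        print(n, kept_texts)
```

Under these assumptions, `curriculum_stages(7, 3)` yields `[3, 6, 7]`: a recognizer would first be trained on pages reduced to three lines, then six, then the full seven-line page, with erasure producing a fresh variant of each page at every stage.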
Related papers
- Text-Only Training for Visual Storytelling [107.19873669536523]
We formulate visual storytelling as a visual-conditioned story generation problem.
We propose a text-only training method that separates the learning of cross-modality alignment and story generation.
(2023-08-17T09:32:17Z)
- Looking and Listening: Audio Guided Text Recognition [62.98768236858089]
Text recognition in the wild is a long-standing problem in computer vision.
Recent studies suggest vision and language processing are effective for scene text recognition.
Yet, solving edit errors such as add, delete, or replace is still the main challenge for existing approaches.
We propose AudioOCR, a simple yet effective probabilistic audio decoder for mel spectrogram sequence prediction.
(2023-06-06T08:08:18Z)
- CSSL-MHTR: Continual Self-Supervised Learning for Scalable Multi-script Handwritten Text Recognition [16.987008461171065]
We explore the potential of continual self-supervised learning to alleviate the catastrophic forgetting problem in handwritten text recognition.
Our method consists in adding intermediate layers called adapters for each task, and efficiently distilling knowledge from the previous model while learning the current task.
We attain state-of-the-art performance on English, Italian and Russian scripts, whilst adding only a few parameters per task.
(2023-03-16T14:27:45Z)
- Towards End-to-end Handwritten Document Recognition [0.0]
Handwritten text recognition has been widely studied in the last decades for its numerous applications.
In this thesis, we propose to tackle these issues by recognizing the handwritten text of whole documents in an end-to-end way.
We reached state-of-the-art results at paragraph level on the RIMES 2011, IAM and READ 2016 datasets and outperformed the line-level state of the art on these datasets.
(2022-09-30T10:31:22Z)
- Reading and Writing: Discriminative and Generative Modeling for Self-Supervised Text Recognition [101.60244147302197]
We introduce contrastive learning and masked image modeling to learn discrimination and generation of text images.
Our method outperforms previous self-supervised text recognition methods by 10.2%-20.2% on irregular scene text recognition datasets.
Our proposed text recognizer exceeds previous state-of-the-art text recognition methods by 5.3% on average across 11 benchmarks, with similar model size.
(2022-07-01T03:50:26Z)
- KOHTD: Kazakh Offline Handwritten Text Dataset [0.0]
We propose an extensive Kazakh Offline Handwritten Text Dataset (KOHTD).
KOHTD has 3000 handwritten exam papers, more than 140335 segmented images, and approximately 922010 symbols.
We used a variety of popular text recognition methods for word and line recognition in our studies, including CTC-based and attention-based methods.
(2021-09-22T16:19:38Z)
- Robust Handwriting Recognition with Limited and Noisy Data [7.617456558732551]
We focus on learning handwritten characters from maintenance logs, a constrained setting where data is very limited and noisy.
We break the problem into two consecutive stages of word segmentation and word recognition respectively and utilize data augmentation techniques to train both stages.
Our system achieves a lower error rate and is more suited to handle noisy and difficult documents.
(2020-08-18T20:33:23Z)
- Learning to Select Bi-Aspect Information for Document-Scale Text Content Manipulation [50.01708049531156]
We focus on a new practical task, document-scale text content manipulation, which is the opposite of text style transfer.
In detail, the input is a set of structured records and a reference text for describing another recordset.
The output is a summary that accurately describes the partial content in the source recordset in the same writing style as the reference.
(2020-02-24T12:52:10Z)
- Separating Content from Style Using Adversarial Learning for Recognizing Text in the Wild [103.51604161298512]
We propose an adversarial learning framework for the generation and recognition of multiple characters in an image.
Our framework can be integrated into recent recognition methods to achieve new state-of-the-art recognition accuracy.
(2020-01-13T12:41:42Z)
- TextScanner: Reading Characters in Order for Robust Scene Text Recognition [60.04267660533966]
TextScanner is an alternative approach for scene text recognition.
It generates pixel-wise, multi-channel segmentation maps for character class, position and order.
It also adopts an RNN for context modeling and performs parallel prediction of character position and class.
(2019-12-28T07:52:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.