A Multiplexed Network for End-to-End, Multilingual OCR
- URL: http://arxiv.org/abs/2103.15992v1
- Date: Mon, 29 Mar 2021 23:53:49 GMT
- Title: A Multiplexed Network for End-to-End, Multilingual OCR
- Authors: Jing Huang, Guan Pang, Rama Kovvuri, Mandy Toh, Kevin J Liang, Praveen
Krishnan, Xi Yin, Tal Hassner
- Abstract summary: We propose an E2E approach, Multiplexed Multilingual Mask TextSpotter, that performs script identification at the word level and handles different scripts with different recognition heads.
Experiments show that our method outperforms a single-head model with a similar number of parameters in end-to-end recognition tasks.
We believe that our work is a step towards an end-to-end trainable and scalable multilingual, multi-purpose OCR system.
- Score: 20.818532124822713
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in OCR have shown that an end-to-end (E2E) training pipeline
that includes both detection and recognition leads to the best results.
However, many existing methods focus primarily on Latin-alphabet languages,
often even only case-insensitive English characters. In this paper, we propose
an E2E approach, Multiplexed Multilingual Mask TextSpotter, that performs
script identification at the word level and handles different scripts with
different recognition heads, all while maintaining a unified loss that
simultaneously optimizes script identification and multiple recognition heads.
Experiments show that our method outperforms a single-head model with a similar
number of parameters in end-to-end recognition tasks, and achieves
state-of-the-art results on MLT17 and MLT19 joint text detection and script
identification benchmarks. We believe that our work is a step towards an
end-to-end trainable and scalable multilingual, multi-purpose OCR system. Our
code and model will be released.
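The multiplexing idea described in the abstract can be sketched in a few lines: a word-level script classifier routes each word crop to a script-specific recognition head, and a unified loss sums the script-identification loss with the loss of the routed head. The following is an illustrative pure-Python sketch under assumed names and toy logits, not the authors' released code.

```python
import math

# Hypothetical script set and loss wiring; all names and shapes here are
# illustrative assumptions, not the paper's actual implementation.
SCRIPTS = ["latin", "arabic", "cjk"]

def softmax(logits):
    # Numerically stable softmax over a list of scores.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def cross_entropy(logits, target_idx):
    # Negative log-likelihood of the target class.
    return -math.log(softmax(logits)[target_idx])

def multiplexed_loss(script_logits, head_logits_per_script,
                     script_target, char_target):
    """Unified loss: script-ID loss + recognition loss of the routed head."""
    script_loss = cross_entropy(script_logits, script_target)
    # During training, route to the head of the ground-truth script.
    head_logits = head_logits_per_script[script_target]
    recog_loss = cross_entropy(head_logits, char_target)
    return script_loss + recog_loss

def predict_script(script_logits):
    # At inference time, the predicted script selects the recognition head.
    probs = softmax(script_logits)
    return SCRIPTS[probs.index(max(probs))]
```

The key design point the paper argues for is that both terms are optimized jointly, so the script classifier and the per-script heads train end-to-end rather than as a cascaded pipeline.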
Related papers
- Out of Length Text Recognition with Sub-String Matching [54.63761108308825]
In this paper, we term this setting Out of Length (OOL) text recognition.
We propose a novel method called OOL Text Recognition with Sub-String Matching (SMTR).
SMTR comprises two cross-attention-based modules: one encodes a sub-string containing multiple characters into next and previous queries, and the other uses these queries to attend to the image features, matching the sub-string while simultaneously recognizing its next and previous characters.
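The cross-attention routing described above can be sketched as scaled dot-product attention, with a "next" query and a "previous" query each attending over image features. This is a minimal illustration of the mechanism only; the function names, dimensions, and decoding loop are assumptions, not SMTR's actual implementation.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]

def cross_attend(query, keys, values):
    """One query attends over image features via scaled dot-product attention."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    # Weighted average of the value vectors.
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]

def predict_next_and_prev(next_query, prev_query, image_feats):
    # SMTR-style step (illustrative): the sub-string is encoded into a "next"
    # and a "previous" query; each attends over the image features to gather
    # context for the character after / before the sub-string.
    next_ctx = cross_attend(next_query, image_feats, image_feats)
    prev_ctx = cross_attend(prev_query, image_feats, image_feats)
    return next_ctx, prev_ctx
```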
arXiv Detail & Related papers (2024-07-17T05:02:17Z)
- Chinese Text Recognition with A Pre-Trained CLIP-Like Model Through Image-IDS Aligning [61.34060587461462]
We propose a two-stage framework for Chinese Text Recognition (CTR).
We pre-train a CLIP-like model by aligning printed character images and Ideographic Description Sequences (IDS).
This pre-training stage simulates humans recognizing Chinese characters and obtains the canonical representation of each character.
The learned representations are employed to supervise the CTR model, such that traditional single-character recognition can be improved to text-line recognition.
arXiv Detail & Related papers (2023-09-03T05:33:16Z)
- On the Hidden Mystery of OCR in Large Multimodal Models [133.09809647230475]
We conducted a comprehensive evaluation of Large Multimodal Models, such as GPT4V and Gemini, in various text-related visual tasks.
Our study encompasses 29 datasets, making it the most comprehensive OCR evaluation benchmark available.
arXiv Detail & Related papers (2023-05-13T11:28:37Z)
- CSSL-MHTR: Continual Self-Supervised Learning for Scalable Multi-script Handwritten Text Recognition [16.987008461171065]
We explore the potential of continual self-supervised learning to alleviate the catastrophic forgetting problem in handwritten text recognition.
Our method consists of adding intermediate layers called adapters for each task, and efficiently distilling knowledge from the previous model while learning the current task.
We attain state-of-the-art performance on English, Italian and Russian scripts, whilst adding only a few parameters per task.
arXiv Detail & Related papers (2023-03-16T14:27:45Z)
- Task Grouping for Multilingual Text Recognition [28.036892501896983]
We propose an automatic method for multilingual text recognition with a task grouping and assignment module using Gumbel-Softmax.
Experiments on MLT19 lend evidence to our hypothesis that there is a middle ground between combining every task together and separating every task that achieves a better configuration of task grouping/separation.
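The Gumbel-Softmax trick mentioned above lets a model sample a (nearly) discrete task-to-group assignment while keeping the sampling step differentiable. A minimal pure-Python sketch follows; the function names, the per-task logit dictionary, and the temperature value are illustrative assumptions, not the paper's actual module.

```python
import math
import random

def gumbel_softmax(logits, tau=1.0):
    # Add Gumbel(0, 1) noise to each logit, then take a temperature-scaled
    # softmax; as tau -> 0 the sample approaches a one-hot group assignment.
    g = [-math.log(-math.log(max(random.random(), 1e-12))) for _ in logits]
    y = [(l + gi) / tau for l, gi in zip(logits, g)]
    m = max(y)
    e = [math.exp(v - m) for v in y]
    s = sum(e)
    return [v / s for v in e]

def assign_tasks(task_logits, tau=0.5):
    """Sample a soft group assignment for each task (e.g., each language)."""
    return {task: gumbel_softmax(logits, tau)
            for task, logits in task_logits.items()}
```

Because the relaxed samples are differentiable, gradients can flow through the assignment, letting the grouping itself be learned jointly with the recognition heads.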
arXiv Detail & Related papers (2022-10-13T23:54:23Z)
- On Cross-Lingual Retrieval with Multilingual Text Encoders [51.60862829942932]
We study the suitability of state-of-the-art multilingual encoders for cross-lingual document and sentence retrieval tasks.
We benchmark their performance in unsupervised ad-hoc sentence- and document-level CLIR experiments.
We evaluate multilingual encoders fine-tuned in a supervised fashion (i.e., we learn to rank) on English relevance data in a series of zero-shot language and domain transfer CLIR experiments.
arXiv Detail & Related papers (2021-12-21T08:10:27Z)
- Lexically Aware Semi-Supervised Learning for OCR Post-Correction [90.54336622024299]
Much of the existing linguistic data in many languages of the world is locked away in non-digitized books and documents.
Previous work has demonstrated the utility of neural post-correction methods on recognition of less-well-resourced languages.
We present a semi-supervised learning method that makes it possible to utilize raw images to improve performance.
arXiv Detail & Related papers (2021-11-04T04:39:02Z)
- Multi-script Handwritten Digit Recognition Using Multi-task Learning [2.8698937226234795]
Multi-script digit recognition is not very common, which motivates the development of robust and multi-purpose systems.
In this study, multi-script handwritten digit recognition using multi-task learning is investigated.
Handwritten digits from three scripts, Latin, Arabic, and Kannada, are studied, showing that multi-task models with a reformulation of the individual tasks achieve promising results.
arXiv Detail & Related papers (2021-06-15T16:30:37Z)
- Implicit Feature Alignment: Learn to Convert Text Recognizer to Text Spotter [38.4211220941874]
We propose a simple, elegant and effective paradigm called Implicit Feature Alignment (IFA).
IFA can be easily integrated into current text recognizers, resulting in a novel inference mechanism called IFA inference.
We experimentally demonstrate that IFA achieves state-of-the-art performance on end-to-end document recognition tasks.
arXiv Detail & Related papers (2021-06-10T17:06:28Z)
- Evaluating Multilingual Text Encoders for Unsupervised Cross-Lingual Retrieval [51.60862829942932]
We present a systematic empirical study focused on the suitability of the state-of-the-art multilingual encoders for cross-lingual document and sentence retrieval tasks.
For sentence-level CLIR, we demonstrate that state-of-the-art performance can be achieved.
However, the peak performance is not reached using general-purpose multilingual text encoders off-the-shelf, but rather by relying on their variants that have been further specialized for sentence understanding tasks.
arXiv Detail & Related papers (2021-01-21T00:15:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.