Text Detection Forgot About Document OCR
- URL: http://arxiv.org/abs/2210.07903v1
- Date: Fri, 14 Oct 2022 15:37:54 GMT
- Title: Text Detection Forgot About Document OCR
- Authors: Krzysztof Olejniczak and Milan Šulc
- Abstract summary: This paper compares several methods designed for in-the-wild text recognition and for document text recognition.
The results suggest that state-of-the-art methods originally proposed for in-the-wild text detection also achieve excellent results on document text detection.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Detection and recognition of text from scans and other images, commonly
denoted as Optical Character Recognition (OCR), is a widely used form of
automated document processing with a number of methods available. Advances in
machine learning enabled even more challenging scenarios of text detection and
recognition "in-the-wild" - such as detecting text on objects from photographs
of complex scenes. While the state-of-the-art methods for in-the-wild text
recognition are typically evaluated on complex scenes, their performance in the
domain of documents has not been published. This paper compares several methods
designed for in-the-wild text recognition and for document text recognition,
and provides their evaluation on the domain of structured documents. The
results suggest that state-of-the-art methods originally proposed for
in-the-wild text detection also achieve excellent results on document text
detection, outperforming available OCR methods. We argue that the application
of document OCR should not be omitted in evaluation of text detection and
recognition methods.
Related papers
- UNIT: Unifying Image and Text Recognition in One Vision Encoder [51.140564856352825]
UNIT is a novel training framework aimed at UNifying Image and Text recognition within a single model.
We show that UNIT significantly outperforms existing methods on document-related tasks.
Notably, UNIT retains the original vision encoder architecture, making it cost-free in terms of inference and deployment.
arXiv Detail & Related papers (2024-09-06T08:02:43Z)
- Efficiently Leveraging Linguistic Priors for Scene Text Spotting [63.22351047545888]
This paper proposes a method that leverages linguistic knowledge from a large text corpus to replace the traditional one-hot encoding used in auto-regressive scene text spotting and recognition models.
We generate text distributions that align well with scene text datasets, removing the need for in-domain fine-tuning.
Experimental results show that our method not only improves recognition accuracy but also enables more accurate localization of words.
arXiv Detail & Related papers (2024-02-27T01:57:09Z)
- Optimization of Image Processing Algorithms for Character Recognition in Cultural Typewritten Documents [0.8158530638728501]
This paper evaluates the impact of image processing methods and parameter tuning in Optical Character Recognition (OCR).
The approach uses a multi-objective problem formulation, solved with a non-dominated sorting genetic algorithm (NSGA-II), to minimize Levenshtein edit distance and maximize the number of words correctly identified.
Our findings suggest that employing image pre-processing algorithms in OCR might be more suitable for typologies where the text recognition task without pre-processing does not produce good results.
arXiv Detail & Related papers (2023-11-27T11:44:46Z)
- Optimal Boxes: Boosting End-to-End Scene Text Recognition by Adjusting Annotated Bounding Boxes via Reinforcement Learning [41.56134008044702]
Box is a reinforcement learning-based method for adjusting the shape of each text bounding box to make it more compatible with text recognition models.
Experiments demonstrate that the performance of end-to-end text recognition systems can be improved when using the adjusted bounding boxes as the ground truths for training.
arXiv Detail & Related papers (2022-07-25T06:58:45Z)
- Reading and Writing: Discriminative and Generative Modeling for Self-Supervised Text Recognition [101.60244147302197]
We introduce contrastive learning and masked image modeling to learn discrimination and generation of text images.
Our method outperforms previous self-supervised text recognition methods by 10.2%-20.2% on irregular scene text recognition datasets.
Our proposed text recognizer exceeds previous state-of-the-art text recognition methods by 5.3% on average across 11 benchmarks, with a similar model size.
arXiv Detail & Related papers (2022-07-01T03:50:26Z)
- Open Set Classification of Untranscribed Handwritten Documents [56.0167902098419]
Huge amounts of digital page images of important manuscripts are preserved in archives worldwide.
The class or "typology" of a document is perhaps the most important tag to be included in the metadata.
The technical problem is one of automatic classification of documents, each consisting of a set of untranscribed handwritten text images.
arXiv Detail & Related papers (2022-06-20T20:43:50Z)
- Detection Masking for Improved OCR on Noisy Documents [8.137198664755596]
We present an improved detection network with a masking system to improve the quality of OCR performed on documents.
We perform a unified evaluation on a publicly available dataset demonstrating the usefulness and broad applicability of our method.
arXiv Detail & Related papers (2022-05-17T11:59:18Z)
- TextOCR: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text [23.04601165885908]
We propose TextOCR, an arbitrary-shaped scene text detection and recognition dataset with 900k annotated words collected on real images.
We show that current state-of-the-art text-recognition (OCR) models fail to perform well on TextOCR.
We use a TextOCR-trained OCR model to create the PixelM4C model, which can perform scene-text-based reasoning on an image in an end-to-end fashion.
arXiv Detail & Related papers (2021-05-12T07:50:42Z)
- Scene Text Retrieval via Joint Text Detection and Similarity Learning [68.24531728554892]
Scene text retrieval aims to localize and search all text instances from an image gallery, which are the same or similar to a given query text.
We address this problem by directly learning a cross-modal similarity between a query text and each text instance from natural images.
In this way, scene text retrieval can be simply performed by ranking the detected text instances with the learned similarity.
arXiv Detail & Related papers (2021-04-04T07:18:38Z)
- Fused Text Recogniser and Deep Embeddings Improve Word Recognition and Retrieval [26.606946401967804]
We fuse the noisy output of a text recogniser with a deep embedding representation derived from the entire word.
We improve word recognition rate by 1.4 and retrieval by 11.13 in the mAP.
arXiv Detail & Related papers (2020-07-01T00:55:34Z)
- TextScanner: Reading Characters in Order for Robust Scene Text Recognition [60.04267660533966]
TextScanner is an alternative approach for scene text recognition.
It generates pixel-wise, multi-channel segmentation maps for character class, position and order.
It also adopts an RNN for context modeling and performs parallel prediction of character position and class.
arXiv Detail & Related papers (2019-12-28T07:52:00Z)