Fused Text Recogniser and Deep Embeddings Improve Word Recognition and
Retrieval
- URL: http://arxiv.org/abs/2007.00166v1
- Date: Wed, 1 Jul 2020 00:55:34 GMT
- Title: Fused Text Recogniser and Deep Embeddings Improve Word Recognition and
Retrieval
- Authors: Siddhant Bansal, Praveen Krishnan, C.V. Jawahar
- Abstract summary: We fuse the noisy output of the text recogniser with a deep embeddings representation derived from the entire word.
We improve the word recognition rate by 1.4 and retrieval by 11.13 points in mAP.
- Score: 26.606946401967804
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recognition and retrieval of textual content from large document
collections have been a powerful use case for the document image analysis
community. Often the word is the basic unit for recognition as well as
retrieval. Systems that rely only on the text recogniser (OCR) output are not
robust enough in many situations, especially when word recognition rates are
poor, as in the case of historic documents or digital libraries. An
alternative has been word-spotting-based methods that retrieve/match words
based on a holistic representation of the word. In this paper, we fuse the
noisy output of the text recogniser with a deep embeddings representation
derived from the entire word. We use average and max fusion to improve the
ranked results in the case of retrieval. We validate our methods on a
collection of Hindi documents. We improve the word recognition rate by 1.4 and
retrieval by 11.13 points in mAP.
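The average and max fusion described in the abstract can be sketched in a few lines. This is a minimal illustrative sketch, not the paper's implementation: it assumes each candidate word image already has two similarity scores in [0, 1] for a given query, one derived from the text recogniser output (e.g. a normalised edit-distance similarity) and one from the deep embeddings (e.g. cosine similarity). All names and scores below are hypothetical.

```python
# Minimal sketch of average and max score fusion for retrieval re-ranking.
# The two score lists are assumed to be parallel: one entry per candidate
# word image, each already normalised to [0, 1]. Names are illustrative.

def fuse_scores(ocr_scores, emb_scores, mode="average"):
    """Fuse two parallel lists of similarity scores."""
    if mode == "average":
        return [(a + b) / 2.0 for a, b in zip(ocr_scores, emb_scores)]
    if mode == "max":
        return [max(a, b) for a, b in zip(ocr_scores, emb_scores)]
    raise ValueError(f"unknown fusion mode: {mode}")

def rank(candidates, fused):
    """Return candidates sorted by fused score, best first."""
    order = sorted(range(len(fused)), key=lambda i: fused[i], reverse=True)
    return [candidates[i] for i in order]

# Toy usage: three candidate word images for one query.
candidates = ["img_01", "img_02", "img_03"]
ocr_scores = [0.9, 0.4, 0.7]   # recogniser agreement with the query
emb_scores = [0.6, 0.8, 0.7]   # embedding similarity to the query

avg = fuse_scores(ocr_scores, emb_scores, mode="average")
print(rank(candidates, avg))   # → ['img_01', 'img_03', 'img_02']
```

Average fusion rewards candidates that both sources agree on, while max fusion lets either source rescue a candidate the other scored poorly, which is one plausible reason to evaluate both.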
Related papers
- Learning Robust Named Entity Recognizers From Noisy Data With Retrieval Augmentation [67.89838237013078]
Named entity recognition (NER) models often struggle with noisy inputs.
We propose a more realistic setting in which only noisy text and its NER labels are available.
We employ a multi-view training framework that improves robust NER without retrieving text during inference.
arXiv Detail & Related papers (2024-07-26T07:30:41Z)
- Efficiently Leveraging Linguistic Priors for Scene Text Spotting [63.22351047545888]
This paper proposes a method that leverages linguistic knowledge from a large text corpus to replace the traditional one-hot encoding used in auto-regressive scene text spotting and recognition models.
We generate text distributions that align well with scene text datasets, removing the need for in-domain fine-tuning.
Experimental results show that our method not only improves recognition accuracy but also enables more accurate localization of words.
arXiv Detail & Related papers (2024-02-27T01:57:09Z)
- Summarization-Based Document IDs for Generative Retrieval with Language Models [65.11811787587403]
We introduce summarization-based document IDs, in which each document's ID is composed of an extractive summary or abstractive keyphrases.
We show that using ACID improves top-10 and top-20 recall by 15.6% and 14.4% (relative) respectively.
We also observed that extractive IDs outperformed abstractive IDs on Wikipedia articles in NQ but not the snippets in MSMARCO.
arXiv Detail & Related papers (2023-11-14T23:28:36Z)
- Natural Logic-guided Autoregressive Multi-hop Document Retrieval for Fact Verification [21.04611844009438]
We propose a novel retrieve-and-rerank method for multi-hop retrieval.
It consists of a retriever that jointly scores documents in the knowledge source and sentences from previously retrieved documents.
It is guided by a proof system that dynamically terminates the retrieval process if the evidence is deemed sufficient.
arXiv Detail & Related papers (2022-12-10T11:32:38Z)
- Text Detection Forgot About Document OCR [0.0]
This paper compares several methods designed for in-the-wild text recognition and for document text recognition.
The results suggest that state-of-the-art methods originally proposed for in-the-wild text detection also achieve excellent results on document text detection.
arXiv Detail & Related papers (2022-10-14T15:37:54Z)
- Open Set Classification of Untranscribed Handwritten Documents [56.0167902098419]
Huge amounts of digital page images of important manuscripts are preserved in archives worldwide.
The class or 'typology' of a document is perhaps the most important tag to be included in the metadata.
The technical problem is one of automatic classification of documents, each consisting of a set of untranscribed handwritten text images.
arXiv Detail & Related papers (2022-06-20T20:43:50Z)
- Spell my name: keyword boosted speech recognition [25.931897154065663]
Uncommon words such as names and technical terminology are important to understanding conversations in context.
We propose a simple but powerful ASR decoding method that can better recognise these uncommon keywords.
The method boosts the probabilities of given keywords in a beam search based on acoustic model predictions.
We demonstrate the effectiveness of our method on the LibriSpeech test sets and on internal data of real-world conversations.
arXiv Detail & Related papers (2021-10-06T14:16:57Z)
- Asking questions on handwritten document collections [35.85762649504866]
This work addresses the problem of Question Answering (QA) on handwritten document collections.
Unlike typical QA and Visual Question Answering (VQA) formulations where the answer is a short text, we aim to locate a document snippet where the answer lies.
We argue that the recognition-free approach is suitable for handwritten documents and historical collections where robust text recognition is often difficult.
arXiv Detail & Related papers (2021-10-02T02:40:40Z)
- On Vocabulary Reliance in Scene Text Recognition [79.21737876442253]
Methods perform well on images with words within the vocabulary but generalize poorly to images with words outside it.
We call this phenomenon "vocabulary reliance".
We propose a simple yet effective mutual learning strategy to allow models of two families to learn collaboratively.
arXiv Detail & Related papers (2020-05-08T11:16:58Z)
- Techniques for Vocabulary Expansion in Hybrid Speech Recognition Systems [54.49880724137688]
The problem of out-of-vocabulary (OOV) words is typical for any speech recognition system.
One popular approach to covering OOVs is to use subword units rather than words.
In this paper, we explore different existing methods of this solution at both the graph-construction and search levels.
arXiv Detail & Related papers (2020-03-19T21:24:45Z)
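The subword-unit idea in the last summary above can be illustrated with a toy greedy longest-match segmentation, which splits an OOV word into known pieces (falling back to single characters). The inventory and word below are made-up examples; real systems derive subword units with algorithms such as byte-pair encoding, not this simple scheme.

```python
# Illustrative sketch: covering an out-of-vocabulary word with subword
# units via greedy longest-match segmentation. Inventory is hypothetical.

def segment(word, inventory):
    """Greedily split `word` into the longest available subword units."""
    units = []
    i = 0
    while i < len(word):
        for j in range(len(word), i, -1):      # try the longest piece first
            piece = word[i:j]
            if piece in inventory or j == i + 1:
                units.append(piece)            # length-1 piece is the fallback
                i = j
                break
    return units

inventory = {"speech", "recog", "ni", "tion", "sub", "word"}
print(segment("speechrecognition", inventory))  # → ['speech', 'recog', 'ni', 'tion']
```

Because every character is itself a valid fallback unit, any word — including OOVs — can be segmented, which is the property the subword approach relies on.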
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the generated content (including all information) and is not responsible for any consequences of its use.