An Evaluation of OCR on Egocentric Data
- URL: http://arxiv.org/abs/2206.05496v1
- Date: Sat, 11 Jun 2022 10:37:20 GMT
- Title: An Evaluation of OCR on Egocentric Data
- Authors: Valentin Popescu, Dima Damen, Toby Perrett
- Abstract summary: In this paper, we evaluate state-of-the-art OCR methods on Egocentric data.
We demonstrate that existing OCR methods struggle with rotated text, which is frequently observed on objects being handled.
We introduce a simple rotate-and-merge procedure that can be applied to pre-trained OCR models and halves the normalized edit distance error.
- Score: 30.637021477342035
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we evaluate state-of-the-art OCR methods on Egocentric data.
We annotate text in EPIC-KITCHENS images, and demonstrate that existing OCR
methods struggle with rotated text, which is frequently observed on objects
being handled. We introduce a simple rotate-and-merge procedure that can be
applied to pre-trained OCR models and halves the normalized edit distance
error. This suggests that future OCR attempts should incorporate rotation into
model design and training procedures.
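The abstract does not spell out how rotate-and-merge works, so the following is only a minimal sketch of the general idea under stated assumptions: the image is a plain 2D grid, `ocr` is a hypothetical callable returning `(text, confidence)` pairs, rotations are limited to multiples of 90 degrees, and the merge step simply keeps the highest confidence seen for each distinct string. The normalized edit distance here is assumed to be the Levenshtein distance divided by the longer string's length; the paper may define either step differently.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical detection format: (recognized text, confidence score).
Detection = Tuple[str, float]
Image = List[List[str]]


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def normalized_edit_distance(pred: str, truth: str) -> float:
    """Assumed normalization: divide by the longer string's length."""
    longest = max(len(pred), len(truth))
    return edit_distance(pred, truth) / longest if longest else 0.0


def rotate90(image: Image, k: int) -> Image:
    """Rotate a 2D grid counter-clockwise by k * 90 degrees."""
    for _ in range(k % 4):
        image = [list(row) for row in zip(*image)][::-1]
    return image


def rotate_and_merge(image: Image,
                     ocr: Callable[[Image], Sequence[Detection]],
                     rotations: Sequence[int] = (0, 1, 2, 3)) -> List[Detection]:
    """Run a pre-trained OCR callable on rotated copies and merge the results.

    Assumed merge rule: keep the best confidence per distinct string.
    """
    best: Dict[str, float] = {}
    for k in rotations:
        for text, conf in ocr(rotate90(image, k)):
            if conf > best.get(text, -1.0):
                best[text] = conf
    # Highest-confidence detections first.
    return sorted(best.items(), key=lambda item: -item[1])
```

Because the OCR model is passed in as a callable, the wrapper leaves the pre-trained model untouched, which matches the abstract's claim that the procedure can be applied to existing models.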
Related papers
- Fast Context-Biasing for CTC and Transducer ASR models with CTC-based Word Spotter [57.64003871384959]
This work presents a new approach to fast context-biasing with CTC-based Word Spotter.
The proposed method matches CTC log-probabilities against a compact context graph to detect potential context-biasing candidates.
The results demonstrate a significant acceleration of the context-biasing recognition with a simultaneous improvement in F-score and WER.
arXiv Detail & Related papers (2024-06-11T09:37:52Z)
- Data Generation for Post-OCR correction of Cyrillic handwriting [41.94295877935867]
This paper focuses on the development and application of a synthetic handwriting generation engine based on Bézier curves.
Such an engine generates highly realistic handwritten text in any amount, which we utilize to create a substantial dataset.
We apply a Handwritten Text Recognition (HTR) model to this dataset to identify OCR errors, forming the basis for training our post-OCR correction (POC) model.
arXiv Detail & Related papers (2023-11-27T15:01:26Z)
- Enhancing OCR Performance through Post-OCR Models: Adopting Glyph Embedding for Improved Correction [0.0]
The novelty of our approach lies in embedding the OCR output using CharBERT and our unique embedding technique, capturing the visual characteristics of characters.
Our findings show that post-OCR correction effectively addresses deficiencies in inferior OCR models, and glyph embedding enables the model to achieve superior results.
arXiv Detail & Related papers (2023-08-29T12:41:50Z)
- Toward Zero-shot Character Recognition: A Gold Standard Dataset with Radical-level Annotations [5.761679637905164]
In this paper, we construct an ancient Chinese character image dataset (ACCID) that contains both radical-level and character-level annotations.
To increase the adaptability of ACCID, we propose a splicing-based synthetic character algorithm to augment the training samples and apply an image denoising method to improve the image quality.
arXiv Detail & Related papers (2023-08-01T16:41:30Z)
- Bayesian Inverse Contextual Reasoning for Heterogeneous Semantics-Native Communication [47.9462619619438]
When agents do not share the same communication context, the effectiveness of contextual reasoning is compromised.
This article proposes a novel framework for solving the inverse problem of contextual reasoning (CR) in semantics-native communication (SNC) using two Bayesian inference methods.
arXiv Detail & Related papers (2023-06-10T10:10:55Z)
- User-Centric Evaluation of OCR Systems for Kwak'wala [92.73847703011353]
We show that utilizing OCR reduces the time spent in the manual transcription of culturally valuable documents by over 50%.
Our results demonstrate the potential benefits that OCR tools can have on downstream language documentation and revitalization efforts.
arXiv Detail & Related papers (2023-02-26T21:41:15Z)
- iOCR: Informed Optical Character Recognition for Election Ballot Tallies [13.343515845758398]
iOCR was developed with a spell correction algorithm to fix errors introduced by conventional OCR for vote tabulation.
The results show that the iOCR system outperforms conventional OCR techniques.
arXiv Detail & Related papers (2022-08-01T13:50:13Z)
- Donut: Document Understanding Transformer without OCR [17.397447819420695]
We propose a novel visual document understanding (VDU) model that is end-to-end trainable without an underlying OCR framework.
Our approach achieves state-of-the-art performance on various document understanding tasks in public benchmark datasets and private industrial service datasets.
arXiv Detail & Related papers (2021-11-30T18:55:19Z)
- Neural Model Reprogramming with Similarity Based Mapping for Low-Resource Spoken Command Recognition [71.96870151495536]
We propose a novel adversarial reprogramming (AR) approach for low-resource spoken command recognition (SCR).
The AR procedure aims to modify the acoustic signals (from the target domain) to repurpose a pretrained SCR model.
We evaluate the proposed AR-SCR system on three low-resource SCR datasets, including Arabic, Lithuanian, and dysarthric Mandarin speech.
arXiv Detail & Related papers (2021-10-08T05:07:35Z)
- SSCR: Iterative Language-Based Image Editing via Self-Supervised Counterfactual Reasoning [79.30956389694184]
Iterative Language-Based Image Editing (ILBIE) tasks follow iterative instructions to edit images step by step.
Data scarcity is a significant issue for ILBIE as it is challenging to collect large-scale examples of images before and after instruction-based changes.
We introduce a Self-Supervised Counterfactual Reasoning framework that incorporates counterfactual thinking to overcome data scarcity.
arXiv Detail & Related papers (2020-09-21T01:45:58Z)