TransDocs: Optical Character Recognition with word to word translation
- URL: http://arxiv.org/abs/2304.07637v1
- Date: Sat, 15 Apr 2023 21:40:14 GMT
- Title: TransDocs: Optical Character Recognition with word to word translation
- Authors: Abhishek Bamotra, Phani Krishna Uppala
- Abstract summary: This research work focuses on improving optical character recognition (OCR) with ML techniques.
This work is based on the ANKI dataset for English to Spanish translation.
- Score: 2.2336243882030025
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While OCR has been used in various applications, its output is not always
accurate, leading to misfit words. This research work focuses on improving
optical character recognition (OCR) with ML techniques by integrating OCR
with long short-term memory (LSTM) based sequence-to-sequence deep learning
models to perform document translation. This work is based on the ANKI dataset for
English to Spanish translation. In this work, I present a comparative study
of pre-trained OCR used with a deep learning model built on an LSTM-based seq2seq
architecture with attention for machine translation. End-to-end performance of
the model is expressed as a BLEU-4 score. This research paper is aimed at
researchers and practitioners interested in OCR and its applications in
document translation.
Related papers
- CLOCR-C: Context Leveraging OCR Correction with Pre-trained Language Models [0.0]
This paper introduces Context Leveraging OCR Correction (CLOCR-C).
It uses the infilling and context-adaptive abilities of transformer-based language models (LMs) to improve OCR quality.
The study aims to determine whether LMs can perform post-OCR correction and improve downstream NLP tasks, and to assess the value of providing socio-cultural context as part of the correction process.
arXiv Detail & Related papers (2024-08-30T17:26:05Z)
- Spanish TrOCR: Leveraging Transfer Learning for Language Adaptation [0.0]
This study explores the transfer learning capabilities of the TrOCR architecture to Spanish.
We integrate an English TrOCR encoder with a language specific decoder and train the model on this specific language.
Fine-tuning the English TrOCR on Spanish yields better recognition than the language-specific decoder for a fixed dataset size.
arXiv Detail & Related papers (2024-07-09T15:31:41Z)
- LexMatcher: Dictionary-centric Data Collection for LLM-based Machine Translation [67.24113079928668]
We present LexMatcher, a method for data curation driven by the coverage of senses found in bilingual dictionaries.
Our approach outperforms the established baselines on the WMT2022 test sets.
arXiv Detail & Related papers (2024-06-03T15:30:36Z)
- Exploring OCR Capabilities of GPT-4V(ision): A Quantitative and In-depth Evaluation [33.66939971907121]
The evaluation reveals that GPT-4V performs well in recognizing and understanding Latin contents, but struggles with multilingual scenarios and complex tasks.
In general, despite its versatility in handling diverse OCR tasks, GPT-4V does not outperform existing state-of-the-art OCR models.
arXiv Detail & Related papers (2023-10-25T17:38:55Z)
- User-Centric Evaluation of OCR Systems for Kwak'wala [92.73847703011353]
We show that utilizing OCR reduces the time spent in the manual transcription of culturally valuable documents by over 50%.
Our results demonstrate the potential benefits that OCR tools can have on downstream language documentation and revitalization efforts.
arXiv Detail & Related papers (2023-02-26T21:41:15Z)
- Transferring General Multimodal Pretrained Models to Text Recognition [46.33867696799362]
We recast text recognition as image captioning and directly transfer a unified vision-language pretrained model to the end task.
We construct an OCR pipeline with OFA-OCR, and we demonstrate that it can achieve competitive performance with the product-level API.
arXiv Detail & Related papers (2022-12-19T08:30:42Z)
- Understanding Translationese in Cross-Lingual Summarization [106.69566000567598]
Cross-lingual summarization (CLS) aims at generating a concise summary in a different target language.
To collect large-scale CLS data, existing datasets typically involve translation in their creation.
In this paper, we first confirm that different approaches to constructing CLS datasets lead to different degrees of translationese.
arXiv Detail & Related papers (2022-12-14T13:41:49Z)
- OCR Improves Machine Translation for Low-Resource Languages [10.010595434359647]
We introduce and make publicly available a novel benchmark, OCR4MT, consisting of real and synthetic data, enriched with noise.
We evaluate state-of-the-art OCR systems on our benchmark and analyse the most common errors.
We then perform an ablation study to investigate how OCR errors impact Machine Translation performance.
arXiv Detail & Related papers (2022-02-27T02:36:45Z)
- Lexically Aware Semi-Supervised Learning for OCR Post-Correction [90.54336622024299]
Much of the existing linguistic data in many languages of the world is locked away in non-digitized books and documents.
Previous work has demonstrated the utility of neural post-correction methods on recognition of less-well-resourced languages.
We present a semi-supervised learning method that makes it possible to utilize raw images to improve performance.
arXiv Detail & Related papers (2021-11-04T04:39:02Z)
- Neural Model Reprogramming with Similarity Based Mapping for Low-Resource Spoken Command Recognition [71.96870151495536]
We propose a novel adversarial reprogramming (AR) approach for low-resource spoken command recognition (SCR).
The AR procedure aims to modify the acoustic signals (from the target domain) to repurpose a pretrained SCR model.
We evaluate the proposed AR-SCR system on three low-resource SCR datasets, including Arabic, Lithuanian, and dysarthric Mandarin speech.
arXiv Detail & Related papers (2021-10-08T05:07:35Z)