CALText: Contextual Attention Localization for Offline Handwritten Text
- URL: http://arxiv.org/abs/2111.03952v1
- Date: Sat, 6 Nov 2021 19:54:21 GMT
- Title: CALText: Contextual Attention Localization for Offline Handwritten Text
- Authors: Tayaba Anjum and Nazar Khan
- Abstract summary: We present an attention based encoder-decoder model that learns to read Urdu in context.
A novel localization penalty is introduced to encourage the model to attend only one location at a time when recognizing the next character.
We evaluate the model on both Urdu and Arabic datasets and show that contextual attention localization outperforms both simple attention and multi-directional LSTM models.
- Score: 1.066048003460524
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recognition of Arabic-like scripts such as Persian and Urdu is more
challenging than Latin-based scripts. This is due to the presence of a
two-dimensional structure, context-dependent character shapes, spaces and
overlaps, and placement of diacritics. Not much research exists for offline
handwritten Urdu script which is the 10th most spoken language in the world. We
present an attention based encoder-decoder model that learns to read Urdu in
context. A novel localization penalty is introduced to encourage the model to
attend only one location at a time when recognizing the next character. In
addition, we comprehensively refine the only complete and publicly available
handwritten Urdu dataset in terms of ground-truth annotations. We evaluate the
model on both Urdu and Arabic datasets and show that contextual attention
localization outperforms both simple attention and multi-directional LSTM
models.
Related papers
- KhmerST: A Low-Resource Khmer Scene Text Detection and Recognition Benchmark [1.5409800688911346]
We introduce the first Khmer scene-text dataset, featuring 1,544 expert-annotated images.
This diverse dataset includes flat text, raised text, poorly illuminated text, distant polygon and partially obscured text.
arXiv Detail & Related papers (2024-10-23T21:04:24Z) - Script-Agnostic Language Identification [21.19710835737713]
Many modern languages, such as Konkani, Kashmiri, Punjabi etc., are synchronically written in several scripts.
We propose learning script-agnostic representations using several different experimental strategies.
We find that word-level script randomization and exposure to a language written in multiple scripts is extremely valuable for downstream script-agnostic language identification.
arXiv Detail & Related papers (2024-06-25T19:23:42Z) - CoSTA: Code-Switched Speech Translation using Aligned Speech-Text Interleaving [61.73180469072787]
We focus on the problem of spoken translation (ST) of code-switched speech in Indian languages to English text.
We present a new end-to-end model architecture COSTA that scaffolds on pretrained automatic speech recognition (ASR) and machine translation (MT) modules.
COSTA significantly outperforms many competitive cascaded and end-to-end multimodal baselines by up to 3.5 BLEU points.
arXiv Detail & Related papers (2024-06-16T16:10:51Z) - Dataset and Benchmark for Urdu Natural Scenes Text Detection, Recognition and Visual Question Answering [50.52792174648067]
This initiative seeks to bridge the gap between textual and visual comprehension.
We propose a new multi-task Urdu scene text dataset comprising over 1000 natural scene images.
We provide fine-grained annotations for text instances, addressing the limitations of previous datasets.
arXiv Detail & Related papers (2024-05-21T06:48:26Z) - The First Swahili Language Scene Text Detection and Recognition Dataset [55.83178123785643]
There is a significant gap in low-resource languages, especially the Swahili Language.
Swahili is widely spoken in East African countries but is still an under-explored language in scene text recognition.
We propose a comprehensive dataset of Swahili scene text images and evaluate the dataset on different scene text detection and recognition models.
arXiv Detail & Related papers (2024-05-19T03:55:02Z) - Share What You Already Know: Cross-Language-Script Transfer and
Alignment for Sentiment Detection in Code-Mixed Data [0.0]
Code-switching entails mixing multiple languages. It is an increasingly occurring phenomenon in social media texts.
Pre-trained multilingual models primarily utilize the data in the native script of the language.
Using the native script for each language can generate better representations of the text owing to the pre-trained knowledge.
arXiv Detail & Related papers (2024-02-07T02:59:18Z) - NusaWrites: Constructing High-Quality Corpora for Underrepresented and
Extremely Low-Resource Languages [54.808217147579036]
We conduct a case study on Indonesian local languages.
We compare the effectiveness of online scraping, human translation, and paragraph writing by native speakers in constructing datasets.
Our findings demonstrate that datasets generated through paragraph writing by native speakers exhibit superior quality in terms of lexical diversity and cultural content.
arXiv Detail & Related papers (2023-09-19T14:42:33Z) - Beyond Arabic: Software for Perso-Arabic Script Manipulation [67.31374614549237]
We provide a set of finite-state transducer (FST) components and corresponding utilities for manipulating the writing systems of languages that use the Perso-Arabic script.
The library also provides simple FST-based romanization and transliteration.
arXiv Detail & Related papers (2023-01-26T20:37:03Z) - Speech-to-Speech Translation For A Real-world Unwritten Language [62.414304258701804]
We study speech-to-speech translation (S2ST) that translates speech from one language into another language.
We present an end-to-end solution from training data collection, modeling choices to benchmark dataset release.
arXiv Detail & Related papers (2022-11-11T20:21:38Z) - Efficient Urdu Caption Generation using Attention based LSTM [0.0]
Urdu is the national language of Pakistan and also much spoken and understood in the sub-continent region of Pakistan-India.
We develop an attention-based deep learning model using techniques of sequence modeling specialized for the Urdu language.
We evaluate our proposed technique on this dataset and show that it can achieve a BLEU score of 0.83 in the Urdu language.
arXiv Detail & Related papers (2020-08-02T17:22:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.