Efficient Urdu Caption Generation using Attention based LSTM
- URL: http://arxiv.org/abs/2008.01663v4
- Date: Sat, 19 Jun 2021 15:31:55 GMT
- Title: Efficient Urdu Caption Generation using Attention based LSTM
- Authors: Inaam Ilahi, Hafiz Muhammad Abdullah Zia, Muhammad Ahtazaz Ahsan, Rauf Tabassam, Armaghan Ahmed
- Abstract summary: Urdu is the national language of Pakistan and is widely spoken and understood across the Pakistan-India subcontinent.
We develop an attention-based deep learning model using techniques of sequence modeling specialized for the Urdu language.
We evaluate our proposed technique on an Urdu translation of a subset of the "Flickr8k" dataset and show that it can achieve a BLEU score of 0.83.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advancements in deep learning have created many opportunities to solve
real-world problems that remained unsolved for more than a decade. Automatic
caption generation is a major research field, and the research community has
done a lot of work on it in the most common languages, such as English. Urdu is
the national language of Pakistan and is widely spoken and understood across
the Pakistan-India subcontinent, yet no work has been done for Urdu
language caption generation. Our research aims to fill this gap by developing
an attention-based deep learning model using techniques of sequence modeling
specialized for the Urdu language. We have prepared a dataset in the Urdu
language by translating a subset of the "Flickr8k" dataset containing 700 'man'
images. We evaluate our proposed technique on this dataset and show that it can
achieve a BLEU score of 0.83 in the Urdu language. We improve on the previous
state-of-the-art by using better CNN architectures and optimization techniques.
Furthermore, we discuss how the generated captions can be made grammatically
correct.
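For readers unfamiliar with the metric quoted above, the following is a minimal sketch of unsmoothed sentence-level BLEU (clipped n-gram precision combined with a brevity penalty). The abstract does not say which BLEU variant, n-gram order, or tokenization the authors used, so the function names, the `max_n` default, and the whitespace tokenization here are assumptions for illustration only, not the authors' evaluation code.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, references, max_n=4):
    """Unsmoothed sentence-level BLEU with uniform n-gram weights.

    Hypothetical helper for illustration; returns 0.0 whenever any
    n-gram order has no match, because no smoothing is applied.
    """
    if not candidate:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        total = sum(cand_counts.values())
        if total == 0:
            return 0.0
        # Clip each candidate n-gram count by its maximum count in any reference.
        max_ref = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
        if clipped == 0:
            return 0.0
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty uses the reference length closest to the candidate length.
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(sum(log_precisions) / max_n)


# Invented example caption ("a man is walking on the road"), split on whitespace.
reference = "ایک آدمی سڑک پر چل رہا ہے".split()
candidate = "ایک آدمی سڑک پر چل رہا ہے".split()
score = sentence_bleu(candidate, [reference])
```

A candidate identical to its reference scores 1.0, and a candidate sharing no words with any reference scores 0.0. Reported scores such as the 0.83 above depend on the variant and on whether they are averaged per sentence or computed at corpus level, which this sketch does not attempt to reproduce.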
Related papers
- Navigating Text-to-Image Generative Bias across Indic Languages [53.92640848303192]
This research investigates biases in text-to-image (TTI) models for the Indic languages widely spoken across India.
It evaluates and compares the generative performance and cultural relevance of leading TTI models in these languages against their performance in English.
arXiv Detail & Related papers (2024-08-01T04:56:13Z)
- The First Swahili Language Scene Text Detection and Recognition Dataset [55.83178123785643]
There is a significant gap in low-resource languages, especially the Swahili Language.
Swahili is widely spoken in East African countries but is still an under-explored language in scene text recognition.
We propose a comprehensive dataset of Swahili scene text images and evaluate the dataset on different scene text detection and recognition models.
arXiv Detail & Related papers (2024-05-19T03:55:02Z)
- Tamil-Llama: A New Tamil Language Model Based on Llama 2 [6.449795539095749]
This paper enhances the open-source LLaMA model with an addition of 16,000 Tamil tokens, aiming to achieve superior text generation and comprehension in Tamil language.
We strategically employ the LoRA methodology for efficient model training on a comprehensive Tamil corpus, ensuring computational feasibility and model robustness.
Our results showcase significant performance improvements in Tamil text generation, with potential implications for the broader landscape of Large Language Models in Indian languages.
arXiv Detail & Related papers (2023-11-10T03:02:39Z)
- NusaWrites: Constructing High-Quality Corpora for Underrepresented and Extremely Low-Resource Languages [54.808217147579036]
We conduct a case study on Indonesian local languages.
We compare the effectiveness of online scraping, human translation, and paragraph writing by native speakers in constructing datasets.
Our findings demonstrate that datasets generated through paragraph writing by native speakers exhibit superior quality in terms of lexical diversity and cultural content.
arXiv Detail & Related papers (2023-09-19T14:42:33Z)
- Breaking Language Barriers: A Question Answering Dataset for Hindi and Marathi [1.03590082373586]
This paper focuses on developing a Question Answering dataset for two such languages- Hindi and Marathi.
Despite Hindi being the 3rd most spoken language worldwide, and Marathi being the 11th most spoken language globally, both languages face limited resources for building efficient Question Answering systems.
We release the largest Question-Answering dataset available for these languages, with each dataset containing 28,000 samples.
arXiv Detail & Related papers (2023-08-19T00:39:21Z)
- Speech-to-Speech Translation For A Real-world Unwritten Language [62.414304258701804]
We study speech-to-speech translation (S2ST) that translates speech from one language into another language.
We present an end-to-end solution from training data collection, modeling choices to benchmark dataset release.
arXiv Detail & Related papers (2022-11-11T20:21:38Z)
- No Language Left Behind: Scaling Human-Centered Machine Translation [69.28110770760506]
We create datasets and models aimed at narrowing the performance gap between low and high-resource languages.
We propose multiple architectural and training improvements to counteract overfitting while training on thousands of tasks.
Our model achieves an improvement of 44% BLEU relative to the previous state-of-the-art.
arXiv Detail & Related papers (2022-07-11T07:33:36Z)
- CALText: Contextual Attention Localization for Offline Handwritten Text [1.066048003460524]
We present an attention based encoder-decoder model that learns to read Urdu in context.
A novel localization penalty is introduced to encourage the model to attend only one location at a time when recognizing the next character.
We evaluate the model on both Urdu and Arabic datasets and show that contextual attention localization outperforms both simple attention and multi-directional LSTM models.
arXiv Detail & Related papers (2021-11-06T19:54:21Z)
- Towards Zero-shot Language Modeling [90.80124496312274]
We construct a neural model that is inductively biased towards learning human languages.
We infer this distribution from a sample of typologically diverse training languages.
We harness additional language-specific side information as distant supervision for held-out languages.
arXiv Detail & Related papers (2021-08-06T23:49:18Z)
- Co-occurrences using Fasttext embeddings for word similarity tasks in Urdu [0.0]
This paper builds a corpus for Urdu by scraping and integrating data from various sources.
We modify fasttext embeddings and N-Grams models to enable training them on our built corpus.
We have used these trained embeddings for a word similarity task and compared the results with existing techniques.
arXiv Detail & Related papers (2021-02-22T12:56:26Z)
- An Augmented Translation Technique for low Resource language pair: Sanskrit to Hindi translation [0.0]
In this work, Zero Shot Translation (ZST) is inspected for a low resource language pair.
The same architecture is tested for Sanskrit to Hindi translation for which data is sparse.
Dimensionality reduction of word embedding is performed to reduce the memory usage for data storage.
arXiv Detail & Related papers (2020-06-09T17:01:55Z)