Urdu Poetry Generated by Using Deep Learning Techniques
- URL: http://arxiv.org/abs/2309.14233v1
- Date: Mon, 25 Sep 2023 15:44:24 GMT
- Title: Urdu Poetry Generated by Using Deep Learning Techniques
- Authors: Muhammad Shoaib Farooq, Ali Abbas
- Abstract summary: This study provides Urdu poetry generated using different deep-learning techniques and algorithms.
The data was collected through the Rekhta website, containing 1341 text files with several couplets.
- Score: 1.52292571922932
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This study presents Urdu poetry generated with different deep learning
techniques and algorithms. The data, collected from the Rekhta website, consists
of 1341 text files of couplets. The poetry was not drawn from any specific genre
or poet; instead, it is a mixed collection of Urdu poems and Ghazals. Deep
learning models based on Long Short-Term Memory (LSTM) networks and Gated
Recurrent Units (GRU) were applied. Natural Language Processing (NLP) uses
machine learning to understand, analyze, and generate language that humans use
and understand. Much work has been done on generating poetry in different
languages with different techniques, and researchers have also collected and
used their data in different ways. The primary purpose of this project is to
provide a model that generates Urdu poems by using the complete dataset rather
than a sample of it. The model also generates poems in pure Urdu script, rather
than in Roman Urdu as in the base paper. The results show good accuracy for the
poems generated by the model.
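The abstract names LSTM and GRU models but does not describe their configuration. As a rough illustration only, the pure-Python sketch below shows how a character-level GRU could read Urdu text and generate new characters by temperature sampling. Everything in it is an assumption rather than the authors' method: the two-line corpus (made-up phrases, not Rekhta data), the hidden size, the omitted bias terms, and the random, untrained weights, so its output is gibberish until the model is trained. The names `build_vocab`, `ToyGRU`, `generate`, and `is_urdu_script` are hypothetical; `is_urdu_script` is a simplified check that output stays in the Unicode Arabic block rather than the Latin characters of Roman Urdu.

```python
import math
import random


def build_vocab(lines):
    """Map every character of the whole corpus (no sub-sampling) to an integer id."""
    itos = sorted(set("".join(lines)))
    stoi = {c: i for i, c in enumerate(itos)}
    return stoi, itos


def is_urdu_script(text):
    """Simplified check: every non-space char lies in the Unicode Arabic block U+0600-U+06FF."""
    return all("\u0600" <= c <= "\u06ff" for c in text if not c.isspace())


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class ToyGRU:
    """Hypothetical character-level GRU with random, untrained weights and no bias terms."""

    def __init__(self, vocab_size, hidden_size, seed=0):
        rng = random.Random(seed)
        self.vocab_size, self.hidden_size = vocab_size, hidden_size
        # Input (W) and recurrent (U) weights for the update (z), reset (r) and candidate (n) paths.
        self.W = {g: rand_matrix(hidden_size, vocab_size, rng) for g in "zrn"}
        self.U = {g: rand_matrix(hidden_size, hidden_size, rng) for g in "zrn"}
        self.V = rand_matrix(vocab_size, hidden_size, rng)  # hidden state -> next-char logits

    def step(self, char_id, h):
        x = [0.0] * self.vocab_size
        x[char_id] = 1.0  # one-hot encoding of the current character
        z = [sigmoid(a + b) for a, b in zip(matvec(self.W["z"], x), matvec(self.U["z"], h))]
        r = [sigmoid(a + b) for a, b in zip(matvec(self.W["r"], x), matvec(self.U["r"], h))]
        rh = [ri * hi for ri, hi in zip(r, h)]  # reset gate scales the previous state
        n = [math.tanh(a + b) for a, b in zip(matvec(self.W["n"], x), matvec(self.U["n"], rh))]
        # Update gate interpolates between the previous state and the candidate.
        h_new = [(1.0 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]
        return matvec(self.V, h_new), h_new


def softmax(logits, temperature=1.0):
    scaled = [v / temperature for v in logits]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def generate(model, stoi, itos, seed_text, length, temperature=1.0, rng_seed=0):
    """Warm up the hidden state on a non-empty seed_text, then sample `length` more characters."""
    rng = random.Random(rng_seed)
    h = [0.0] * model.hidden_size
    for c in seed_text:
        logits, h = model.step(stoi[c], h)
    out = list(seed_text)
    for _ in range(length):
        next_id = rng.choices(range(len(itos)), weights=softmax(logits, temperature))[0]
        out.append(itos[next_id])
        logits, h = model.step(next_id, h)
    return "".join(out)


# Made-up placeholder phrases, not Rekhta data.
corpus = ["دل کی بات", "رات کا چاند"]
stoi, itos = build_vocab(corpus)
model = ToyGRU(len(itos), hidden_size=16)
sample = generate(model, stoi, itos, "دل", length=10)
print(sample, is_urdu_script(sample))
```

Because the vocabulary is built only from Urdu-script training text, every sampled character is Urdu script by construction. An LSTM variant would replace the update and reset gates with input, forget, and output gates plus a separate cell state; in practice either model would be built with a framework such as PyTorch or Keras and trained with cross-entropy on next-character prediction.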
Related papers
- Measuring Non-Adversarial Reproduction of Training Data in Large Language Models [71.55350441396243]
We quantify the overlap between model responses and pretraining data when responding to natural and benign prompts.
We find that up to 15% of the text output by popular conversational language models overlaps with snippets from the Internet.
While appropriate prompting can reduce non-adversarial reproduction on average, we find that mitigating worst-case reproduction of training data requires stronger defenses.
Date: 2024-11-15T14:55:01Z
- Sonnet or Not, Bot? Poetry Evaluation for Large Models and Datasets [3.0040661953201475]
Large language models (LLMs) can now generate and recognize poetry.
We develop a task to evaluate how well LLMs recognize one aspect of English-language poetry.
We show that state-of-the-art LLMs can successfully identify both common and uncommon fixed poetic forms.
Date: 2024-06-27T05:36:53Z
- AraPoemBERT: A Pretrained Language Model for Arabic Poetry Analysis [0.0]
We introduce AraPoemBERT, an Arabic language model pretrained exclusively on Arabic poetry text.
AraPoemBERT achieved unprecedented accuracy in two out of three novel tasks: poet's gender classification and poetry sub-meter classification.
The dataset used in this study contains more than 2.09 million verses collected from online sources.
Date: 2024-03-19T02:59:58Z
- Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research [139.69207791947738]
Dolma is a three-trillion-token English corpus built from a diverse mixture of web content, scientific papers, code, public-domain books, social media, and encyclopedic materials.
We document Dolma, including its design principles, details about its construction, and a summary of its contents.
We present analyses and experimental results on intermediate states of Dolma to share what we have learned about important data curation practices.
Date: 2024-01-31T20:29:50Z
- PoetryDiffusion: Towards Joint Semantic and Metrical Manipulation in Poetry Generation [58.36105306993046]
Controllable text generation is a challenging and meaningful field in natural language generation (NLG).
In this paper, we pioneer the use of the Diffusion model for generating sonnets and Chinese SongCi poetry.
Our model outperforms existing models in automatic evaluation of semantic, metrical, and overall performance as well as human evaluation.
Date: 2023-06-14T11:57:31Z
- Twist Decoding: Diverse Generators Guide Each Other [116.20780037268801]
We introduce Twist decoding, a simple and general inference algorithm that generates text while benefiting from diverse models.
Our method does not assume the vocabulary, tokenization or even generation order is shared.
Date: 2022-05-19T01:27:53Z
- Urdu & Hindi Poetry Generation using Neural Networks [0.0]
The purpose of this work is to offer an ode to Urdu and Hindi poets and to help them start their next line of poetry.
A concern with creative works like this, especially in the literary context, is to ensure that the output is not plagiarized.
This work also addresses the concern and makes sure that the resulting odes are not exact matches of the input data.
Date: 2021-07-16T16:12:51Z
- CCPM: A Chinese Classical Poetry Matching Dataset [50.90794811956129]
We propose a novel task to assess a model's semantic understanding of poetry by poem matching.
This task requires the model to select one line of Chinese classical poetry among four candidates according to the modern Chinese translation of a line of poetry.
To construct this dataset, we first obtain a set of parallel data of Chinese classical poetry and modern Chinese translation.
Date: 2021-06-03T16:49:03Z
- Co-occurrences using Fasttext embeddings for word similarity tasks in Urdu [0.0]
This paper builds a corpus for Urdu by scraping and integrating data from various sources.
We modify fasttext embeddings and N-Grams models to enable training them on our built corpus.
We have used these trained embeddings for a word similarity task and compared the results with existing techniques.
Date: 2021-02-22T12:56:26Z
- MixPoet: Diverse Poetry Generation via Learning Controllable Mixed Latent Space [79.70053419040902]
We propose MixPoet, a novel model that absorbs multiple factors to create various styles and promote diversity.
Based on a semi-supervised variational autoencoder, our model disentangles the latent space into several subspaces, each conditioned on one influence factor by adversarial training.
Experimental results on Chinese poetry demonstrate that MixPoet improves both diversity and quality against three state-of-the-art models.
Date: 2020-03-13T03:31:29Z
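Two entries above (the non-adversarial reproduction study and the Urdu & Hindi generator) raise the concern that generated text may copy training data. The following is a minimal sketch, assuming character n-grams as the unit of comparison and a hypothetical default of n = 8, of how generated Urdu lines might be screened for verbatim reuse. It is not the method of either paper, the function names are hypothetical, and the overlap threshold that would count as plagiarism is deliberately left open.

```python
def char_ngrams(text, n):
    """Set of distinct character n-grams in text (empty if text is shorter than n)."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def overlap_ratio(generated, training_lines, n=8):
    """Fraction of the generated text's distinct n-grams that occur in any training line."""
    gen = char_ngrams(generated, n)
    if not gen:
        return 0.0
    seen = set()
    for line in training_lines:
        seen |= char_ngrams(line, n)
    return len(gen & seen) / len(gen)


def is_exact_copy(generated, training_lines):
    """True if the generated line reproduces a training line verbatim (ignoring outer whitespace)."""
    target = generated.strip()
    return any(target == line.strip() for line in training_lines)


# Made-up placeholder phrases, not real training data.
training = ["دل کی بات", "رات کا چاند"]
print(overlap_ratio("رات کا چاند", training, n=4))  # 1.0
print(is_exact_copy("دل کی رات", training))  # False
```

The exact-match test catches only whole-line copies, while the n-gram ratio also flags partial reuse; smaller n makes the ratio stricter but also raises it for common short phrases.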