GPT Czech Poet: Generation of Czech Poetic Strophes with Language Models
- URL: http://arxiv.org/abs/2407.12790v1
- Date: Tue, 18 Jun 2024 06:19:45 GMT
- Title: GPT Czech Poet: Generation of Czech Poetic Strophes with Language Models
- Authors: Michal Chudoba, Rudolf Rosa
- Abstract summary: We introduce a new model for generating poetry in the Czech language, based on fine-tuning a pre-trained Large Language Model.
We demonstrate that guiding the generation process by explicitly specifying strophe parameters within the poem text strongly improves the effectiveness of the model.
- Score: 0.4444634303550442
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: High-quality automated poetry generation systems are currently available for only a small subset of languages. We introduce a new model for generating poetry in the Czech language, based on fine-tuning a pre-trained Large Language Model. We demonstrate that guiding the generation process by explicitly specifying strophe parameters within the poem text strongly improves the effectiveness of the model. We also find that appropriate tokenization is crucial, showing that tokenization methods based on syllables or individual characters, rather than subwords, are superior for generating poetic strophes. We further enhance the results by introducing Forced generation, which adds explicit specifications of meter and verse parameters at inference time based on the already generated text. We evaluate a range of setups, showing that our proposed approach achieves high accuracy in the rhyming and metrical aspects of the formal quality of the generated poems.
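To make the described conditioning concrete, the sketch below shows one way strophe parameters (rhyme scheme, meter, syllable counts) could be written directly into the training text and then forced into the prompt at inference time. The tag format, the format_strophe helper, and the checkpoint name are illustrative assumptions, not the authors' actual implementation.

```python
# Illustrative sketch only: the tag format, helper names, and checkpoint are
# assumptions made for this example, not the paper's actual code or data format.
from transformers import AutoModelForCausalLM, AutoTokenizer

def format_strophe(rhyme_scheme, meter, syllable_counts, verses):
    """Embed strophe parameters directly in the text: one header line for the
    strophe plus a meter/syllable-count tag before each verse."""
    header = f"# RHYME: {rhyme_scheme} # METER: {meter}"
    lines = [f"[{meter}] [{n}] {verse}" for n, verse in zip(syllable_counts, verses)]
    return "\n".join([header, *lines])

# Fine-tuning data would consist of many strophes serialized this way, e.g. a
# quatrain with an ABAB rhyme scheme and iambic meter (verse texts elided here).
example = format_strophe("ABAB", "J", [8, 7, 8, 7], ["...", "...", "...", "..."])

# "Forced generation" as summarized above: rather than letting the model emit
# the parameter tags itself, they are re-inserted into the prompt before each
# verse, based on the text generated so far.
tokenizer = AutoTokenizer.from_pretrained("czech-poetry-gpt2")  # hypothetical checkpoint
model = AutoModelForCausalLM.from_pretrained("czech-poetry-gpt2")

prompt = "# RHYME: ABAB # METER: J\n[J] [8] "
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

With a syllable- or character-level tokenizer, as the abstract recommends, checking the generated verse against the forced syllable count is straightforward, which is one plausible reason such tokenizations help with formal quality.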
Related papers
- Detecting Document-level Paraphrased Machine Generated Content: Mimicking Human Writing Style and Involving Discourse Features [57.34477506004105]
Machine-generated content poses challenges such as academic plagiarism and the spread of misinformation.
We introduce novel methodologies and datasets to overcome these challenges.
We propose MhBART, an encoder-decoder model designed to emulate human writing style.
We also propose DTransformer, a model that integrates discourse analysis through PDTB preprocessing to encode structural features.
arXiv Detail & Related papers (2024-12-17T08:47:41Z)
- Retrieval is Accurate Generation [99.24267226311157]
We introduce a novel method that selects context-aware phrases from a collection of supporting documents.
Our model achieves the best performance and the lowest latency among several retrieval-augmented baselines.
arXiv Detail & Related papers (2024-02-27T14:16:19Z)
- PoetryDiffusion: Towards Joint Semantic and Metrical Manipulation in Poetry Generation [58.36105306993046]
Controllable text generation is a challenging and meaningful field in natural language generation (NLG).
In this paper, we pioneer the use of the Diffusion model for generating sonnets and Chinese SongCi poetry.
Our model outperforms existing models in automatic evaluation of semantic, metrical, and overall performance as well as human evaluation.
arXiv Detail & Related papers (2023-06-14T11:57:31Z)
- ByGPT5: End-to-End Style-conditioned Poetry Generation with Token-free Language Models [23.381986209234157]
In this work, we investigate end-to-end poetry generation conditioned on styles such as rhyme, meter, and alliteration.
We successfully pre-train ByGPT5, a new token-free decoder-only language model, and fine-tune it on a large custom corpus of English and German quatrains annotated with our styles.
We show that ByGPT5 outperforms other models such as mT5, ByT5, GPT-2 and ChatGPT, while also being more parameter efficient and performing favorably compared to humans.
arXiv Detail & Related papers (2022-12-20T17:49:49Z)
- Improving Text Auto-Completion with Next Phrase Prediction [9.385387026783103]
Our strategy includes a novel self-supervised training objective called Next Phrase Prediction (NPP).
Preliminary experiments show that our approach outperforms the baselines in auto-completion for the email and academic writing domains.
arXiv Detail & Related papers (2021-09-15T04:26:15Z)
- Progressive Generation of Long Text with Pretrained Language Models [83.62523163717448]
Large-scale language models (LMs) pretrained on massive corpora of text, such as GPT-2, are powerful open-domain text generators.
However, it remains challenging for such models to generate coherent long passages of text, especially when they are fine-tuned to the target domain on a small corpus.
We propose a simple but effective method of generating text in a progressive manner, inspired by generating images from low to high resolution.
arXiv Detail & Related papers (2020-06-28T21:23:05Z)
- Improving Adversarial Text Generation by Modeling the Distant Future [155.83051741029732]
We consider a text planning scheme and present a model-based imitation-learning approach to alleviate the aforementioned issues.
We propose a novel guider network to focus on the generative process over a longer horizon, which can assist next-word prediction and provide intermediate rewards for generator optimization.
arXiv Detail & Related papers (2020-05-04T05:45:13Z)
- SongNet: Rigid Formats Controlled Text Generation [51.428634666559724]
We propose a simple and elegant framework named SongNet to tackle rigid-format controlled text generation.
The backbone of the framework is a Transformer-based auto-regressive language model.
A pre-training and fine-tuning framework is designed to further improve the generation quality.
arXiv Detail & Related papers (2020-04-17T01:40:18Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.