ByGPT5: End-to-End Style-conditioned Poetry Generation with Token-free Language Models
- URL: http://arxiv.org/abs/2212.10474v2
- Date: Mon, 22 May 2023 21:15:06 GMT
- Authors: Jonas Belouadi, Steffen Eger
- Abstract summary: In this work, we investigate end-to-end poetry generation conditioned on styles such as rhyme, meter, and alliteration.
We successfully pre-train ByGPT5, a new token-free decoder-only language model, and fine-tune it on a large custom corpus of English and German quatrains annotated with our styles.
We show that ByGPT5 outperforms other models such as mT5, ByT5, GPT-2 and ChatGPT, while also being more parameter efficient and performing favorably compared to humans.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: State-of-the-art poetry generation systems are often complex. They either
consist of task-specific model pipelines, incorporate prior knowledge in the
form of manually created constraints, or both. In contrast, end-to-end models
would not suffer from the overhead of having to model prior knowledge and could
learn the nuances of poetry from data alone, reducing the degree of human
supervision required. In this work, we investigate end-to-end poetry generation
conditioned on styles such as rhyme, meter, and alliteration. We identify and
address lack of training data and mismatching tokenization algorithms as
possible limitations of past attempts. In particular, we successfully pre-train
ByGPT5, a new token-free decoder-only language model, and fine-tune it on a
large custom corpus of English and German quatrains annotated with our styles.
We show that ByGPT5 outperforms other models such as mT5, ByT5, GPT-2 and
ChatGPT, while also being more parameter efficient and performing favorably
compared to humans. In addition, we analyze its runtime performance and
demonstrate that it is not prone to memorization. We make our code, models, and
datasets publicly available.
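The approach described in the abstract combines two simple ingredients: a byte-level ("token-free") vocabulary and style conditioning via a control prefix that encodes rhyme, meter, and alliteration. The sketch below illustrates both ideas in isolation; the style-tag format, the special-token offset, and the example prompt are illustrative assumptions rather than the authors' exact interface (their released code, models, and datasets define the real one).

```python
# Minimal sketch of byte-level ("token-free") encoding plus style conditioning
# via a control prefix. The style-tag format, the id offset, and the prompt are
# assumptions for illustration, not the ByGPT5 authors' exact interface.

OFFSET = 3  # ByT5-style convention: reserve ids 0-2 for <pad>, </s>, <unk>

def byte_encode(text: str) -> list[int]:
    """Map text to byte ids; no subword vocabulary or merge table is needed."""
    return [b + OFFSET for b in text.encode("utf-8")]

def byte_decode(ids: list[int]) -> str:
    """Invert byte_encode, skipping the reserved special-token ids."""
    return bytes(i - OFFSET for i in ids if i >= OFFSET).decode("utf-8", errors="ignore")

def style_conditioned_prompt(rhyme: str, meter: str, alliteration: str, text: str = "") -> list[int]:
    """Prepend style tags (hypothetical format) so a decoder-only LM can
    condition its continuation on the requested rhyme scheme, meter, and
    alliteration level, as in ordinary prefix conditioning."""
    prefix = f"<rhyme:{rhyme}><meter:{meter}><allit:{alliteration}>"
    return byte_encode(prefix + text)

if __name__ == "__main__":
    ids = style_conditioned_prompt("ABAB", "iambic", "high", "Shall I compare")
    print(ids[:12])          # first few byte ids of the conditioned prompt
    print(byte_decode(ids))  # round-trips back to the original string
```

Because the vocabulary is just the 256 byte values plus a few reserved ids, the same encoding works unchanged for English and German, which is what makes a single multilingual quatrain corpus straightforward to model without any tokenizer mismatch.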
Related papers
- Sonnet or Not, Bot? Poetry Evaluation for Large Models and Datasets (arXiv 2024-06-27)
  Large language models (LLMs) can now generate and recognize poetry. We develop a task to evaluate how well LLMs recognize one aspect of English-language poetry. We show that state-of-the-art LLMs can successfully identify both common and uncommon fixed poetic forms.
- GPT Czech Poet: Generation of Czech Poetic Strophes with Language Models (arXiv 2024-06-18)
  We introduce a new model for generating poetry in the Czech language, based on fine-tuning a pre-trained large language model. We demonstrate that guiding the generation process by explicitly specifying strophe parameters within the poem text strongly improves the effectiveness of the model.
- FOCUS: Forging Originality through Contrastive Use in Self-Plagiarism for Language Models (arXiv 2024-06-02)
  Pre-trained Language Models (PLMs) have shown impressive results in various Natural Language Generation (NLG) tasks. This study introduces a unique "self-plagiarism" contrastive decoding strategy aimed at boosting the originality of text produced by PLMs.
- PoetryDiffusion: Towards Joint Semantic and Metrical Manipulation in Poetry Generation (arXiv 2023-06-14)
  Controllable text generation is a challenging and meaningful field in natural language generation (NLG). In this paper, we pioneer the use of the diffusion model for generating sonnets and Chinese SongCi poetry. Our model outperforms existing models in automatic evaluation of semantic, metrical, and overall performance, as well as in human evaluation.
- Fine-Tashkeel: Finetuning Byte-Level Models for Accurate Arabic Text Diacritization (arXiv 2023-03-25)
  We finetune token-free pre-trained multilingual models to learn to predict and insert missing diacritics in Arabic text. We show that we can achieve state-of-the-art results on the diacritization task with a minimal amount of training and no feature engineering.
- T5lephone: Bridging Speech and Text Self-supervised Models for Spoken Language Understanding via Phoneme level T5 (arXiv 2022-11-01)
  We conduct extensive studies on how PLMs with different tokenization strategies affect spoken language understanding tasks. We extend the idea to create T5lephone, a variant of T5 that is pretrained using phonemicized text.
- Quark: Controllable Text Generation with Reinforced Unlearning (arXiv 2022-05-26)
  Large-scale language models often learn behaviors that are misaligned with user expectations. We introduce Quantized Reward Konditioning (Quark), an algorithm for optimizing a reward function that quantifies an (un)wanted property. For unlearning toxicity, negative sentiment, and repetition, our experiments show that Quark outperforms both strong baselines and state-of-the-art reinforcement learning methods.
- Twist Decoding: Diverse Generators Guide Each Other (arXiv 2022-05-19)
  We introduce Twist decoding, a simple and general inference algorithm that generates text while benefiting from diverse models. Our method does not assume that the vocabulary, tokenization, or even generation order is shared.
- ByT5: Towards a token-free future with pre-trained byte-to-byte models (arXiv 2021-05-28)
  Most widely-used pre-trained language models operate on sequences of tokens corresponding to word or subword units. We show that a standard Transformer architecture can be used with minimal modifications to process byte sequences. We also demonstrate that byte-level models are significantly more robust to noise and perform better on tasks that are sensitive to spelling and pronunciation.
- Lattice-BERT: Leveraging Multi-Granularity Representations in Chinese Pre-trained Language Models (arXiv 2021-04-15)
  We propose a novel pre-training paradigm for Chinese: Lattice-BERT. We construct a lattice graph from the characters and words in a sentence and feed all these text units into transformers. We show that our model can bring an average increase of 1.5% under the 12-layer setting.
- Unsupervised Paraphrasing with Pretrained Language Models (arXiv 2020-10-24)
  We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting. Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking. We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
This list is automatically generated from the titles and abstracts of the papers on this site.