SongNet: Rigid Formats Controlled Text Generation
- URL: http://arxiv.org/abs/2004.08022v2
- Date: Sat, 17 Apr 2021 03:49:06 GMT
- Title: SongNet: Rigid Formats Controlled Text Generation
- Authors: Piji Li, Haisong Zhang, Xiaojiang Liu, Shuming Shi
- Abstract summary: We propose a simple and elegant framework named SongNet to tackle this problem.
The backbone of the framework is a Transformer-based auto-regressive language model.
A pre-training and fine-tuning framework is designed to further improve the generation quality.
- Score: 51.428634666559724
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural text generation has made tremendous progress in various tasks. A
common characteristic of most of these tasks is that the generated texts are not
restricted to rigid formats. However, we may confront special text paradigms such
as lyrics (assuming the music score is given), sonnets, SongCi (classical Chinese
poetry of the Song dynasty), etc. These texts share three typical characteristics:
(1) they must comply fully with rigid predefined formats; (2) they must obey
certain rhyming schemes; (3) despite the format restrictions, sentence integrity
must be preserved. To the best of our knowledge, text generation under predefined
rigid formats has not been well investigated. We therefore propose a simple and
elegant framework named SongNet to tackle this problem. The backbone of the
framework is a Transformer-based auto-regressive language model. Sets of symbols
are tailor-designed to improve modeling of format, rhyme, and sentence integrity.
We improve the attention mechanism so that the model can attend to future
information about the format. A pre-training and fine-tuning framework is designed
to further improve generation quality. Extensive experiments on two collected
corpora demonstrate that our framework generates significantly better results on
both automatic metrics and human evaluation.
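The abstract does not spell out the symbol design, so the following is a minimal Python sketch of how such control signals might be derived from a format template: per-slot format/rhyme symbols, reversed intra-position indices for sentence integrity, and segment ids, plus a toy mask illustrating attention to the (fully known) future format. The symbol names (c0/c1/c2, p*, s*) and the template representation are illustrative assumptions, not the authors' code.

```python
# Illustrative sketch (not the authors' implementation) of SongNet-style
# control symbols, assuming a format template given as a list of line
# lengths in tokens. Symbol names and the template format are assumptions.

def build_control_symbols(line_lengths):
    """Return per-slot (format, intra-position, segment) symbol sequences.

    - format: "c0" = ordinary token slot, "c1" = rhyme slot (last token
      of a line), "c2" = punctuation slot closing the line.
    - intra-position: counts DOWN to the line end, so at every step the
      model knows how far the current sentence is from completion, which
      helps preserve sentence integrity.
    - segment: indexes which line a slot belongs to.
    """
    fmt, pos, seg = [], [], []
    for line_idx, n in enumerate(line_lengths):
        for tok_idx in range(n):
            fmt.append("c1" if tok_idx == n - 1 else "c0")
            pos.append(f"p{n - tok_idx}")  # reversed distance to line end
            seg.append(f"s{line_idx}")
        fmt.append("c2")                   # punctuation terminating the line
        pos.append("p0")
        seg.append(f"s{line_idx}")
    return fmt, pos, seg


def toy_attention_masks(length):
    """Toy masks mirroring the 'future format information' idea: token
    self-attention stays causal, while every step may attend to ALL
    control symbols, since the full format is known before generation."""
    causal = [[1 if j <= i else 0 for j in range(length)] for i in range(length)]
    global_format = [[1] * length for _ in range(length)]
    return causal, global_format


if __name__ == "__main__":
    # Toy 2-line format: 5 tokens, then 7 tokens, each closed by punctuation.
    fmt, pos, seg = build_control_symbols([5, 7])
    print(list(zip(fmt, pos, seg)))
```

In the full model, each symbol sequence would presumably be embedded and summed with the token embeddings at every step; the sketch only shows how the control signals could be derived from a template.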
Related papers
- MusiConGen: Rhythm and Chord Control for Transformer-Based Text-to-Music Generation [19.878013881045817]
MusiConGen is a temporally-conditioned Transformer-based text-to-music model.
It integrates automatically-extracted rhythm and chords as the condition signal.
We show that MusiConGen can generate realistic backing track music that aligns well with the specified conditions.
arXiv Detail & Related papers (2024-07-21T05:27:53Z)
- Retrieval is Accurate Generation [99.24267226311157]
We introduce a novel method that selects context-aware phrases from a collection of supporting documents.
Our model achieves the best performance and the lowest latency among several retrieval-augmented baselines.
arXiv Detail & Related papers (2024-02-27T14:16:19Z)
- Instruct-SCTG: Guiding Sequential Controlled Text Generation through Instructions [42.67608830386934]
Instruct-SCTG is a sequential framework that harnesses instruction-tuned language models to generate structurally coherent text.
Our framework generates articles section by section, following the structure specified by humans via natural language instructions.
arXiv Detail & Related papers (2023-12-19T16:20:49Z)
- Unsupervised Melody-to-Lyric Generation [91.29447272400826]
We propose a method for generating high-quality lyrics without training on any aligned melody-lyric data.
We leverage the segmentation and rhythm alignment between melody and lyrics to compile the given melody into decoding constraints.
Our model can generate high-quality lyrics that are more on-topic, singable, intelligible, and coherent than strong baselines.
arXiv Detail & Related papers (2023-05-30T17:20:25Z)
- GlyphDiffusion: Text Generation as Image Generation [100.98428068214736]
We propose GlyphDiffusion, a novel diffusion approach for text generation via text-guided image generation.
Our key idea is to render the target text as a glyph image containing visual language content.
Our model also makes significant improvements over recent diffusion models.
arXiv Detail & Related papers (2023-04-25T02:14:44Z)
- Noise2Music: Text-conditioned Music Generation with Diffusion Models [73.74580231353684]
We introduce Noise2Music, where a series of diffusion models is trained to generate high-quality 30-second music clips from text prompts.
We find that the generated audio faithfully reflects key elements of the text prompt, such as genre, tempo, instruments, mood, and era.
Pretrained large language models play a key role in this story -- they are used to generate paired text for the audio of the training set and to extract embeddings of the text prompts ingested by the diffusion models.
arXiv Detail & Related papers (2023-02-08T07:27:27Z)
- Bridging Music and Text with Crowdsourced Music Comments: A Sequence-to-Sequence Framework for Thematic Music Comments Generation [18.2750732408488]
We exploit the crowd-sourced music comments to construct a new dataset and propose a sequence-to-sequence model to generate text descriptions of music.
To enhance the authenticity and thematicity of generated texts, we propose a discriminator and a novel topic evaluator.
arXiv Detail & Related papers (2022-09-05T14:51:51Z)
- Attend, Memorize and Generate: Towards Faithful Table-to-Text Generation in Few Shots [58.404516361586325]
Few-shot table-to-text generation is a task of composing fluent and faithful sentences to convey table content using limited data.
This paper proposes a novel approach, Memorize and Generate (called AMG), inspired by the text generation process of humans.
arXiv Detail & Related papers (2022-03-01T20:37:20Z)
- Outline to Story: Fine-grained Controllable Story Generation from Cascaded Events [39.577220559911055]
We propose a new task named "Outline to Story" (O2S) as a test bed for fine-grained controllable generation of long text.
We then create datasets for future benchmarks, built by state-of-the-art keyword extraction techniques.
arXiv Detail & Related papers (2021-01-04T08:16:21Z)
- Facts2Story: Controlling Text Generation by Key Facts [0.0]
We propose a controlled generation task based on expanding a sequence of facts, expressed in natural language, into a longer narrative.
We show that while auto-regressive, unidirectional Language Models such as GPT2 produce better fluency, they struggle to adhere to the requested facts.
We propose a plan-and-cloze model (using fine-tuned XLNet) which produces competitive fluency while adhering to the requested content.
arXiv Detail & Related papers (2020-12-08T10:14:29Z)
This list is automatically generated from the titles and abstracts of the papers on this site.