Bridging Music and Text with Crowdsourced Music Comments: A
Sequence-to-Sequence Framework for Thematic Music Comments Generation
- URL: http://arxiv.org/abs/2209.01996v1
- Date: Mon, 5 Sep 2022 14:51:51 GMT
- Title: Bridging Music and Text with Crowdsourced Music Comments: A
Sequence-to-Sequence Framework for Thematic Music Comments Generation
- Authors: Peining Zhang, Junliang Guo, Linli Xu, Mu You, Junming Yin
- Abstract summary: We exploit crowd-sourced music comments to construct a new dataset and propose a sequence-to-sequence model to generate text descriptions of music.
To enhance the authenticity and thematicity of the generated texts, we fine-tune the model with a discriminator and a novel topic evaluator.
- Score: 18.2750732408488
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We consider a novel task of automatically generating text descriptions of
music. Compared with other well-established text generation tasks such as image
captioning, the scarcity of well-paired music and text datasets makes it a much
more challenging task. In this paper, we exploit the crowd-sourced music
comments to construct a new dataset and propose a sequence-to-sequence model to
generate text descriptions of music. More concretely, we use the dilated
convolutional layer as the basic component of the encoder and a memory-based
recurrent neural network as the decoder. To enhance the authenticity and
thematicity of generated texts, we further propose to fine-tune the model with
a discriminator as well as a novel topic evaluator. To measure the quality of
generated texts, we also propose two new evaluation metrics, which are more
aligned with human evaluation than traditional metrics such as BLEU.
Experimental results verify that our model is capable of generating fluent and
meaningful comments that capture the thematic and content information of the
original music.
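A minimal PyTorch sketch of the encoder-decoder shape described in the abstract, assuming a mel-spectrogram input of shape (batch, n_mels, frames) and substituting a plain GRU for the memory-based recurrent decoder; the vocabulary size, hidden width, and mean-pooling are illustrative assumptions, not the authors' implementation (the discriminator and topic evaluator used for fine-tuning are omitted).

```python
import torch
import torch.nn as nn

class DilatedConvEncoder(nn.Module):
    """Stack of 1-D convolutions with exponentially growing dilation,
    so the receptive field covers long spans of the music signal."""
    def __init__(self, n_mels=80, hidden=256, n_layers=4):
        super().__init__()
        layers, in_ch = [], n_mels
        for i in range(n_layers):
            layers += [nn.Conv1d(in_ch, hidden, kernel_size=3,
                                 dilation=2 ** i, padding=2 ** i),
                       nn.ReLU()]
            in_ch = hidden
        self.net = nn.Sequential(*layers)

    def forward(self, mel):        # mel: (B, n_mels, T)
        return self.net(mel)       # (B, hidden, T), length preserved

class CommentDecoder(nn.Module):
    """GRU decoder seeded with a pooled music code (a simplification of
    the paper's memory-based recurrent network)."""
    def __init__(self, vocab_size, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.rnn = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, tokens, music_code):      # tokens: (B, L)
        h0 = music_code.unsqueeze(0)             # (1, B, hidden)
        x, _ = self.rnn(self.embed(tokens), h0)
        return self.out(x)                       # (B, L, vocab) logits

# Usage: encode the music, mean-pool over time, decode a comment.
enc, dec = DilatedConvEncoder(), CommentDecoder(vocab_size=10000)
mel = torch.randn(2, 80, 512)
code = enc(mel).mean(dim=2)                      # (B, hidden)
logits = dec(torch.randint(0, 10000, (2, 20)), code)
```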
Related papers
- Text Conditioned Symbolic Drumbeat Generation using Latent Diffusion Models [0.0]
This study introduces a text-conditioned approach to generating drumbeats with Latent Diffusion Models (LDMs).
By pretraining a text and drumbeat encoder through contrastive learning within a multimodal network, we align the modalities of text and music closely (a sketch of this objective follows the entry).
We show that the generated drumbeats are novel and apt to the prompt text, and comparable in quality to those created by human musicians.
arXiv Detail & Related papers (2024-08-05T13:23:05Z)
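The contrastive pretraining described in the entry above is, in spirit, a CLIP-style symmetric InfoNCE objective. A hedged sketch, assuming the text and drumbeat encoders already produce fixed-size embeddings; the temperature value is an assumption, not the paper's setting.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(text_emb, drum_emb, temperature=0.07):
    """Symmetric InfoNCE: matched (text, drumbeat) pairs lie on the
    diagonal of the similarity matrix and are pulled together, while
    mismatched pairs in the batch are pushed apart."""
    text_emb = F.normalize(text_emb, dim=-1)
    drum_emb = F.normalize(drum_emb, dim=-1)
    logits = text_emb @ drum_emb.t() / temperature      # (B, B)
    targets = torch.arange(logits.size(0))
    return (F.cross_entropy(logits, targets)
            + F.cross_entropy(logits.t(), targets)) / 2

# Usage with placeholder embeddings for a batch of 8 pairs:
loss = contrastive_loss(torch.randn(8, 512), torch.randn(8, 512))
```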
- MusiConGen: Rhythm and Chord Control for Transformer-Based Text-to-Music Generation [19.878013881045817]
MusiConGen is a temporally conditioned, Transformer-based text-to-music model.
It integrates automatically-extracted rhythm and chords as the condition signal.
We show that MusiConGen can generate realistic backing track music that aligns well with the specified conditions.
arXiv Detail & Related papers (2024-07-21T05:27:53Z)
- T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-video Generation [55.57459883629706]
We conduct the first systematic study on compositional text-to-video generation.
We propose T2V-CompBench, the first benchmark tailored for compositional text-to-video generation.
arXiv Detail & Related papers (2024-07-19T17:58:36Z)
- Detecting Synthetic Lyrics with Few-Shot Inference [5.448536338411993]
We have curated the first dataset of high-quality synthetic lyrics.
Our best few-shot detector, based on LLM2Vec, surpasses stylistic and statistical methods (a simplified embedding-based sketch follows the entry).
This study emphasizes the need for further research on creative content detection.
arXiv Detail & Related papers (2024-06-21T15:19:21Z)
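One simple way to realize a few-shot detector like the one in the entry above is a nearest-centroid classifier over text embeddings. A sketch under that assumption; `embed` is a hypothetical stand-in for an LLM2Vec-style encoder, not its actual API.

```python
import numpy as np

def embed(lyrics):
    """Hypothetical stand-in for an LLM2Vec-style text encoder:
    returns one fixed-size vector per lyric."""
    rng = np.random.default_rng(abs(hash(tuple(lyrics))) % 2**32)
    return rng.normal(size=(len(lyrics), 768))

def nearest_centroid_detector(human_lyrics, synthetic_lyrics):
    """Fit one centroid per class from a few labeled examples."""
    centroids = np.stack([embed(human_lyrics).mean(axis=0),
                          embed(synthetic_lyrics).mean(axis=0)])
    def predict(lyrics):
        x = embed(lyrics)                                     # (N, D)
        d = np.linalg.norm(x[:, None] - centroids[None], axis=-1)
        return d.argmin(axis=1)            # 0 = human, 1 = synthetic
    return predict
```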
- MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models [57.47799823804519]
We are inspired by how musicians compose music not just from a movie script, but also through visualizations.
We propose MeLFusion, a model that can effectively use cues from a textual description and the corresponding image to synthesize music.
Our exhaustive experimental evaluation suggests that adding visual information to the music synthesis pipeline significantly improves the quality of generated music.
arXiv Detail & Related papers (2024-06-07T06:38:59Z)
- Simple and Controllable Music Generation [94.61958781346176]
MusicGen is a single Language Model (LM) that operates over several streams of compressed discrete music representation, i.e., tokens.
Unlike prior work, MusicGen comprises a single-stage transformer LM together with efficient token interleaving patterns (a delay-style interleaving sketch follows the entry).
arXiv Detail & Related papers (2023-06-08T15:31:05Z)
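The token interleaving in the MusicGen entry can be illustrated with a delay pattern: codebook k is shifted right by k steps, so a single-stage LM predicts all codebooks in one pass while still seeing earlier codebooks as context. A toy sketch; the padding value and grid layout are illustrative, not MusicGen's code.

```python
PAD = -1   # placeholder for a special padding token

def delay_interleave(streams):
    """streams: K codebook token lists, each of length T.
    Returns a (K, T + K - 1) grid with codebook k delayed by k steps."""
    K, T = len(streams), len(streams[0])
    grid = [[PAD] * (T + K - 1) for _ in range(K)]
    for k, stream in enumerate(streams):
        for t, tok in enumerate(stream):
            grid[k][k + t] = tok
    return grid

# Example with K=3 codebooks over T=4 time steps:
for row in delay_interleave([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]):
    print(row)
# [1, 2, 3, 4, -1, -1]
# [-1, 5, 6, 7, 8, -1]
# [-1, -1, 9, 10, 11, 12]
```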
- Unsupervised Melody-to-Lyric Generation [91.29447272400826]
We propose a method for generating high-quality lyrics without training on any aligned melody-lyric data.
We leverage the segmentation and rhythm alignment between melody and lyrics to compile the given melody into decoding constraints (a toy example follows the entry).
Our model can generate high-quality lyrics that are more on-topic, singable, intelligible, and coherent than strong baselines.
arXiv Detail & Related papers (2023-05-30T17:20:25Z)
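Compiling a melody into decoding constraints, as in the entry above, can be reduced to its simplest form: each melodic phrase's note count fixes the syllable budget of the corresponding lyric phrase. A toy sketch with a crude vowel-group syllable heuristic; the ranked candidate list stands in for a language model's proposals.

```python
import re

def count_syllables(word):
    """Crude heuristic: count groups of consecutive vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def fill_phrase(n_notes, ranked_candidates):
    """Greedily pick the highest-ranked words that exactly spend the
    syllable budget implied by the melodic phrase."""
    phrase, budget = [], n_notes
    for word in ranked_candidates:
        s = count_syllables(word)
        if s <= budget:
            phrase.append(word)
            budget -= s
        if budget == 0:
            return " ".join(phrase)
    return None   # no exact fill found among these candidates

# Example: a 5-note melodic phrase.
print(fill_phrase(5, ["shining", "river", "runs", "home", "tonight"]))
# -> "shining river runs" (2 + 2 + 1 syllables)
```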
- Music-to-Text Synaesthesia: Generating Descriptive Text from Music Recordings [36.090928638883454]
Music-to-text synaesthesia aims to generate descriptive texts from music recordings with the same sentiment for further understanding.
We build a computational model to generate sentences that can describe the content of the music recording.
To tackle highly non-discriminative classical music, we design a group topology-preservation loss.
arXiv Detail & Related papers (2022-10-02T06:06:55Z)
- Melody-Conditioned Lyrics Generation with SeqGANs [81.2302502902865]
We propose an end-to-end melody-conditioned lyrics generation system based on Sequence Generative Adversarial Networks (SeqGAN).
We show that the input conditions have no negative impact on the evaluation metrics while enabling the network to produce more meaningful results.
arXiv Detail & Related papers (2020-10-28T02:35:40Z)
- SongNet: Rigid Formats Controlled Text Generation [51.428634666559724]
We propose a simple and elegant framework named SongNet to tackle this problem.
The backbone of the framework is a Transformer-based auto-regressive language model (a format-embedding sketch follows the entry).
A pre-training and fine-tuning framework is designed to further improve the generation quality.
arXiv Detail & Related papers (2020-04-17T01:40:18Z)
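SongNet's rigid-format control, as summarized in the entry above, amounts to adding format and intra-sentence position embeddings to the token embeddings of an auto-regressive Transformer. A hedged sketch; the dimensions, symbol inventory, and layer counts are assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

class FormatConditionedLM(nn.Module):
    """Auto-regressive Transformer whose input sums token, format, and
    intra-sentence position embeddings, so generation can follow a
    rigid format such as a fixed rhyme or line-length scheme."""
    def __init__(self, vocab=10000, n_formats=4, max_intra=64, d=256):
        super().__init__()
        self.tok = nn.Embedding(vocab, d)
        self.fmt = nn.Embedding(n_formats, d)   # e.g. normal / rhyme / end-of-line
        self.pos = nn.Embedding(max_intra, d)   # position within the sentence
        layer = nn.TransformerEncoderLayer(d, nhead=4, batch_first=True)
        self.body = nn.TransformerEncoder(layer, num_layers=2)
        self.out = nn.Linear(d, vocab)

    def forward(self, tokens, fmt_ids, intra_pos):
        x = self.tok(tokens) + self.fmt(fmt_ids) + self.pos(intra_pos)
        mask = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))
        return self.out(self.body(x, mask=mask))   # next-token logits
```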
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.