Detecting Synthetic Lyrics with Few-Shot Inference
- URL: http://arxiv.org/abs/2406.15231v1
- Date: Fri, 21 Jun 2024 15:19:21 GMT
- Title: Detecting Synthetic Lyrics with Few-Shot Inference
- Authors: Yanis Labrak, Gabriel Meseguer-Brocal, Elena V. Epure,
- Abstract summary: We have curated the first dataset of high-quality synthetic lyrics.
Our best few-shot detector, based on LLM2Vec, surpasses stylistic and statistical methods.
This study emphasizes the need for further research on creative content detection.
- Score: 5.448536338411993
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In recent years, generated content in music has gained significant popularity, with large language models being effectively utilized to produce human-like lyrics in various styles, themes, and linguistic structures. This technological advancement supports artists in their creative processes but also raises issues of authorship infringement, consumer satisfaction and content spamming. To address these challenges, methods for detecting generated lyrics are necessary. However, existing works have not yet focused on this specific modality or on creative text in general regarding machine-generated content detection methods and datasets. In response, we have curated the first dataset of high-quality synthetic lyrics and conducted a comprehensive quantitative evaluation of various few-shot content detection approaches, testing their generalization capabilities and complementing this with a human evaluation. Our best few-shot detector, based on LLM2Vec, surpasses stylistic and statistical methods, which are shown competitive in other domains at distinguishing human-written from machine-generated content. It also shows good generalization capabilities to new artists and models, and effectively detects post-generation paraphrasing. This study emphasizes the need for further research on creative content detection, particularly in terms of generalization and scalability with larger song catalogs. All datasets, pre-processing scripts, and code are available publicly on GitHub and Hugging Face under the Apache 2.0 license.
Related papers
- SongCreator: Lyrics-based Universal Song Generation [53.248473603201916]
SongCreator is a song-generation system designed to tackle the challenge of generating songs with both vocals and accompaniment given lyrics.
The model features two novel designs: a meticulously designed dual-sequence language model (M) to capture the information of vocals and accompaniment for song generation, and a series of attention mask strategies for DSLM.
Experiments demonstrate the effectiveness of SongCreator by achieving state-of-the-art or competitive performances on all eight tasks.
arXiv Detail & Related papers (2024-09-09T19:37:07Z) - Multi-task Prompt Words Learning for Social Media Content Generation [8.209163857435273]
We propose a new prompt word generation framework based on multi-modal information fusion.
We use a template containing a set of prompt words to guide ChatGPT to generate high-quality tweets.
In the absence of effective and objective evaluation criteria in the field of content generation, we use the ChatGPT tool to evaluate the results generated by the algorithm.
arXiv Detail & Related papers (2024-07-10T15:46:32Z) - Exploring the Efficacy of Pre-trained Checkpoints in Text-to-Music
Generation Task [86.72661027591394]
We generate complete and semantically consistent symbolic music scores from text descriptions.
We explore the efficacy of using publicly available checkpoints for natural language processing in the task of text-to-music generation.
Our experimental results show that the improvement from using pre-trained checkpoints is statistically significant in terms of BLEU score and edit distance similarity.
arXiv Detail & Related papers (2022-11-21T07:19:17Z) - Bridging Music and Text with Crowdsourced Music Comments: A
Sequence-to-Sequence Framework for Thematic Music Comments Generation [18.2750732408488]
We exploit the crowd-sourced music comments to construct a new dataset and propose a sequence-to-sequence model to generate text descriptions of music.
To enhance the authenticity and thematicity of generated texts, we propose a discriminator and a novel topic evaluator.
arXiv Detail & Related papers (2022-09-05T14:51:51Z) - Youling: an AI-Assisted Lyrics Creation System [72.00418962906083]
This paper demonstrates textitYouling, an AI-assisted lyrics creation system, designed to collaborate with music creators.
In the lyrics generation process, textitYouling supports traditional one pass full-text generation mode as well as an interactive generation mode.
The system also provides a revision module which enables users to revise undesired sentences or words of lyrics repeatedly.
arXiv Detail & Related papers (2022-01-18T03:57:04Z) - Sentiment analysis in tweets: an assessment study from classical to
modern text representation models [59.107260266206445]
Short texts published on Twitter have earned significant attention as a rich source of information.
Their inherent characteristics, such as the informal, and noisy linguistic style, remain challenging to many natural language processing (NLP) tasks.
This study fulfils an assessment of existing language models in distinguishing the sentiment expressed in tweets by using a rich collection of 22 datasets.
arXiv Detail & Related papers (2021-05-29T21:05:28Z) - A General Framework for Learning Prosodic-Enhanced Representation of Rap
Lyrics [21.944835086749375]
Learning and analyzing rap lyrics is a significant basis for many web applications.
We propose a hierarchical attention variational autoencoder framework (HAVAE)
A feature aggregation strategy is proposed to appropriately integrate various features and generate prosodic-enhanced representation.
arXiv Detail & Related papers (2021-03-23T15:13:21Z) - A Comprehensive Survey on Deep Music Generation: Multi-level
Representations, Algorithms, Evaluations, and Future Directions [10.179835761549471]
This paper attempts to provide an overview of various composition tasks under different music generation levels using deep learning.
In addition, we summarize datasets suitable for diverse tasks, discuss the music representations, the evaluation methods as well as the challenges under different levels, and finally point out several future directions.
arXiv Detail & Related papers (2020-11-13T08:01:20Z) - Melody-Conditioned Lyrics Generation with SeqGANs [81.2302502902865]
We propose an end-to-end melody-conditioned lyrics generation system based on Sequence Generative Adversarial Networks (SeqGAN)
We show that the input conditions have no negative impact on the evaluation metrics while enabling the network to produce more meaningful results.
arXiv Detail & Related papers (2020-10-28T02:35:40Z) - A Survey of Knowledge-Enhanced Text Generation [81.24633231919137]
The goal of text generation is to make machines express in human language.
Various neural encoder-decoder models have been proposed to achieve the goal by learning to map input text to output text.
To address this issue, researchers have considered incorporating various forms of knowledge beyond the input text into the generation models.
arXiv Detail & Related papers (2020-10-09T06:46:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.