Related papers: Synthetic Lyrics Detection Across Languages and Genres

Synthetic Lyrics Detection Across Languages and Genres

URL: http://arxiv.org/abs/2406.15231v4
Date: Thu, 24 Apr 2025 07:21:44 GMT
Title: Synthetic Lyrics Detection Across Languages and Genres
Authors: Yanis Labrak, Markus Frohmann, Gabriel Meseguer-Brocal, Elena V. Epure,
Abstract summary: Large language models (LLMs) to generate music content, particularly lyrics, has gained in popularity.<n>Previous research has explored content detection in various domains, but no work has focused on the text modality, lyrics, in music.<n>We curated a diverse dataset of real and synthetic lyrics from multiple languages, music genres, and artists.<n>We performed a thorough evaluation of existing synthetic text detection approaches on lyrics, a previously unexplored data type.<n>Following both music and industrial constraints, we examined how well these approaches generalize across languages, scale with data availability, handle multilingual language content, and perform on novel genres in few-shot settings
Score: 4.987546582439803
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: In recent years, the use of large language models (LLMs) to generate music content, particularly lyrics, has gained in popularity. These advances provide valuable tools for artists and enhance their creative processes, but they also raise concerns about copyright violations, consumer satisfaction, and content spamming. Previous research has explored content detection in various domains. However, no work has focused on the text modality, lyrics, in music. To address this gap, we curated a diverse dataset of real and synthetic lyrics from multiple languages, music genres, and artists. The generation pipeline was validated using both humans and automated methods. We performed a thorough evaluation of existing synthetic text detection approaches on lyrics, a previously unexplored data type. We also investigated methods to adapt the best-performing features to lyrics through unsupervised domain adaptation. Following both music and industrial constraints, we examined how well these approaches generalize across languages, scale with data availability, handle multilingual language content, and perform on novel genres in few-shot settings. Our findings show promising results that could inform policy decisions around AI-generated music and enhance transparency for users.

Related papers

AI-Generated Song Detection via Lyrics Transcripts [15.1799390517192]
Recent rise in capabilities of AI-based music generation tools has created an upheaval in the music industry.<n>We propose solving this gap by transcribing songs using general automatic speech recognition (ASR) models.<n>Our method is more robust than state-of-the-art audio-based ones when the audio is perturbed in different ways.
arXiv Detail & Related papers (2025-06-23T10:42:50Z)
Double Entendre: Robust Audio-Based AI-Generated Lyrics Detection via Multi-View Fusion [11.060929679400667]
We propose a multimodal, modular late-fusion pipeline that combines automatically transcribed lyrics and speech features capturing lyrics-related information within the audio.<n>Our method, DE-detect, outperforms existing lyrics-based detectors while also being more robust to audio perturbations.
arXiv Detail & Related papers (2025-06-19T02:56:49Z)
Multi-label Cross-lingual automatic music genre classification from lyrics with Sentence BERT [0.13654846342364302]
We present a multi-label, cross-lingual genre classification system based on multilingual sentence embeddings generated by sBERT. Using a bilingual Portuguese-English dataset with eight overlapping genres, we demonstrate the system's ability to train on lyrics in one language and predict genres in another.
arXiv Detail & Related papers (2025-01-07T13:22:35Z)
Enriching Music Descriptions with a Finetuned-LLM and Metadata for Text-to-Music Retrieval [7.7464988473650935]
Text-to-Music Retrieval plays a pivotal role in content discovery within extensive music databases. This paper proposes an improved Text-to-Music Retrieval model, denoted as TTMR++.
arXiv Detail & Related papers (2024-10-04T09:33:34Z)
SongCreator: Lyrics-based Universal Song Generation [53.248473603201916]
SongCreator is a song-generation system designed to tackle the challenge of generating songs with both vocals and accompaniment given lyrics. The model features two novel designs: a meticulously designed dual-sequence language model (M) to capture the information of vocals and accompaniment for song generation, and a series of attention mask strategies for DSLM. Experiments demonstrate the effectiveness of SongCreator by achieving state-of-the-art or competitive performances on all eight tasks.
arXiv Detail & Related papers (2024-09-09T19:37:07Z)
LyCon: Lyrics Reconstruction from the Bag-of-Words Using Large Language Models [1.1510009152620668]
Our study introduces a novel method for generating copyright-free lyrics from publicly available Bag-of-Words datasets. We have compiled and made available a dataset of reconstructed lyrics, LyCon, aligned with metadata from renowned sources. We believe that the integration of metadata such as mood annotations or genres enables a variety of academic experiments on lyrics.
arXiv Detail & Related papers (2024-08-27T03:01:48Z)
Multi-task Prompt Words Learning for Social Media Content Generation [8.209163857435273]
We propose a new prompt word generation framework based on multi-modal information fusion. We use a template containing a set of prompt words to guide ChatGPT to generate high-quality tweets. In the absence of effective and objective evaluation criteria in the field of content generation, we use the ChatGPT tool to evaluate the results generated by the algorithm.
arXiv Detail & Related papers (2024-07-10T15:46:32Z)
MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models [57.47799823804519]
We are inspired by how musicians compose music not just from a movie script, but also through visualizations. We propose MeLFusion, a model that can effectively use cues from a textual description and the corresponding image to synthesize music. Our exhaustive experimental evaluation suggests that adding visual information to the music synthesis pipeline significantly improves the quality of generated music.
arXiv Detail & Related papers (2024-06-07T06:38:59Z)
LyricWhiz: Robust Multilingual Zero-shot Lyrics Transcription by Whispering to ChatGPT [48.28624219567131]
We introduce LyricWhiz, a robust, multilingual, and zero-shot automatic lyrics transcription method. We use Whisper, a weakly supervised robust speech recognition model, and GPT-4, today's most performant chat-based large language model. Our experiments show that LyricWhiz significantly reduces Word Error Rate compared to existing methods in English.
arXiv Detail & Related papers (2023-06-29T17:01:51Z)
Exploring the Efficacy of Pre-trained Checkpoints in Text-to-Music Generation Task [86.72661027591394]
We generate complete and semantically consistent symbolic music scores from text descriptions. We explore the efficacy of using publicly available checkpoints for natural language processing in the task of text-to-music generation. Our experimental results show that the improvement from using pre-trained checkpoints is statistically significant in terms of BLEU score and edit distance similarity.
arXiv Detail & Related papers (2022-11-21T07:19:17Z)
Bridging Music and Text with Crowdsourced Music Comments: A Sequence-to-Sequence Framework for Thematic Music Comments Generation [18.2750732408488]
We exploit the crowd-sourced music comments to construct a new dataset and propose a sequence-to-sequence model to generate text descriptions of music. To enhance the authenticity and thematicity of generated texts, we propose a discriminator and a novel topic evaluator.
arXiv Detail & Related papers (2022-09-05T14:51:51Z)
Contrastive Audio-Language Learning for Music [13.699088044513562]
MusCALL is a framework for Music Contrastive Audio-Language Learning. Our approach consists of a dual-encoder architecture that learns the alignment between pairs of music audio and descriptive sentences.
arXiv Detail & Related papers (2022-08-25T16:55:15Z)
Genre-conditioned Acoustic Models for Automatic Lyrics Transcription of Polyphonic Music [73.73045854068384]
We propose to transcribe the lyrics of polyphonic music using a novel genre-conditioned network. The proposed network adopts pre-trained model parameters, and incorporates the genre adapters between layers to capture different genre peculiarities for lyrics-genre pairs. Our experiments show that the proposed genre-conditioned network outperforms the existing lyrics transcription systems.
arXiv Detail & Related papers (2022-04-07T09:15:46Z)
Youling: an AI-Assisted Lyrics Creation System [72.00418962906083]
This paper demonstrates textitYouling, an AI-assisted lyrics creation system, designed to collaborate with music creators. In the lyrics generation process, textitYouling supports traditional one pass full-text generation mode as well as an interactive generation mode. The system also provides a revision module which enables users to revise undesired sentences or words of lyrics repeatedly.
arXiv Detail & Related papers (2022-01-18T03:57:04Z)
Sentiment analysis in tweets: an assessment study from classical to modern text representation models [59.107260266206445]
Short texts published on Twitter have earned significant attention as a rich source of information. Their inherent characteristics, such as the informal, and noisy linguistic style, remain challenging to many natural language processing (NLP) tasks. This study fulfils an assessment of existing language models in distinguishing the sentiment expressed in tweets by using a rich collection of 22 datasets.
arXiv Detail & Related papers (2021-05-29T21:05:28Z)
A General Framework for Learning Prosodic-Enhanced Representation of Rap Lyrics [21.944835086749375]
Learning and analyzing rap lyrics is a significant basis for many web applications. We propose a hierarchical attention variational autoencoder framework (HAVAE) A feature aggregation strategy is proposed to appropriately integrate various features and generate prosodic-enhanced representation.
arXiv Detail & Related papers (2021-03-23T15:13:21Z)
A Comprehensive Survey on Deep Music Generation: Multi-level Representations, Algorithms, Evaluations, and Future Directions [10.179835761549471]
This paper attempts to provide an overview of various composition tasks under different music generation levels using deep learning. In addition, we summarize datasets suitable for diverse tasks, discuss the music representations, the evaluation methods as well as the challenges under different levels, and finally point out several future directions.
arXiv Detail & Related papers (2020-11-13T08:01:20Z)
Melody-Conditioned Lyrics Generation with SeqGANs [81.2302502902865]
We propose an end-to-end melody-conditioned lyrics generation system based on Sequence Generative Adversarial Networks (SeqGAN) We show that the input conditions have no negative impact on the evaluation metrics while enabling the network to produce more meaningful results.
arXiv Detail & Related papers (2020-10-28T02:35:40Z)
A Survey of Knowledge-Enhanced Text Generation [81.24633231919137]
The goal of text generation is to make machines express in human language. Various neural encoder-decoder models have been proposed to achieve the goal by learning to map input text to output text. To address this issue, researchers have considered incorporating various forms of knowledge beyond the input text into the generation models.
arXiv Detail & Related papers (2020-10-09T06:46:46Z)

This list is automatically generated from the titles and abstracts of the papers in this site.