Related papers: Detecting Synthetic Lyrics with Few-Shot Inference

URL: http://arxiv.org/abs/2406.15231v1
Date: Fri, 21 Jun 2024 15:19:21 GMT
Title: Detecting Synthetic Lyrics with Few-Shot Inference
Authors: Yanis Labrak, Gabriel Meseguer-Brocal, Elena V. Epure
Abstract summary: We have curated the first dataset of high-quality synthetic lyrics. Our best few-shot detector, based on LLM2Vec, surpasses stylistic and statistical methods. This study emphasizes the need for further research on creative content detection.
Score: 5.448536338411993
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: In recent years, generated content in music has gained significant popularity, with large language models being effectively utilized to produce human-like lyrics in various styles, themes, and linguistic structures. This technological advancement supports artists in their creative processes but also raises issues of authorship infringement, consumer satisfaction and content spamming. To address these challenges, methods for detecting generated lyrics are necessary. However, existing works have not yet focused on this specific modality or on creative text in general regarding machine-generated content detection methods and datasets. In response, we have curated the first dataset of high-quality synthetic lyrics and conducted a comprehensive quantitative evaluation of various few-shot content detection approaches, testing their generalization capabilities and complementing this with a human evaluation. Our best few-shot detector, based on LLM2Vec, surpasses stylistic and statistical methods, which are shown competitive in other domains at distinguishing human-written from machine-generated content. It also shows good generalization capabilities to new artists and models, and effectively detects post-generation paraphrasing. This study emphasizes the need for further research on creative content detection, particularly in terms of generalization and scalability with larger song catalogs. All datasets, pre-processing scripts, and code are available publicly on GitHub and Hugging Face under the Apache 2.0 license.
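The abstract does not describe how the few-shot detector is built. As a rough illustration only, the sketch below shows one common few-shot pattern: embed a handful of labeled lyrics, average each class into a centroid, and assign a new lyric to the closest centroid by cosine similarity. This is not the authors' method. The function names (`fit_few_shot`, `predict`) and the 3-dimensional toy vectors are hypothetical stand-ins for real text embeddings such as those an LLM2Vec-style encoder might produce.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def centroid(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def fit_few_shot(examples):
    """Build one centroid per label from a few labeled embeddings.

    `examples` maps a label to a list of embedding vectors (hypothetical data).
    """
    return {label: centroid(vectors) for label, vectors in examples.items()}


def predict(centroids, embedding):
    """Return the label whose centroid is most similar to `embedding`."""
    return max(centroids, key=lambda label: cosine(centroids[label], embedding))


# Toy embeddings standing in for encoder outputs (assumption, not real data).
few_shot = {
    "human": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "synthetic": [[0.1, 0.9, 0.2], [0.0, 0.8, 0.3]],
}
centroids = fit_few_shot(few_shot)
label = predict(centroids, [0.85, 0.15, 0.05])
```

In practice the toy vectors would be replaced by embeddings of actual lyrics, and the number of labeled examples per class would stay small, which is what makes the setup few-shot.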

Related papers

SongSage: A Large Musical Language Model with Lyric Generative Pre-training [69.52790104805794]
SongSage is a large musical language model equipped with diverse lyric-centric intelligence through lyric generative pretraining. SongSage exhibits a strong understanding of lyric-centric knowledge, excels in rewriting user queries for zero-shot playlist recommendations, generates and continues lyrics effectively, and performs proficiently across seven additional capabilities.
arXiv Detail & Related papers (2026-01-03T10:54:37Z)
Lyrics Matter: Exploiting the Power of Learnt Representations for Music Popularity Prediction [47.3124073459729]
This work addresses the under-explored role of lyrics in predicting popularity. We present an automated pipeline that uses an LLM to extract high-dimensional lyric embeddings. These features are integrated into HitMusicLyricNet, a multimodal architecture that combines audio, lyrics, and social metadata for popularity score prediction.
arXiv Detail & Related papers (2025-12-05T08:09:26Z)
AI-Generated Song Detection via Lyrics Transcripts [15.1799390517192]
The recent rise in capabilities of AI-based music generation tools has created an upheaval in the music industry. We propose solving this gap by transcribing songs using general automatic speech recognition (ASR) models. Our method is more robust than state-of-the-art audio-based ones when the audio is perturbed in different ways.
arXiv Detail & Related papers (2025-06-23T10:42:50Z)
Double Entendre: Robust Audio-Based AI-Generated Lyrics Detection via Multi-View Fusion [11.060929679400667]
We propose a multimodal, modular late-fusion pipeline that combines automatically transcribed lyrics and speech features capturing lyrics-related information within the audio. Our method, DE-detect, outperforms existing lyrics-based detectors while also being more robust to audio perturbations.
arXiv Detail & Related papers (2025-06-19T02:56:49Z)
Multi-label Cross-lingual automatic music genre classification from lyrics with Sentence BERT [0.13654846342364302]
We present a multi-label, cross-lingual genre classification system based on multilingual sentence embeddings generated by sBERT. Using a bilingual Portuguese-English dataset with eight overlapping genres, we demonstrate the system's ability to train on lyrics in one language and predict genres in another.
arXiv Detail & Related papers (2025-01-07T13:22:35Z)
Enriching Music Descriptions with a Finetuned-LLM and Metadata for Text-to-Music Retrieval [7.7464988473650935]
Text-to-Music Retrieval plays a pivotal role in content discovery within extensive music databases. This paper proposes an improved Text-to-Music Retrieval model, denoted as TTMR++.
arXiv Detail & Related papers (2024-10-04T09:33:34Z)
SongCreator: Lyrics-based Universal Song Generation [53.248473603201916]
SongCreator is a song-generation system designed to tackle the challenge of generating songs with both vocals and accompaniment given lyrics. The model features two novel designs: a meticulously designed dual-sequence language model (DSLM) to capture the information of vocals and accompaniment for song generation, and a series of attention mask strategies for DSLM. Experiments demonstrate the effectiveness of SongCreator by achieving state-of-the-art or competitive performances on all eight tasks.
arXiv Detail & Related papers (2024-09-09T19:37:07Z)
LyCon: Lyrics Reconstruction from the Bag-of-Words Using Large Language Models [1.1510009152620668]
Our study introduces a novel method for generating copyright-free lyrics from publicly available Bag-of-Words datasets. We have compiled and made available a dataset of reconstructed lyrics, LyCon, aligned with metadata from renowned sources. We believe that the integration of metadata such as mood annotations or genres enables a variety of academic experiments on lyrics.
arXiv Detail & Related papers (2024-08-27T03:01:48Z)
Multi-task Prompt Words Learning for Social Media Content Generation [8.209163857435273]
We propose a new prompt word generation framework based on multi-modal information fusion. We use a template containing a set of prompt words to guide ChatGPT to generate high-quality tweets. In the absence of effective and objective evaluation criteria in the field of content generation, we use the ChatGPT tool to evaluate the results generated by the algorithm.
arXiv Detail & Related papers (2024-07-10T15:46:32Z)
MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models [57.47799823804519]
We are inspired by how musicians compose music not just from a movie script, but also through visualizations. We propose MeLFusion, a model that can effectively use cues from a textual description and the corresponding image to synthesize music. Our exhaustive experimental evaluation suggests that adding visual information to the music synthesis pipeline significantly improves the quality of generated music.
arXiv Detail & Related papers (2024-06-07T06:38:59Z)
LyricWhiz: Robust Multilingual Zero-shot Lyrics Transcription by Whispering to ChatGPT [48.28624219567131]
We introduce LyricWhiz, a robust, multilingual, and zero-shot automatic lyrics transcription method. We use Whisper, a weakly supervised robust speech recognition model, and GPT-4, today's most performant chat-based large language model. Our experiments show that LyricWhiz significantly reduces Word Error Rate compared to existing methods in English.
arXiv Detail & Related papers (2023-06-29T17:01:51Z)
Exploring the Efficacy of Pre-trained Checkpoints in Text-to-Music Generation Task [86.72661027591394]
We generate complete and semantically consistent symbolic music scores from text descriptions. We explore the efficacy of using publicly available checkpoints for natural language processing in the task of text-to-music generation. Our experimental results show that the improvement from using pre-trained checkpoints is statistically significant in terms of BLEU score and edit distance similarity.
arXiv Detail & Related papers (2022-11-21T07:19:17Z)
Bridging Music and Text with Crowdsourced Music Comments: A Sequence-to-Sequence Framework for Thematic Music Comments Generation [18.2750732408488]
We exploit the crowd-sourced music comments to construct a new dataset and propose a sequence-to-sequence model to generate text descriptions of music. To enhance the authenticity and thematicity of generated texts, we propose a discriminator and a novel topic evaluator.
arXiv Detail & Related papers (2022-09-05T14:51:51Z)
Contrastive Audio-Language Learning for Music [13.699088044513562]
MusCALL is a framework for Music Contrastive Audio-Language Learning. Our approach consists of a dual-encoder architecture that learns the alignment between pairs of music audio and descriptive sentences.
arXiv Detail & Related papers (2022-08-25T16:55:15Z)
Genre-conditioned Acoustic Models for Automatic Lyrics Transcription of Polyphonic Music [73.73045854068384]
We propose to transcribe the lyrics of polyphonic music using a novel genre-conditioned network. The proposed network adopts pre-trained model parameters, and incorporates the genre adapters between layers to capture different genre peculiarities for lyrics-genre pairs. Our experiments show that the proposed genre-conditioned network outperforms the existing lyrics transcription systems.
arXiv Detail & Related papers (2022-04-07T09:15:46Z)
Youling: an AI-Assisted Lyrics Creation System [72.00418962906083]
This paper demonstrates Youling, an AI-assisted lyrics creation system, designed to collaborate with music creators. In the lyrics generation process, Youling supports a traditional one-pass full-text generation mode as well as an interactive generation mode. The system also provides a revision module which enables users to revise undesired sentences or words of lyrics repeatedly.
arXiv Detail & Related papers (2022-01-18T03:57:04Z)
Sentiment analysis in tweets: an assessment study from classical to modern text representation models [59.107260266206445]
Short texts published on Twitter have earned significant attention as a rich source of information. Their inherent characteristics, such as the informal and noisy linguistic style, remain challenging to many natural language processing (NLP) tasks. This study carries out an assessment of existing language models in distinguishing the sentiment expressed in tweets by using a rich collection of 22 datasets.
arXiv Detail & Related papers (2021-05-29T21:05:28Z)
A General Framework for Learning Prosodic-Enhanced Representation of Rap Lyrics [21.944835086749375]
Learning and analyzing rap lyrics is a significant basis for many web applications. We propose a hierarchical attention variational autoencoder framework (HAVAE). A feature aggregation strategy is proposed to appropriately integrate various features and generate prosodic-enhanced representation.
arXiv Detail & Related papers (2021-03-23T15:13:21Z)
A Comprehensive Survey on Deep Music Generation: Multi-level Representations, Algorithms, Evaluations, and Future Directions [10.179835761549471]
This paper attempts to provide an overview of various composition tasks under different music generation levels using deep learning. In addition, we summarize datasets suitable for diverse tasks, discuss the music representations, the evaluation methods as well as the challenges under different levels, and finally point out several future directions.
arXiv Detail & Related papers (2020-11-13T08:01:20Z)
Melody-Conditioned Lyrics Generation with SeqGANs [81.2302502902865]
We propose an end-to-end melody-conditioned lyrics generation system based on Sequence Generative Adversarial Networks (SeqGAN). We show that the input conditions have no negative impact on the evaluation metrics while enabling the network to produce more meaningful results.
arXiv Detail & Related papers (2020-10-28T02:35:40Z)
A Survey of Knowledge-Enhanced Text Generation [81.24633231919137]
The goal of text generation is to make machines express in human language. Various neural encoder-decoder models have been proposed to achieve the goal by learning to map input text to output text. To address this issue, researchers have considered incorporating various forms of knowledge beyond the input text into the generation models.
arXiv Detail & Related papers (2020-10-09T06:46:46Z)

This list is automatically generated from the titles and abstracts of the papers in this site.