Fine-Tuning Large Language Models for Automatic Detection of Sexually Explicit Content in Spanish-Language Song Lyrics
- URL: http://arxiv.org/abs/2602.05485v1
- Date: Thu, 05 Feb 2026 09:45:09 GMT
- Title: Fine-Tuning Large Language Models for Automatic Detection of Sexually Explicit Content in Spanish-Language Song Lyrics
- Authors: Dolores Zamacola Sánchez de Lamadrid, Eduardo C. Garrido-Merchán
- Abstract summary: This paper presents an approach to the automatic detection of sexually explicit content in Spanish-language song lyrics. A Generative Pre-trained Transformer model is fine-tuned to adapt to the idiosyncratic linguistic features of urban Latin music. The paper develops a public policy proposal for a multi-tier age-based content rating system for music.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The proliferation of sexually explicit content in popular music genres such as reggaeton and trap, consumed predominantly by young audiences, has raised significant societal concern regarding the exposure of minors to potentially harmful lyrical material. This paper presents an approach to the automatic detection of sexually explicit content in Spanish-language song lyrics by fine-tuning a Generative Pre-trained Transformer (GPT) model on a curated corpus of 100 songs, evenly divided between expert-labeled explicit and non-explicit categories. The proposed methodology leverages transfer learning to adapt the pre-trained model to the idiosyncratic linguistic features of urban Latin music, including slang, metaphors, and culturally specific double entendres that evade conventional dictionary-based filtering systems. Experimental evaluation on held-out test sets demonstrates that the fine-tuned model achieves 87% accuracy, 100% precision, and 100% specificity after a feedback-driven refinement loop, outperforming both its pre-feedback configuration and a non-customized baseline ChatGPT model. A comparative analysis reveals that the fine-tuned model agrees with expert human classification in 59.2% of cases versus 55.1% for the standard model, confirming that domain-specific adaptation enhances sensitivity to implicit and culturally embedded sexual references. These findings support the viability of deploying fine-tuned large language models as automated content moderation tools on music streaming platforms. Building on these technical results, the paper develops a public policy proposal for a multi-tier age-based content rating system for music, analogous to the PEGI system for video games, analyzed through the PESTEL framework and Kingdon's Multiple Streams Framework, establishing both the technological feasibility and the policy pathway for systematic music content regulation.
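The reported figures (87% accuracy, 100% precision, 100% specificity) follow directly from a binary confusion matrix. A minimal sketch of how such metrics are computed; the counts below are illustrative, not the paper's actual confusion matrix:

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, precision, and specificity from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    # Precision: of all songs flagged explicit, how many truly were.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Specificity: of all non-explicit songs, how many were correctly passed.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return accuracy, precision, specificity

# Hypothetical counts for a 15-song held-out split: 100% precision and
# specificity require zero false positives, so any error must be a miss (fn).
acc, prec, spec = classification_metrics(tp=6, fp=0, tn=7, fn=2)
print(f"accuracy={acc:.2f} precision={prec:.2f} specificity={spec:.2f}")
```

Note how perfect precision and specificity can coexist with sub-90% accuracy: the model errs only by under-flagging, which matches the abstract's emphasis on catching implicit references.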
Related papers
- Music Flamingo: Scaling Music Understanding in Audio Language Models [98.94537017112704]
Music Flamingo is a novel large audio-language model designed to advance music understanding in foundational audio models. MF-Skills is a dataset labeled through a multi-stage pipeline that yields rich captions and question-answer pairs covering harmony, structure, timbre, lyrics, and cultural context. We introduce a post-training recipe: we first cold-start with MF-Think, a novel chain-of-thought dataset grounded in music theory, followed by GRPO-based reinforcement learning with custom rewards.
arXiv Detail & Related papers (2025-11-13T13:21:09Z)
- Language models for longitudinal analysis of abusive content in Billboard Music Charts [3.2654923574107357]
We analyse songs (lyrics) from the Billboard Charts of the United States over the last seven decades. Results show a significant rise in explicit content in popular music from 1990 onwards, with an increasing prevalence of songs whose lyrics contain profane, sexually explicit, and otherwise inappropriate language.
arXiv Detail & Related papers (2025-10-06T01:59:21Z)
- Evaluation of pretrained language models on music understanding [0.0]
We demonstrate that large language models (LLMs) suffer from 1) prompt sensitivity, 2) inability to model negation, and 3) sensitivity to the presence of specific words.
We quantified these properties as a triplet-based accuracy, evaluating the ability to model the relative similarity of labels in a hierarchical ontology.
Despite the relatively high accuracy reported, inconsistencies are evident in all six models, suggesting that off-the-shelf LLMs need adaptation to music before use.
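The triplet-based accuracy described above can be sketched as follows. The embeddings, the cosine similarity choice, and the toy label vectors are illustrative assumptions, not the paper's setup: for each (anchor, positive, negative) label triplet drawn from the ontology, the model scores a hit when it rates the anchor more similar to the positive than to the negative.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def triplet_accuracy(triplets):
    """Fraction of (anchor, positive, negative) embedding triplets where the
    anchor is more similar to the positive label than to the negative one."""
    hits = sum(cosine(a, p) > cosine(a, n) for a, p, n in triplets)
    return hits / len(triplets)

# Toy 2-D "label embeddings" (hypothetical): 'rock' should sit nearer
# 'metal' (same branch of the ontology) than 'classical'.
rock, metal, classical = (1.0, 0.1), (0.9, 0.2), (-0.8, 0.6)
print(triplet_accuracy([(rock, metal, classical)]))
```

In the evaluated setting, the embeddings would come from the language model itself, so a low triplet accuracy indicates the model has not internalized the hierarchy of music labels.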
arXiv Detail & Related papers (2024-09-17T14:44:49Z)
- MuChoMusic: Evaluating Music Understanding in Multimodal Audio-Language Models [11.834712543531756]
MuChoMusic is a benchmark for evaluating music understanding in multimodal language models focused on audio.
It comprises 1,187 multiple-choice questions, all validated by human annotators, on 644 music tracks sourced from two publicly available music datasets.
We evaluate five open-source models and identify several pitfalls, including an over-reliance on the language modality.
arXiv Detail & Related papers (2024-08-02T15:34:05Z)
- Automatic Detection of Moral Values in Music Lyrics [4.747987317906765]
Moral values play a fundamental role in how we evaluate information, make decisions, and form judgements around important social issues.
We tasked a set of transformer-based language models (BERT) fine-tuned on 2,721 synthetic lyrics to detect moral values in 200 real music lyrics annotated by two experts.
We evaluate their predictive capabilities against a series of baselines including out-of-domain (BERT fine-tuned on MFT-annotated social media texts) and zero-shot (GPT-4) classification.
The proposed models yielded the best accuracy across experiments, with an average F1 weighted score of 0.8.
arXiv Detail & Related papers (2024-07-26T14:49:21Z)
- Synthetic Lyrics Detection Across Languages and Genres [4.987546582439803]
Using large language models (LLMs) to generate music content, particularly lyrics, has gained popularity. Previous research has explored generated-content detection in various domains, but no work has focused on the text modality of music, lyrics. We curated a diverse dataset of real and synthetic lyrics from multiple languages, music genres, and artists, and performed a thorough evaluation of existing synthetic text detection approaches on lyrics, a previously unexplored data type. Following both musical and industrial constraints, we examined how well these approaches generalize across languages, scale with data availability, handle multilingual content, and perform on novel genres in few-shot settings.
arXiv Detail & Related papers (2024-06-21T15:19:21Z)
- MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models [57.47799823804519]
We are inspired by how musicians compose music not just from a movie script, but also through visualizations.
We propose MeLFusion, a model that can effectively use cues from a textual description and the corresponding image to synthesize music.
Our exhaustive experimental evaluation suggests that adding visual information to the music synthesis pipeline significantly improves the quality of generated music.
arXiv Detail & Related papers (2024-06-07T06:38:59Z)
- Prosody in Cascade and Direct Speech-to-Text Translation: a case study on Korean Wh-Phrases [79.07111754406841]
This work proposes using contrastive evaluation to measure the ability of direct S2TT systems to disambiguate utterances where prosody plays a crucial role.
Our results clearly demonstrate the value of direct translation systems over cascade translation models.
arXiv Detail & Related papers (2024-02-01T14:46:35Z)
- Exploring the Efficacy of Pre-trained Checkpoints in Text-to-Music Generation Task [86.72661027591394]
We generate complete and semantically consistent symbolic music scores from text descriptions.
We explore the efficacy of using publicly available checkpoints for natural language processing in the task of text-to-music generation.
Our experimental results show that the improvement from using pre-trained checkpoints is statistically significant in terms of BLEU score and edit distance similarity.
arXiv Detail & Related papers (2022-11-21T07:19:17Z)
- Leveraging Pre-trained Language Model for Speech Sentiment Analysis [58.78839114092951]
We explore the use of pre-trained language models to learn sentiment information of written texts for speech sentiment analysis.
We propose a pseudo label-based semi-supervised training strategy using a language model on an end-to-end speech sentiment approach.
arXiv Detail & Related papers (2021-06-11T20:15:21Z)
- Melody-Conditioned Lyrics Generation with SeqGANs [81.2302502902865]
We propose an end-to-end melody-conditioned lyrics generation system based on Sequence Generative Adversarial Networks (SeqGAN).
We show that the input conditions have no negative impact on the evaluation metrics while enabling the network to produce more meaningful results.
arXiv Detail & Related papers (2020-10-28T02:35:40Z)