Language models for longitudinal analysis of abusive content in Billboard Music Charts
- URL: http://arxiv.org/abs/2510.06266v1
- Date: Mon, 06 Oct 2025 01:59:21 GMT
- Title: Language models for longitudinal analysis of abusive content in Billboard Music Charts
- Authors: Rohitash Chandra, Yathin Suresh, Divyansh Raj Sinha, Sanchit Jindal,
- Abstract summary: We analyse songs (lyrics) from Billboard Charts of the United States in the last seven decades. Results show a significant rise in explicit content in popular music from 1990 onwards. We also find an increasing prevalence of songs with lyrics containing profane, sexually explicit, and otherwise inappropriate language.
- Score: 3.2654923574107357
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: There is no doubt that there has been a drastic increase in abusive and sexually explicit content in music, particularly in Billboard Music Charts. However, there is a lack of studies that validate the trend for effective policy development, as such content can cause harmful behavioural changes in children and youths. In this study, we utilise deep learning methods to analyse songs (lyrics) from Billboard Charts of the United States in the last seven decades. We provide a longitudinal study using deep learning and language models and review the evolution of content using sentiment analysis and abuse detection, including sexually explicit content. Our results show a significant rise in explicit content in popular music from 1990 onwards. Furthermore, we find an increasing prevalence of songs with lyrics containing profane, sexually explicit, and otherwise inappropriate language. The longitudinal analysis demonstrates the ability of language models to capture nuanced patterns in lyrical content, reflecting shifts in societal norms and language use over time.
Related papers
- Fine-Tuning Large Language Models for Automatic Detection of Sexually Explicit Content in Spanish-Language Song Lyrics [1.3320917259299652]
This paper presents an approach to the automatic detection of sexually explicit content in Spanish-language song lyrics.
A Generative Pre-trained Transformer model is fine-tuned to adapt to the idiosyncratic linguistic features of urban Latin music.
The paper develops a public policy proposal for a multi-tier age-based content rating system for music.
arXiv Detail & Related papers (2026-02-05T09:45:09Z)
- AVMeme Exam: A Multimodal Multilingual Multicultural Benchmark for LLMs' Contextual and Cultural Knowledge and Thinking [59.15472057710525]
AVMeme Exam is a human-curated benchmark of over one thousand iconic Internet sounds and videos spanning speech, songs, music, and sound effects.
Each meme is paired with a unique Q&A assessing levels of understanding from surface content to context and emotion to usage and world knowledge.
We systematically evaluate state-of-the-art multimodal large language models (MLLMs) alongside human participants using this benchmark.
arXiv Detail & Related papers (2026-01-25T01:40:15Z)
- Abusive music and song transformation using GenAI and LLMs [3.8271803328378677]
This study explores the use of generative artificial intelligence (GenAI) and Large Language Models (LLMs) to automatically transform abusive words (vocal delivery) and lyrical content in popular music.
We present a comparative analysis of four selected English songs and their transformed counterparts, evaluating changes through both acoustic and sentiment-based lenses.
Our findings indicate that Gen-AI significantly reduces vocal aggressiveness, with acoustic analysis showing improvements in Harmonic to Noise Ratio, Cepstral Peak Prominence, and Shimmer.
arXiv Detail & Related papers (2026-01-21T02:56:45Z)
- CURVE: A Benchmark for Cultural and Multilingual Long Video Reasoning [58.73855961335903]
CURVE (Cultural Understanding and Reasoning in Video Evaluation) is a challenging benchmark for multicultural and multilingual video reasoning.
It comprises high-quality, entirely human-generated annotations from diverse, region-specific cultural videos across 18 global locales.
Our evaluations reveal that SoTA Video-LLMs struggle significantly, performing substantially below human-level accuracy.
arXiv Detail & Related papers (2026-01-15T18:15:06Z)
- SongSage: A Large Musical Language Model with Lyric Generative Pre-training [69.52790104805794]
SongSage is a large musical language model equipped with diverse lyric-centric intelligence through lyric generative pretraining.
SongSage exhibits a strong understanding of lyric-centric knowledge, excels in rewriting user queries for zero-shot playlist recommendations, generates and continues lyrics effectively, and performs proficiently across seven additional capabilities.
arXiv Detail & Related papers (2026-01-03T10:54:37Z)
- Music Flamingo: Scaling Music Understanding in Audio Language Models [98.94537017112704]
Music Flamingo is a novel large audio-language model designed to advance music understanding in foundational audio models.
MF-Skills is a dataset labeled through a multi-stage pipeline that yields rich captions and question-answer pairs covering harmony, structure, timbre, lyrics, and cultural context.
We introduce a post-training recipe: we first cold-start with MF-Think, a novel chain-of-thought dataset grounded in music theory, followed by GRPO-based reinforcement learning with custom rewards.
arXiv Detail & Related papers (2025-11-13T13:21:09Z)
- Disc-Cover Complexity Trends in Music Illustrations from Sinatra to Swift [51.70874799858211]
We examine the visual complexity of album covers spanning 75 years and 11 popular musical genres.
Our analysis reveals a broad shift toward minimalism across most genres, with notable exceptions.
At the same time, we observe growing variance over time, with many covers continuing to display high levels of abstraction and intricacy.
arXiv Detail & Related papers (2025-10-01T15:01:25Z)
- Longitudinal Abuse and Sentiment Analysis of Hollywood Movie Dialogues using Language Models [3.503370263836711]
We use language models to explore the longitudinal abuse and sentiment analysis of Hollywood Oscar and blockbuster movie dialogues from 1950 to 2024.
We employ fine-tuned language models to examine the trends and shifts in emotional and abusive content over the past seven decades.
Findings reveal significant temporal changes in movie dialogues, which reflect broader social and cultural influences.
arXiv Detail & Related papers (2025-01-20T00:44:38Z)
- Synthetic Lyrics Detection Across Languages and Genres [4.987546582439803]
The use of large language models (LLMs) to generate music content, particularly lyrics, has gained in popularity.
Previous research has explored content detection in various domains, but no work has focused on the text modality, lyrics, in music.
We curated a diverse dataset of real and synthetic lyrics from multiple languages, music genres, and artists.
We performed a thorough evaluation of existing synthetic text detection approaches on lyrics, a previously unexplored data type.
Following both music and industrial constraints, we examined how well these approaches generalize across languages, scale with data availability, handle multilingual language content, and perform on novel genres in few-shot settings.
arXiv Detail & Related papers (2024-06-21T15:19:21Z)
- Unsupervised Melody-Guided Lyrics Generation [84.22469652275714]
We propose to generate pleasantly listenable lyrics without training on melody-lyric aligned data.
We leverage the crucial alignments between melody and lyrics and compile the given melody into constraints to guide the generation process.
arXiv Detail & Related papers (2023-05-12T20:57:20Z)
- ReDDIT: Regret Detection and Domain Identification from Text [62.997667081978825]
We present a novel dataset of Reddit texts that have been classified into three classes: Regret by Action, Regret by Inaction, and No Regret.
Our findings show that Reddit users are most likely to express regret for past actions, particularly in the domain of relationships.
arXiv Detail & Related papers (2022-12-14T23:41:57Z)
- Large scale analysis of gender bias and sexism in song lyrics [3.437656066916039]
We identify sexist lyrics at a larger scale than previous studies using small samples of manually annotated popular songs.
We find sexist content to increase across time, especially from male artists and for popular songs appearing in Billboard charts.
This is the first large scale analysis of this type, giving insights into language usage in such an influential part of popular culture.
arXiv Detail & Related papers (2022-08-03T13:18:42Z)
- VidLanKD: Improving Language Understanding via Video-Distilled Knowledge Transfer [76.3906723777229]
We present VidLanKD, a video-language knowledge distillation method for improving language understanding.
We train a multi-modal teacher model on a video-text dataset, and then transfer its knowledge to a student language model with a text dataset.
In our experiments, VidLanKD achieves consistent improvements over text-only language models and vokenization models.
arXiv Detail & Related papers (2021-07-06T15:41:32Z)
- What's in the Box? An Analysis of Undesirable Content in the Common Crawl Corpus [77.34726150561087]
We analyze the Common Crawl, a colossal web corpus extensively used for training language models.
We find that it contains a significant amount of undesirable content, including hate speech and sexually explicit content, even after filtering procedures.
arXiv Detail & Related papers (2021-05-06T14:49:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.