Global Beats, Local Tongue: Studying Code Switching in K-pop Hits on Billboard Charts
- URL: http://arxiv.org/abs/2509.23197v1
- Date: Sat, 27 Sep 2025 09:05:28 GMT
- Title: Global Beats, Local Tongue: Studying Code Switching in K-pop Hits on Billboard Charts
- Authors: Aditya Narayan Sankaran, Reza Farahbakhsh, Noel Crespi
- Abstract summary: This paper investigates the role of code-switching and English lyric usage in K-pop songs that achieve global chart success. A dataset of K-pop songs that appeared on the Billboard Hot 100 and Global 200 charts from 2017 to 2025 was compiled. It was found that English dominates the linguistic landscape of globally charting K-pop songs, with both male and female performers exhibiting high degrees of code-switching and English usage.
- Score: 3.5238606794194816
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Code switching, particularly between Korean and English, has become a defining feature of modern K-pop, reflecting both aesthetic choices and global market strategies. This paper is a primary investigation into the linguistic strategies employed in K-pop songs that achieve global chart success, with a focus on the role of code-switching and English lyric usage. A dataset of K-pop songs that appeared on the Billboard Hot 100 and Global 200 charts from 2017 to 2025, spanning 14 groups and 8 solo artists, was compiled. Using this dataset, the proportion of English and Korean lyrics, the frequency of code-switching, and other stylistic features were analysed. It was found that English dominates the linguistic landscape of globally charting K-pop songs, with both male and female performers exhibiting high degrees of code-switching and English usage. Statistical tests indicated no significant gender-based differences, although female solo artists tend to favour English more consistently. A classification task was also performed to predict performer gender from lyrics, achieving macro F1 scores up to 0.76 using multilingual embeddings and handcrafted features. Finally, differences between songs charting on the Hot 100 versus the Global 200 were examined, suggesting that, while there is no significant gender difference in English, higher English usage may be more critical for success in the US-focused Hot 100. The findings highlight how linguistic choices in K-pop lyrics are shaped by global market pressures and reveal stylistic patterns that reflect performer identity and chart context.
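The abstract mentions measuring the proportion of English and Korean lyrics and the frequency of code-switching. The paper's actual procedure is not described on this page, so the following is only a minimal sketch under stated assumptions: tokens are split on whitespace, a token's language is guessed from its Unicode script (Hangul versus Latin letters), and a code switch is counted whenever two consecutive labelled tokens differ in language. The names `token_language` and `lyric_stats` are hypothetical and chosen for illustration.

```python
import re

# Assumption: script detection stands in for language identification.
# Hangul syllables, Hangul Jamo, and Hangul Compatibility Jamo count as Korean.
HANGUL = re.compile(r"[\uac00-\ud7a3\u1100-\u11ff\u3130-\u318f]")
LATIN = re.compile(r"[A-Za-z]")


def token_language(token):
    """Return 'ko', 'en', or None for a whitespace-delimited token."""
    if HANGUL.search(token):
        return "ko"  # a token containing any Hangul is treated as Korean
    if LATIN.search(token):
        return "en"  # note: romanized Korean would also be labelled 'en'
    return None      # digits, punctuation, other scripts are ignored


def lyric_stats(lyrics):
    """Compute language ratios and the number of language switches."""
    labels = [lab for lab in map(token_language, lyrics.split()) if lab]
    total = len(labels)
    # A switch is any adjacent pair of labelled tokens with different languages.
    switches = sum(1 for a, b in zip(labels, labels[1:]) if a != b)
    return {
        "english_ratio": labels.count("en") / total if total else 0.0,
        "korean_ratio": labels.count("ko") / total if total else 0.0,
        "switches": switches,
    }


if __name__ == "__main__":
    # Made-up example line, not taken from the paper's dataset.
    print(lyric_stats("너를 사랑해 baby I love you 오늘 밤"))
```

A script-based heuristic like this is cheap and needs no external model, but it cannot tell English apart from other Latin-script languages, so a real study would likely need a proper language identifier.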
Related papers
- Unveiling the Listener Structure Underlying K-pop's Global Success: A Large-Scale Listening Data Analysis [3.966519779235704]
K-pop experienced a significant increase in plays between 2005 and 2019.
The Gini coefficient in play counts is notably greater than that of existing mainstream genres.
Between 2005 and 2010, K-pop shed its status as a local Asian genre and established itself as a distinct music genre in its own right.
(2025-09-08T12:21:15Z)
- POLYCHARTQA: Benchmarking Large Vision-Language Models with Multilingual Chart Question Answering [69.52231076699756]
PolyChartQA is the first large-scale multilingual chart question answering benchmark covering 22,606 charts and 26,151 question-answering pairs across 10 diverse languages.
We leverage state-of-the-art LLM-based translation and enforce rigorous quality control in the pipeline to ensure the linguistic and semantic consistency of the generated multilingual charts.
(2025-07-16T06:09:02Z)
- EuroGEST: Investigating gender stereotypes in multilingual language models [58.871032460235575]
We introduce EuroGEST, a dataset designed to measure gender-stereotypical reasoning in LLMs across English and 29 European languages.
We show that the strongest stereotypes in all models across all languages are that women are 'beautiful', 'empathetic' and 'neat' and men are 'leaders', 'strong, tough' and 'professional'.
(2025-06-04T11:58:18Z)
- Tuning Into Bias: A Computational Study of Gender Bias in Song Lyrics [1.5379084885764847]
This paper presents an analysis of gender bias in English song lyrics using topic modeling and bias measurement techniques.
We cluster a dataset of 537,553 English songs into distinct topics and analyze their temporal evolution.
Our results reveal a significant thematic shift in song lyrics over time, transitioning from romantic themes to a heightened focus on the sexualization of women.
(2024-09-24T10:24:53Z)
- 'Since Lawyers are Males..': Examining Implicit Gender Bias in Hindi Language Generation by LLMs [4.021517742561241]
This study explores implicit gender biases in Hindi text generation and compares them to those in English.
Our results reveal a significant gender bias of 87.8% in Hindi, compared to 33.4% in English GPT-4o generation.
This research underscores the variation in gender biases across languages and provides considerations for navigating these biases in generative AI systems.
(2024-09-20T13:16:58Z)
- Multilingual Text-to-Image Generation Magnifies Gender Stereotypes and Prompt Engineering May Not Help You [64.74707085021858]
We show that multilingual models suffer from significant gender biases just as monolingual models do.
We propose a novel benchmark, MAGBIG, intended to foster research on gender bias in multilingual models.
Our results show that not only do models exhibit strong gender biases but they also behave differently across languages.
(2024-01-29T12:02:28Z)
- K-pop Lyric Translation: Dataset, Analysis, and Neural-Modelling [7.819710421921816]
We introduce a novel singable lyric translation dataset, approximately 89% of which consists of K-pop song lyrics.
This dataset aligns Korean and English lyrics line-by-line and section-by-section.
We construct a neural lyric translation model, thereby underscoring the importance of a dedicated dataset for singable lyric translations.
(2023-09-20T06:54:55Z)
- VisoGender: A dataset for benchmarking gender bias in image-text pronoun resolution [80.57383975987676]
VisoGender is a novel dataset for benchmarking gender bias in vision-language models.
We focus on occupation-related biases within a hegemonic system of binary gender, inspired by Winograd and Winogender schemas.
We benchmark several state-of-the-art vision-language models and find that they demonstrate bias in resolving binary gender in complex scenes.
(2023-06-21T17:59:51Z)
- Comparing Biases and the Impact of Multilingual Training across Multiple Languages [70.84047257764405]
We present a bias analysis across Italian, Chinese, English, Hebrew, and Spanish on the downstream sentiment analysis task.
We adapt existing sentiment bias templates in English to Italian, Chinese, Hebrew, and Spanish for four attributes: race, religion, nationality, and gender.
Our results reveal similarities in bias expression such as favoritism of groups that are dominant in each language's culture.
(2023-05-18T18:15:07Z)
- Large scale analysis of gender bias and sexism in song lyrics [3.437656066916039]
We identify sexist lyrics at a larger scale than previous studies using small samples of manually annotated popular songs.
We find sexist content to increase across time, especially from male artists and for popular songs appearing in Billboard charts.
This is the first large scale analysis of this type, giving insights into language usage in such an influential part of popular culture.
(2022-08-03T13:18:42Z)
- Gender Bias in Multilingual Embeddings and Cross-Lingual Transfer [101.58431011820755]
We study gender bias in multilingual embeddings and how it affects transfer learning for NLP applications.
We create a multilingual dataset for bias analysis and propose several ways for quantifying bias in multilingual representations.
(2020-05-02T04:34:37Z)