An Analysis of Classification Approaches for Hit Song Prediction using
Engineered Metadata Features with Lyrics and Audio Features
- URL: http://arxiv.org/abs/2301.13507v1
- Date: Tue, 31 Jan 2023 09:48:53 GMT
- Title: An Analysis of Classification Approaches for Hit Song Prediction using
Engineered Metadata Features with Lyrics and Audio Features
- Authors: Mengyisong Zhao, Morgan Harvey, David Cameron, Frank Hopfgartner and
Valerie J. Gillet
- Abstract summary: This study aims to improve the prediction of top 10 hits among Billboard Hot 100 songs using a wider range of metadata.
Five machine learning approaches are applied, including: k-nearest neighbours, Naive Bayes, Random Forest, Logistic Regression and Multilayer Perceptron.
Our results show that Random Forest (RF) and Logistic Regression (LR) with all features outperform the other models, achieving 89.1% and 87.2% accuracy, and 0.91 and 0.93 AUC, respectively.
- Score: 5.871032585001082
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Hit song prediction, one of the emerging fields in music information
retrieval (MIR), remains a considerable challenge. Being able to understand
what makes a given song a hit is clearly beneficial to the whole music
industry. Previous approaches to hit song prediction have focused on using
audio features of a record. This study aims to improve the prediction of
top 10 hits among Billboard Hot 100 songs using a wider range of metadata,
including song audio features provided by Spotify, song lyrics, and novel
metadata-based features (title topic, popularity continuity and genre class).
Five machine learning approaches are applied, including: k-nearest neighbours,
Naive Bayes, Random Forest, Logistic Regression and Multilayer Perceptron. Our
results show that Random Forest (RF) and Logistic Regression (LR) with all
features (including novel features, song audio features and lyrics features)
outperform the other models, achieving 89.1% and 87.2% accuracy, and 0.91 and 0.93
AUC, respectively. Our findings also demonstrate the utility of our novel music
metadata features, which contributed most to the models' discriminative
performance.
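The abstract reports two metrics that need not agree: RF has the higher accuracy while LR has the higher AUC. The sketch below illustrates how each metric is computed for a binary hit/non-hit task. It is a minimal illustration only: the labels and scores are made-up toy values, not data from the paper, and the function names `accuracy` and `roc_auc` are hypothetical helpers written for this example.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def roc_auc(y_true, scores):
    """ROC AUC via its rank interpretation: the probability that a randomly
    chosen positive is scored above a randomly chosen negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (assumed values): 1 = top 10 hit, 0 = non-hit.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.65, 0.4, 0.55, 0.1, 0.8, 0.3]  # model's predicted hit probability
y_pred = [1 if s >= 0.5 else 0 for s in scores]       # fixed 0.5 decision threshold

print("accuracy:", accuracy(y_true, y_pred))  # 0.75
print("AUC:", roc_auc(y_true, scores))        # 0.9375
```

Accuracy depends on the chosen decision threshold, whereas AUC measures how well the scores rank hits above non-hits across all thresholds, which is one way two models can order differently on the two metrics.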
Related papers
- Music Era Recognition Using Supervised Contrastive Learning and Artist Information [11.126020721501956]
Music era information can be an important feature for playlist generation and recommendation.
An audio-based model is developed to predict the era from audio.
For the case where the artist information is available, we extend the audio-based model to take multimodal inputs and develop a framework, called MultiModal Contrastive (MMC) learning, to enhance the training.
arXiv Detail & Related papers (2024-07-07T13:43:55Z)
- MuPT: A Generative Symbolic Music Pretrained Transformer [56.09299510129221]
We explore the application of Large Language Models (LLMs) to the pre-training of music.
To address the challenges associated with misaligned measures from different tracks during generation, we propose a Synchronized Multi-Track ABC Notation (SMT-ABC Notation).
Our contributions include a series of models capable of handling up to 8192 tokens, covering 90% of the symbolic music data in our training set.
arXiv Detail & Related papers (2024-04-09T15:35:52Z)
- Beyond Beats: A Recipe to Song Popularity? A machine learning approach [2.6422127672474933]
This study aims to explore the predictive power of various machine learning models in forecasting song popularity.
We employ Ordinary Least Squares (OLS) regression analysis to analyse song characteristics and their impact on popularity.
Random Forest emerges as the most effective model, improving prediction accuracy by 7.1% compared to average scores.
arXiv Detail & Related papers (2024-03-01T17:14:41Z)
- Exploring Meta Information for Audio-based Zero-shot Bird Classification [113.17261694996051]
This study investigates how meta-information can improve zero-shot audio classification.
We use bird species as an example case study due to the availability of rich and diverse meta-data.
arXiv Detail & Related papers (2023-09-15T13:50:16Z)
- MARBLE: Music Audio Representation Benchmark for Universal Evaluation [79.25065218663458]
We introduce the Music Audio Representation Benchmark for universaL Evaluation, termed MARBLE.
It aims to provide a benchmark for various Music Information Retrieval (MIR) tasks by defining a comprehensive taxonomy with four hierarchy levels, including acoustic, performance, score, and high-level description.
We then establish a unified protocol based on 14 tasks on 8 publicly available datasets, providing a fair and standard assessment of representations of all open-sourced pre-trained models developed on music recordings as baselines.
arXiv Detail & Related papers (2023-06-18T12:56:46Z)
- GETMusic: Generating Any Music Tracks with a Unified Representation and
Diffusion Framework [58.64512825534638]
Symbolic music generation aims to create musical notes, which can help users compose music.
We introduce a framework known as GETMusic, with "GET" standing for "GEnerate music Tracks".
GETScore represents musical notes as tokens and organizes tokens in a 2D structure, with tracks stacked vertically and progressing horizontally over time.
Our proposed representation, coupled with the non-autoregressive generative model, empowers GETMusic to generate music with any arbitrary source-target track combinations.
arXiv Detail & Related papers (2023-05-18T09:53:23Z)
- SpotHitPy: A Study For ML-Based Song Hit Prediction Using Spotify [0.0]
We gathered a dataset of nearly 18500 hit and non-hit songs.
We extracted their audio features using the Spotify Web API.
We were able to predict the Billboard success of a song with approximately 86% accuracy.
arXiv Detail & Related papers (2023-01-19T10:13:52Z)
- Exploring the Efficacy of Pre-trained Checkpoints in Text-to-Music
Generation Task [86.72661027591394]
We generate complete and semantically consistent symbolic music scores from text descriptions.
We explore the efficacy of using publicly available checkpoints for natural language processing in the task of text-to-music generation.
Our experimental results show that the improvement from using pre-trained checkpoints is statistically significant in terms of BLEU score and edit distance similarity.
arXiv Detail & Related papers (2022-11-21T07:19:17Z)
- Review-Based Tip Generation for Music Songs [15.318127987849092]
We propose a framework named GenTMS for automatically generating tips from song reviews.
The dataset involves 8,003 Chinese tips/non-tips from 128 songs.
Experiments show that GenTMS achieves top-10 precision at 85.56%, outperforming the baseline models by at least 3.34%.
arXiv Detail & Related papers (2022-05-14T06:40:49Z)
- Context-Based Music Recommendation Algorithm Evaluation [0.0]
This paper explores 6 machine learning algorithms and their individual accuracy for predicting whether a user will like a song.
The algorithms explored include Logistic Regression, Naive Bayes, Sequential Minimal Optimization (SMO), Multilayer Perceptron (Neural Network), Nearest Neighbor, and Random Forest.
With the analysis of the specific characteristics of each song provided by the Spotify API, Random Forest is the most successful algorithm for predicting whether a user will like a song with an accuracy of 84%.
arXiv Detail & Related papers (2021-12-16T01:46:36Z)
- MusicBERT: Symbolic Music Understanding with Large-Scale Pre-Training [97.91071692716406]
Symbolic music understanding refers to the understanding of music from symbolic data.
MusicBERT is a large-scale pre-trained model for music understanding.
arXiv Detail & Related papers (2021-06-10T10:13:05Z)