Music Genre Classification: A Comparative Analysis of CNN and XGBoost
Approaches with Mel-frequency cepstral coefficients and Mel Spectrograms
- URL: http://arxiv.org/abs/2401.04737v1
- Date: Tue, 9 Jan 2024 01:50:31 GMT
- Title: Music Genre Classification: A Comparative Analysis of CNN and XGBoost
Approaches with Mel-frequency cepstral coefficients and Mel Spectrograms
- Authors: Yigang Meng
- Abstract summary: This study investigates the performance of three models: a proposed convolutional neural network (CNN), VGG16 with fully connected layers (FC), and an eXtreme Gradient Boosting (XGBoost) approach, on different features.
The results show that the MFCC XGBoost model outperformed the others. Furthermore, applying data segmentation in the data preprocessing phase can significantly enhance the performance of the CNNs.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In recent years, various well-designed algorithms have empowered music
platforms to provide content based on one's preferences. Music genres are
defined through various aspects, including acoustic features and cultural
considerations. Music genre classification pairs well with content-based
filtering, which recommends content to users based on music similarity. Given a
sizable dataset, one approach is automatic annotation using machine learning or
deep learning methods that can effectively classify audio files. The
effectiveness of such systems depends largely on feature and model selection,
as different architectures and features can complement each other and yield
different results. In this study, we compare the performance of three models: a
proposed convolutional neural network (CNN), VGG16 with fully connected layers
(FC), and an eXtreme Gradient Boosting (XGBoost) approach, on two feature sets:
30-second Mel spectrograms and 3-second Mel-frequency cepstral coefficients
(MFCCs). The results show that
the MFCC XGBoost model outperformed the others. Furthermore, applying data
segmentation in the data preprocessing phase can significantly enhance the
performance of the CNNs.
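As a rough sketch of the pipeline the abstract describes, the snippet below segments 30-second clips into 3-second pieces, summarizes MFCCs per segment, and fits an XGBoost classifier; it also shows the full-clip log-Mel spectrogram a CNN would consume. Paths, helper names, and hyperparameters are illustrative assumptions, not the authors' exact configuration.

```python
# Hedged sketch: 3-second MFCC segments + XGBoost, mirroring the paper's
# best-performing setup. Hyperparameters are assumptions.
import numpy as np
import librosa
import xgboost as xgb

SR = 22050          # assumed sample rate (GTZAN clips are 30 s at 22.05 kHz)
SEGMENT_SEC = 3     # data segmentation: split each 30 s clip into 3 s pieces

def mfcc_segments(path, n_mfcc=20):
    """Load one clip, split it into 3 s segments, return per-segment MFCC stats."""
    y, sr = librosa.load(path, sr=SR)
    seg_len = SEGMENT_SEC * sr
    feats = []
    for start in range(0, len(y) - seg_len + 1, seg_len):
        seg = y[start:start + seg_len]
        mfcc = librosa.feature.mfcc(y=seg, sr=sr, n_mfcc=n_mfcc)
        # Summarize each coefficient over time so every segment is a fixed-size vector.
        feats.append(np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)]))
    return feats

def log_mel_spectrogram(path):
    """Full-clip log-Mel spectrogram, the 2-D input a CNN would consume."""
    y, sr = librosa.load(path, sr=SR)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)
    return librosa.power_to_db(mel, ref=np.max)

# X: stacked segment features, y: integer genre labels (build_dataset is a
# hypothetical helper, not from the paper).
# X, y = build_dataset("genres/")
# clf = xgb.XGBClassifier(n_estimators=500, max_depth=6, learning_rate=0.1)
# clf.fit(X, y)
```

One plausible reason segmentation also helps the CNNs is that it multiplies the number of training examples per clip roughly tenfold while each segment keeps the clip's label.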
Related papers
- Machine Learning Framework for Audio-Based Content Evaluation using MFCC, Chroma, Spectral Contrast, and Temporal Feature Engineering
We construct a dataset containing audio samples from music covers on YouTube along with the audio of the original song, and sentiment scores derived from user comments.
Our approach involves extensive pre-processing, segmenting audio signals into 30-second windows, and extracting high-dimensional feature representations.
We train regression models to predict sentiment scores on a 0-100 scale; the respective models achieve root mean square error (RMSE) values of 3.420, 5.482, 2.783, and 4.212.
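As a rough illustration of that feature-engineering step, a minimal librosa sketch follows: MFCC, chroma, and spectral contrast are time-averaged per 30-second window. The window handling and pooling statistics here are assumptions, not the paper's exact recipe.

```python
# Hedged sketch of a multi-feature front end: MFCC, chroma, and spectral
# contrast summarized per 30-second window. Mean pooling is an assumption.
import numpy as np
import librosa

def window_features(y, sr):
    """Concatenate time-averaged MFCC, chroma, and spectral contrast for one window."""
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    chroma = librosa.feature.chroma_stft(y=y, sr=sr)
    contrast = librosa.feature.spectral_contrast(y=y, sr=sr)
    return np.concatenate([f.mean(axis=1) for f in (mfcc, chroma, contrast)])

def windows(path, win_sec=30, sr=22050):
    """Yield feature vectors for consecutive 30 s windows, dropping a short tail."""
    y, _ = librosa.load(path, sr=sr)
    step = win_sec * sr
    for start in range(0, len(y) - step + 1, step):
        yield window_features(y[start:start + step], sr)
```

A standard regressor (e.g., gradient-boosted trees) could then map each window vector to the 0-100 sentiment score.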
arXiv Detail & Related papers (2024-10-31T20:26:26Z)
- Audio Processing using Pattern Recognition for Music Genre Classification
This project explores the application of machine learning techniques for music genre classification using the GTZAN dataset.
Motivated by the growing demand for personalized music recommendations, we focused on classifying five genres: Blues, Classical, Jazz, Hip Hop, and Country.
The ANN model demonstrated the best performance, achieving a validation accuracy of 92.44%.
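The summary does not spell the network out, so the sketch below shows one plausible shape: a small fully connected classifier over per-clip feature vectors via scikit-learn. The layer sizes and feature choice are assumptions.

```python
# Hedged sketch: a small fully connected network ("ANN") for the five-genre
# subset. Architecture and inputs are illustrative assumptions.
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

GENRES = ["blues", "classical", "jazz", "hiphop", "country"]

clf = make_pipeline(
    StandardScaler(),  # feature scaling matters for MLP convergence
    MLPClassifier(hidden_layer_sizes=(256, 128), max_iter=500, random_state=0),
)
# X_train: per-clip feature vectors (e.g., MFCC statistics); y_train: genre ids.
# clf.fit(X_train, y_train)
# print("validation accuracy:", clf.score(X_val, y_val))
```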
arXiv Detail & Related papers (2024-10-19T05:44:05Z)
- Music Genre Classification using Large Language Models
This paper exploits the zero-shot capabilities of pre-trained large language models (LLMs) for music genre classification.
The proposed approach splits audio signals into 20 ms chunks and processes them through convolutional feature encoders.
During inference, predictions on individual chunks are aggregated for a final genre classification.
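A tiny sketch of that chunk-then-aggregate scheme follows, with the feature encoder left out. Averaging per-chunk probabilities is an assumed aggregation rule; majority voting is a common alternative.

```python
# Hedged sketch: split a waveform into 20 ms chunks, then aggregate per-chunk
# genre probabilities into one clip-level prediction.
import numpy as np

def chunk(signal: np.ndarray, sr: int, ms: float = 20.0):
    """Split a waveform into consecutive 20 ms chunks for a feature encoder."""
    n = int(sr * ms / 1000)
    return [signal[i:i + n] for i in range(0, len(signal) - n + 1, n)]

def aggregate(chunk_probs: np.ndarray) -> int:
    """chunk_probs: (n_chunks, n_genres). Average probabilities, pick the argmax."""
    return int(np.argmax(chunk_probs.mean(axis=0)))
```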
arXiv Detail & Related papers (2024-10-10T19:17:56Z)
- Music Genre Classification: Training an AI model
Music genre classification applies machine learning models and techniques to the processing of audio signals.
In this research I explore various machine learning algorithms for music genre classification, using features extracted from audio signals.
I aim to assess the robustness of machine learning models for genre classification and to compare their results.
arXiv Detail & Related papers (2024-05-23T23:07:01Z)
- DITTO: Diffusion Inference-Time T-Optimization for Music Generation
Diffusion Inference-Time T-Optimization (DITTO) is a framework for controlling pre-trained text-to-music diffusion models at inference time.
We demonstrate a surprisingly wide range of applications for music generation, including inpainting, outpainting, and looping, as well as intensity, melody, and musical structure control.
arXiv Detail & Related papers (2024-01-22T18:10:10Z)
- MATT: A Multiple-instance Attention Mechanism for Long-tail Music Genre Classification
Imbalanced music genre classification is a crucial task in the Music Information Retrieval (MIR) field.
Most of the existing models are designed for class-balanced music datasets.
We propose a novel mechanism named Multi-instance Attention (MATT) to boost performance in identifying tail classes.
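A minimal sketch of attention-based multiple-instance pooling in the spirit of MATT: a bag of segment ("instance") embeddings is reduced to one clip-level vector by learned attention weights. Dimensions and the scorer's shape are illustrative assumptions, not the paper's exact design.

```python
# Hedged sketch: attention pooling over a bag of segment embeddings.
import torch
import torch.nn as nn

class AttentionPool(nn.Module):
    def __init__(self, dim=128, hidden=64):
        super().__init__()
        # Small scorer that assigns one attention logit per instance.
        self.score = nn.Sequential(
            nn.Linear(dim, hidden), nn.Tanh(), nn.Linear(hidden, 1)
        )

    def forward(self, bag):                          # bag: (n_instances, dim)
        w = torch.softmax(self.score(bag), dim=0)    # (n_instances, 1) weights
        return (w * bag).sum(dim=0)                  # (dim,) clip-level vector

pool = AttentionPool()
clip_vec = pool(torch.randn(10, 128))  # 10 segment embeddings -> one clip vector
```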
arXiv Detail & Related papers (2022-09-09T03:52:44Z)
- Investigating Multi-Feature Selection and Ensembling for Audio Classification
Deep learning algorithms have shown impressive performance in diverse domains.
Audio has attracted many researchers over the last couple of decades due to its interesting patterns.
Feature selection and combination play a key role in improving audio classification performance.
arXiv Detail & Related papers (2022-06-15T13:11:08Z)
- Score-informed Networks for Music Performance Assessment
Deep neural network-based methods that incorporate score information into music performance assessment (MPA) models have not yet been investigated.
We introduce three different models capable of score-informed performance assessment.
arXiv Detail & Related papers (2020-08-01T07:46:24Z)
- dMelodies: A Music Dataset for Disentanglement Learning
We present a new symbolic music dataset that will help researchers demonstrate the efficacy of their algorithms on diverse domains.
This will also provide a means for evaluating algorithms specifically designed for music.
The dataset is large enough (approx. 1.3 million data points) to train and test deep networks for disentanglement learning.
arXiv Detail & Related papers (2020-07-29T19:20:07Z)
- Music Gesture for Visual Sound Separation
"Music Gesture" is a keypoint-based structured representation to explicitly model the body and finger movements of musicians when they perform music.
We first adopt a context-aware graph network to integrate visual semantic context with body dynamics, and then apply an audio-visual fusion model to associate body movements with the corresponding audio signals.
arXiv Detail & Related papers (2020-04-20T17:53:46Z)
- Multi-Modal Music Information Retrieval: Augmenting Audio-Analysis with Visual Computing for Improved Music Video Analysis
This thesis combines audio-analysis with computer vision to approach Music Information Retrieval (MIR) tasks from a multi-modal perspective.
The main hypothesis of this work is based on the observation that certain expressive categories such as genre or theme can be recognized on the basis of the visual content alone.
The experiments are conducted for three MIR tasks: Artist Identification, Music Genre Classification, and Cross-Genre Classification.
arXiv Detail & Related papers (2020-02-01T17:57:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.