MATT: A Multiple-instance Attention Mechanism for Long-tail Music Genre
Classification
- URL: http://arxiv.org/abs/2209.04109v1
- Date: Fri, 9 Sep 2022 03:52:44 GMT
- Title: MATT: A Multiple-instance Attention Mechanism for Long-tail Music Genre
Classification
- Authors: Xiaokai Liu, Menghua Zhang
- Abstract summary: Imbalanced music genre classification is a crucial task in the Music Information Retrieval (MIR) field.
Most of the existing models are designed for class-balanced music datasets.
We propose a novel mechanism named Multi-instance Attention (MATT) to boost the performance for identifying tail classes.
- Score: 1.8275108630751844
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Imbalanced music genre classification is a crucial task in the Music
Information Retrieval (MIR) field for identifying the long-tail, data-poor
genre based on the related music audio segments, which is very prevalent in
real-world scenarios. Most of the existing models are designed for
class-balanced music datasets, resulting in poor performance in accuracy and
generalization when identifying the music genres at the tail of the
distribution. Inspired by the success of introducing Multi-instance Learning
(MIL) in various classification tasks, we propose a novel mechanism named
Multi-instance Attention (MATT) to boost the performance for identifying tail
classes. Specifically, we first construct the bag-level datasets by generating
the album-artist pair bags. Second, we leverage neural networks to encode the
music audio segments. Finally, under the guidance of a multi-instance attention
mechanism, the neural network-based models could select the most informative
genre to match the given music segment. Comprehensive experimental results on a
large-scale music genre benchmark dataset with long-tail distribution
demonstrate MATT significantly outperforms other state-of-the-art baselines.
Related papers
- Audio Processing using Pattern Recognition for Music Genre Classification [0.0]
This project explores the application of machine learning techniques for music genre classification using the GTZAN dataset.
Motivated by the growing demand for personalized music recommendations, we focused on classifying five genres-Blues, Classical, Jazz, Hip Hop, and Country.
The ANN model demonstrated the best performance, achieving a validation accuracy of 92.44%.
arXiv Detail & Related papers (2024-10-19T05:44:05Z) - Benchmarking Sub-Genre Classification For Mainstage Dance Music [6.042939894766715]
This work introduces a novel benchmark comprising a new dataset and a baseline.
Our dataset extends the number of sub-genres to cover most recent mainstage live sets by top DJs worldwide in music festivals.
For the baseline, we developed deep learning models that outperform current state-of-the-art multimodel language models.
arXiv Detail & Related papers (2024-09-10T17:54:00Z) - Foundation Models for Music: A Survey [77.77088584651268]
Foundations models (FMs) have profoundly impacted diverse sectors, including music.
This comprehensive review examines state-of-the-art (SOTA) pre-trained models and foundation models in music.
arXiv Detail & Related papers (2024-08-26T15:13:14Z) - MuPT: A Generative Symbolic Music Pretrained Transformer [56.09299510129221]
We explore the application of Large Language Models (LLMs) to the pre-training of music.
To address the challenges associated with misaligned measures from different tracks during generation, we propose a Synchronized Multi-Track ABC Notation (SMT-ABC Notation)
Our contributions include a series of models capable of handling up to 8192 tokens, covering 90% of the symbolic music data in our training set.
arXiv Detail & Related papers (2024-04-09T15:35:52Z) - Music Genre Classification with ResNet and Bi-GRU Using Visual
Spectrograms [4.354842354272412]
The limitations of manual genre classification have highlighted the need for a more advanced system.
Traditional machine learning techniques have shown potential in genre classification, but fail to capture the full complexity of music data.
This study proposes a novel approach using visual spectrograms as input, and propose a hybrid model that combines the strength of the Residual neural Network (ResNet) and the Gated Recurrent Unit (GRU)
arXiv Detail & Related papers (2023-07-20T11:10:06Z) - MARBLE: Music Audio Representation Benchmark for Universal Evaluation [79.25065218663458]
We introduce the Music Audio Representation Benchmark for universaL Evaluation, termed MARBLE.
It aims to provide a benchmark for various Music Information Retrieval (MIR) tasks by defining a comprehensive taxonomy with four hierarchy levels, including acoustic, performance, score, and high-level description.
We then establish a unified protocol based on 14 tasks on 8 public-available datasets, providing a fair and standard assessment of representations of all open-sourced pre-trained models developed on music recordings as baselines.
arXiv Detail & Related papers (2023-06-18T12:56:46Z) - GETMusic: Generating Any Music Tracks with a Unified Representation and
Diffusion Framework [58.64512825534638]
Symbolic music generation aims to create musical notes, which can help users compose music.
We introduce a framework known as GETMusic, with GET'' standing for GEnerate music Tracks''
GETScore represents musical notes as tokens and organizes tokens in a 2D structure, with tracks stacked vertically and progressing horizontally over time.
Our proposed representation, coupled with the non-autoregressive generative model, empowers GETMusic to generate music with any arbitrary source-target track combinations.
arXiv Detail & Related papers (2023-05-18T09:53:23Z) - Exploring the Efficacy of Pre-trained Checkpoints in Text-to-Music
Generation Task [86.72661027591394]
We generate complete and semantically consistent symbolic music scores from text descriptions.
We explore the efficacy of using publicly available checkpoints for natural language processing in the task of text-to-music generation.
Our experimental results show that the improvement from using pre-trained checkpoints is statistically significant in terms of BLEU score and edit distance similarity.
arXiv Detail & Related papers (2022-11-21T07:19:17Z) - Multi-task Learning with Metadata for Music Mood Classification [0.0]
Mood recognition is an important problem in music informatics and has key applications in music discovery and recommendation.
We propose a multi-task learning approach in which a shared model is simultaneously trained for mood and metadata prediction tasks.
Applying our technique on the existing state-of-the-art convolutional neural networks for mood classification improves their performances consistently.
arXiv Detail & Related papers (2021-10-10T11:36:34Z) - Multi-Modal Music Information Retrieval: Augmenting Audio-Analysis with
Visual Computing for Improved Music Video Analysis [91.3755431537592]
This thesis combines audio-analysis with computer vision to approach Music Information Retrieval (MIR) tasks from a multi-modal perspective.
The main hypothesis of this work is based on the observation that certain expressive categories such as genre or theme can be recognized on the basis of the visual content alone.
The experiments are conducted for three MIR tasks Artist Identification, Music Genre Classification and Cross-Genre Classification.
arXiv Detail & Related papers (2020-02-01T17:57:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.