Comparative Analysis of Pretrained Audio Representations in Music Recommender Systems
- URL: http://arxiv.org/abs/2409.08987v1
- Date: Fri, 13 Sep 2024 17:03:56 GMT
- Title: Comparative Analysis of Pretrained Audio Representations in Music Recommender Systems
- Authors: Yan-Martin Tamm, Anna Aljanaki
- Abstract summary: Music Information Retrieval (MIR) has proposed various models pretrained on large amounts of music data.
Transfer learning has shown the effectiveness of pretrained backend models across a broad spectrum of downstream tasks.
Music Recommender Systems tend to favour traditional end-to-end neural network learning over pretrained models.
- Score: 0.0
- Abstract: Over the years, Music Information Retrieval (MIR) has proposed various models pretrained on large amounts of music data. Transfer learning showcases the proven effectiveness of pretrained backend models with a broad spectrum of downstream tasks, including auto-tagging and genre classification. However, MIR papers generally do not explore the efficiency of pretrained models for Music Recommender Systems (MRS). In addition, the Recommender Systems community tends to favour traditional end-to-end neural network learning over these models. Our research addresses this gap and evaluates the applicability of six pretrained backend models (MusicFM, Music2Vec, MERT, EncodecMAE, Jukebox, and MusiCNN) in the context of MRS. We assess their performance using three recommendation models: K-nearest neighbours (KNN), shallow neural network, and BERT4Rec. Our findings suggest that pretrained audio representations exhibit significant performance variability between traditional MIR tasks and MRS, indicating that valuable aspects of musical information captured by backend models may differ depending on the task. This study establishes a foundation for further exploration of pretrained audio representations to enhance music recommendation systems.
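The paper's simplest recommender, KNN over pretrained audio embeddings, lends itself to a brief illustration. Below is a minimal sketch of that general idea; the function name, embedding dimensionality, and random placeholder data are assumptions for illustration and are not taken from the paper.

```python
import numpy as np

def recommend_knn(item_embeddings: np.ndarray, history: list[int], k: int = 10) -> list[int]:
    """Recommend the k tracks whose pretrained audio embeddings are closest
    (by cosine similarity) to the centroid of the user's listening history."""
    # L2-normalise rows so that dot products equal cosine similarities.
    emb = item_embeddings / np.linalg.norm(item_embeddings, axis=1, keepdims=True)
    profile = emb[history].mean(axis=0)
    profile /= np.linalg.norm(profile)
    scores = emb @ profile
    scores[history] = -np.inf  # never re-recommend already-heard tracks
    return np.argsort(-scores)[:k].tolist()

# Hypothetical usage: 1000 tracks embedded by some pretrained backend model,
# pooled to one vector per track; the 768-dim size is purely illustrative.
rng = np.random.default_rng(0)
track_embeddings = rng.normal(size=(1000, 768))
print(recommend_knn(track_embeddings, history=[3, 17, 256], k=5))
```

The shallow neural network and BERT4Rec recommenders evaluated in the paper presumably consume the same embeddings through learned layers rather than raw similarity; this sketch only shows the overall shape of the KNN baseline.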
Related papers
- Transfer Learning for Passive Sonar Classification using Pre-trained Audio and ImageNet Models [39.85805843651649]
This study compares pre-trained Audio Neural Networks (PANNs) and ImageNet pre-trained models.
It was observed that the ImageNet pre-trained models slightly outperform pre-trained audio models in passive sonar classification.
arXiv Detail & Related papers (2024-09-20T20:13:45Z)
- Towards Leveraging Contrastively Pretrained Neural Audio Embeddings for Recommender Tasks [18.95453617434051]
Music recommender systems frequently utilize network-based models to capture relationships between music pieces, artists, and users.
New music pieces or artists often face the cold-start problem due to insufficient initial information.
To address this, one can extract content-based information directly from the music to enhance collaborative-filtering-based methods.
arXiv Detail & Related papers (2024-09-13T17:53:06Z)
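The cold-start mitigation described in the entry above, falling back on content extracted from the audio where collaborative signals are scarce, can be illustrated with a simple score-blending heuristic. The blending rule, the confidence weighting, and all names below are assumptions for illustration, not the cited paper's method.

```python
import numpy as np

def blended_scores(cf_scores: np.ndarray,
                   content_sim: np.ndarray,
                   interaction_counts: np.ndarray,
                   alpha: float = 20.0) -> np.ndarray:
    """Blend collaborative-filtering scores with audio-content similarity.
    Items with few interactions (cold start) lean on content similarity;
    well-observed items lean on the collaborative score."""
    # Confidence in the CF score grows with the number of observed interactions.
    w = interaction_counts / (interaction_counts + alpha)
    return w * cf_scores + (1.0 - w) * content_sim

# Hypothetical usage: five candidate items, the last two being cold-start.
cf = np.array([0.9, 0.4, 0.7, 0.0, 0.0])
sim = np.array([0.2, 0.3, 0.5, 0.8, 0.6])
counts = np.array([120, 45, 300, 2, 0])
print(blended_scores(cf, sim, counts))
```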
- Music Genre Classification: Training an AI model [0.0]
Music genre classification is an area that utilizes machine learning models and techniques for the processing of audio signals.
In this research I explore various machine learning algorithms for the purpose of music genre classification, using features extracted from audio signals.
I aim to assess the robustness of machine learning models for genre classification and to compare their results.
arXiv Detail & Related papers (2024-05-23T23:07:01Z)
- Curriculum-scheduled Knowledge Distillation from Multiple Pre-trained Teachers for Multi-domain Sequential Recommendation [102.91236882045021]
It is essential to explore how to use different pre-trained recommendation models efficiently in real-world systems.
We propose a novel curriculum-scheduled knowledge distillation from multiple pre-trained teachers for multi-domain sequential recommendation.
CKD-MDSR takes full advantage of different pre-trained recommendation models (PRMs) as teachers to boost a small student recommendation model.
arXiv Detail & Related papers (2024-01-01T15:57:15Z)
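As a rough illustration of the multi-teacher distillation idea in the entry above, the sketch below trains a student against a weighted mixture of temperature-softened teacher distributions. The curriculum scheduling of those weights is omitted, and every name and number here is an assumption rather than the cited paper's formulation.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def multi_teacher_kd_loss(student_logits, teacher_logits_list, teacher_weights, tau=2.0):
    """Cross-entropy of the student against a weighted mixture of
    temperature-softened teacher distributions over candidate items."""
    target = sum(w * softmax(t / tau) for w, t in zip(teacher_weights, teacher_logits_list))
    log_student = np.log(softmax(student_logits / tau) + 1e-12)
    return float(-(target * log_student).sum(axis=-1).mean())

# Hypothetical usage: 4 users, 50 candidate items, two teacher recommenders.
rng = np.random.default_rng(0)
student = rng.normal(size=(4, 50))
teachers = [rng.normal(size=(4, 50)), rng.normal(size=(4, 50))]
print(multi_teacher_kd_loss(student, teachers, teacher_weights=[0.6, 0.4]))
```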
- MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training [74.32603591331718]
We propose an acoustic Music undERstanding model with large-scale self-supervised Training (MERT), which incorporates teacher models to provide pseudo labels in the masked language modelling (MLM) style acoustic pre-training.
Experimental results indicate that our model can generalise and perform well on 14 music understanding tasks and attain state-of-the-art (SOTA) overall scores.
arXiv Detail & Related papers (2023-05-31T18:27:43Z)
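The masked-prediction objective described for MERT above can be sketched as follows: hide a random subset of audio frames and train the encoder to predict the discrete pseudo labels a frozen teacher assigned to them. The stand-in encoder, shapes, and masking scheme are illustrative assumptions and do not reflect MERT's actual architecture.

```python
import torch
import torch.nn.functional as F

def masked_pseudo_label_loss(frames, teacher_codes, encoder, head, mask_prob=0.15):
    """MLM-style loss on audio frames. frames: (B, T, D) float features,
    teacher_codes: (B, T) integer pseudo labels from a frozen teacher."""
    mask = torch.rand(frames.shape[:2]) < mask_prob          # frames to hide
    corrupted = frames.masked_fill(mask.unsqueeze(-1), 0.0)  # zero out masked frames
    logits = head(encoder(corrupted))                        # (B, T, vocab_size)
    return F.cross_entropy(logits[mask], teacher_codes[mask])

# Hypothetical stand-ins; a real backend would be a deep transformer encoder.
B, T, D, H, V = 2, 100, 64, 128, 512
encoder = torch.nn.Sequential(torch.nn.Linear(D, H), torch.nn.GELU())
head = torch.nn.Linear(H, V)
frames = torch.randn(B, T, D)
codes = torch.randint(0, V, (B, T))
print(masked_pseudo_label_loss(frames, codes, encoder, head))
```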
- Exploring the Efficacy of Pre-trained Checkpoints in Text-to-Music Generation Task [86.72661027591394]
We generate complete and semantically consistent symbolic music scores from text descriptions.
We explore the efficacy of using publicly available checkpoints for natural language processing in the task of text-to-music generation.
Our experimental results show that the improvement from using pre-trained checkpoints is statistically significant in terms of BLEU score and edit distance similarity.
arXiv Detail & Related papers (2022-11-21T07:19:17Z)
- Music Source Separation with Band-split RNN [25.578400006180527]
We propose a frequency-domain model that splits the spectrogram of the mixture into subbands and performs interleaved band-level and sequence-level modeling.
The choices of the bandwidths of the subbands can be determined by a priori knowledge or expert knowledge on the characteristics of the target source.
Experiment results show that BSRNN trained only on MUSDB18-HQ dataset significantly outperforms several top-ranking models in Music Demixing (MDX) Challenge 2021.
arXiv Detail & Related papers (2022-09-30T01:49:52Z)
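The band-splitting step described in the entry above is easy to show in isolation; the band edges, shapes, and function name below are illustrative assumptions, and the interleaved band-level and sequence-level RNN modelling is only indicated in a comment.

```python
import numpy as np

def split_bands(spec: np.ndarray, band_edges: list[int]) -> list[np.ndarray]:
    """Split a (freq_bins, frames) magnitude spectrogram into sub-band chunks.
    In a band-split model each chunk is projected to a fixed-size feature and
    then processed by alternating across-time and across-band layers (not shown
    here); the band edges encode prior knowledge about the target source."""
    edges = [0] + list(band_edges) + [spec.shape[0]]
    return [spec[lo:hi] for lo, hi in zip(edges[:-1], edges[1:])]

# Hypothetical usage: 1025 frequency bins, finer bands at low frequencies.
spec = np.abs(np.random.randn(1025, 400))
bands = split_bands(spec, band_edges=[64, 128, 256, 512])
print([b.shape for b in bands])
```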
- Codified audio language modeling learns useful representations for music information retrieval [77.63657430536593]
We show that language models pre-trained on codified (discretely-encoded) music audio learn representations that are useful for downstream MIR tasks.
To determine if Jukebox's representations contain useful information for MIR, we use them as input features to train shallow models on several MIR tasks.
We observe that representations from Jukebox are considerably stronger than those from models pre-trained on tagging, suggesting that pre-training via codified audio language modeling may address blind spots in conventional approaches.
arXiv Detail & Related papers (2021-07-12T18:28:50Z)
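The probing protocol described above, training shallow models on frozen representations, can be sketched with a single linear probe. The probe size, optimiser settings, and the random stand-in features below are assumptions, not the cited paper's exact setup.

```python
import torch
import torch.nn.functional as F

def train_linear_probe(reps, labels, n_classes, epochs=100, lr=1e-2):
    """Fit one linear layer on frozen pretrained representations -- the usual
    'shallow model' way of comparing feature quality across MIR tasks.
    reps: (N, D) float tensor, labels: (N,) class indices."""
    probe = torch.nn.Linear(reps.shape[1], n_classes)
    opt = torch.optim.Adam(probe.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = F.cross_entropy(probe(reps), labels)
        loss.backward()
        opt.step()
    return probe

# Hypothetical usage with random stand-ins for pooled per-track activations.
reps = torch.randn(200, 4800)
labels = torch.randint(0, 10, (200,))
probe = train_linear_probe(reps, labels, n_classes=10)
```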
- A Survey on Neural Recommendation: From Collaborative Filtering to Content and Context Enriched Recommendation [70.69134448863483]
Research in recommendation has shifted to inventing new recommender models based on neural networks.
In recent years, we have witnessed significant progress in developing neural recommender models.
arXiv Detail & Related papers (2021-04-27T08:03:52Z)
- Score-informed Networks for Music Performance Assessment [64.12728872707446]
Deep neural network-based methods incorporating score information into music performance assessment (MPA) models have not yet been investigated.
We introduce three different models capable of score-informed performance assessment.
arXiv Detail & Related papers (2020-08-01T07:46:24Z)