Related papers: Towards Leveraging Contrastively Pretrained Neural Audio Embeddings for Recommender Tasks

URL: http://arxiv.org/abs/2409.09026v1
Date: Fri, 13 Sep 2024 17:53:06 GMT
Title: Towards Leveraging Contrastively Pretrained Neural Audio Embeddings for Recommender Tasks
Authors: Florian Grötschla, Luca Strässle, Luca A. Lanzendörfer, Roger Wattenhofer
Abstract summary: Music recommender systems frequently utilize network-based models to capture relationships between music pieces, artists, and users. New music pieces or artists often face the cold-start problem due to insufficient initial information. To address this, one can extract content-based information directly from the music to enhance collaborative-filtering-based methods.
Score: 18.95453617434051
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Music recommender systems frequently utilize network-based models to capture relationships between music pieces, artists, and users. Although these relationships provide valuable insights for predictions, new music pieces or artists often face the cold-start problem due to insufficient initial information. To address this, one can extract content-based information directly from the music to enhance collaborative-filtering-based methods. While previous approaches have relied on hand-crafted audio features for this purpose, we explore the use of contrastively pretrained neural audio embedding models, which offer a richer and more nuanced representation of music. Our experiments demonstrate that neural embeddings, particularly those generated with the Contrastive Language-Audio Pretraining (CLAP) model, present a promising approach to enhancing music recommendation tasks within graph-based frameworks.
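As a rough illustration of why content-based embeddings help with cold-start items, the sketch below ranks tracks that have no interaction history by comparing their audio embedding to the average embedding of a user's listening history. This is not the paper's graph-based model; it is a simpler nearest-neighbour stand-in. The track names, the three-dimensional toy vectors, and the helper functions are all hypothetical, and real embeddings from a model such as CLAP would have far more dimensions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def user_profile(history, embeddings):
    """Average the embeddings of the tracks a user has already listened to."""
    vectors = [embeddings[track] for track in history]
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def rank_cold_start(history, candidates, embeddings):
    """Rank candidate tracks by similarity to the user's profile, best first.

    Only audio embeddings are used, so a candidate needs no prior
    interaction data to receive a score.
    """
    profile = user_profile(history, embeddings)
    scored = [(track, cosine(profile, embeddings[track])) for track in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for audio embeddings (hypothetical values).
embeddings = {
    "track_a": [0.9, 0.1, 0.0],
    "track_b": [0.8, 0.2, 0.1],
    "new_x": [0.85, 0.15, 0.05],
    "new_y": [0.0, 0.1, 0.9],
}

ranking = rank_cold_start(["track_a", "track_b"], ["new_x", "new_y"], embeddings)
for track, score in ranking:
    print(track, round(score, 3))
```

In a graph-based setup such as the one the abstract describes, the same vectors would more plausibly serve as input features for item nodes, so that a new track starts with an informative representation instead of an untrained one.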

Related papers

"Beyond the past": Leveraging Audio and Human Memory for Sequential Music Recommendation [6.875744149600454]
On music streaming services, listening sessions are often composed of a balance of familiar and new tracks. We propose a model that leverages audio information to predict in advance the activation of new tracks.
arXiv Detail & Related papers (2025-07-23T09:37:23Z)
Bridging the Gap Between Semantic and User Preference Spaces for Multi-modal Music Representation Learning [10.558648773612191]
We propose a novel Hierarchical Two-stage Contrastive Learning (HTCL) method that models similarity from the semantic perspective to the user perspective hierarchically. We devise a scalable audio encoder and leverage a pre-trained BERT model as the text encoder to learn audio-text semantics via large-scale contrastive pre-training.
arXiv Detail & Related papers (2025-05-29T09:50:07Z)
Comparative Analysis of Pretrained Audio Representations in Music Recommender Systems [0.0]
Music Information Retrieval (MIR) has proposed various models pretrained on large amounts of music data. Transfer learning showcases the proven effectiveness of pretrained backend models with a broad spectrum of downstream tasks. Music Recommender Systems tend to favour traditional end-to-end neural network learning over pretrained models.
arXiv Detail & Related papers (2024-09-13T17:03:56Z)
Enhancing Sequential Music Recommendation with Personalized Popularity Awareness [56.972624411205224]
This paper introduces a novel approach that incorporates personalized popularity information into sequential recommendation. Experimental results demonstrate that a Personalized Most Popular recommender outperforms existing state-of-the-art models.
arXiv Detail & Related papers (2024-09-06T15:05:12Z)
LARP: Language Audio Relational Pre-training for Cold-Start Playlist Continuation [49.89372182441713]
We introduce LARP, a multi-modal cold-start playlist continuation model. Our framework uses increasing stages of task-specific abstraction: within-track (language-audio) contrastive loss, track-track contrastive loss, and track-playlist contrastive loss.
arXiv Detail & Related papers (2024-06-20T14:02:15Z)
MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models [57.47799823804519]
We are inspired by how musicians compose music not just from a movie script, but also through visualizations. We propose MeLFusion, a model that can effectively use cues from a textual description and the corresponding image to synthesize music. Our exhaustive experimental evaluation suggests that adding visual information to the music synthesis pipeline significantly improves the quality of generated music.
arXiv Detail & Related papers (2024-06-07T06:38:59Z)
A Survey of Music Generation in the Context of Interaction [3.6522809408725223]
Machine learning has been successfully used to compose and generate music, both melodies and polyphonic pieces. Most of these models are not suitable for human-machine co-creation through live interaction.
arXiv Detail & Related papers (2024-02-23T12:41:44Z)
MusicRL: Aligning Music Generation to Human Preferences [62.44903326718772]
MusicRL is the first music generation system finetuned from human feedback. We deploy MusicLM to users and collect a substantial dataset comprising 300,000 pairwise preferences. We train MusicRL-U, the first text-to-music model that incorporates human feedback at scale.
arXiv Detail & Related papers (2024-02-06T18:36:52Z)
Self-Supervised Contrastive Learning for Robust Audio-Sheet Music Retrieval Systems [3.997809845676912]
We show that self-supervised contrastive learning can mitigate the scarcity of annotated data from real music content. We employ the snippet embeddings in the higher-level task of cross-modal piece identification. In this work, we observe that the retrieval quality improves from 30% up to 100% when real music data is present.
arXiv Detail & Related papers (2023-09-21T14:54:48Z)
GETMusic: Generating Any Music Tracks with a Unified Representation and Diffusion Framework [58.64512825534638]
Symbolic music generation aims to create musical notes, which can help users compose music. We introduce a framework known as GETMusic, with "GET" standing for "GEnerate music Tracks". GETScore represents musical notes as tokens and organizes tokens in a 2D structure, with tracks stacked vertically and progressing horizontally over time. Our proposed representation, coupled with the non-autoregressive generative model, empowers GETMusic to generate music with any arbitrary source-target track combinations.
arXiv Detail & Related papers (2023-05-18T09:53:23Z)
Exploring the Efficacy of Pre-trained Checkpoints in Text-to-Music Generation Task [86.72661027591394]
We generate complete and semantically consistent symbolic music scores from text descriptions. We explore the efficacy of using publicly available checkpoints for natural language processing in the task of text-to-music generation. Our experimental results show that the improvement from using pre-trained checkpoints is statistically significant in terms of BLEU score and edit distance similarity.
arXiv Detail & Related papers (2022-11-21T07:19:17Z)
Comparision Of Adversarial And Non-Adversarial LSTM Music Generative Models [2.569647910019739]
This work implements and compares adversarial and non-adversarial training of recurrent neural network music composers on MIDI data. The evaluation indicates that adversarial training produces more aesthetically pleasing music.
arXiv Detail & Related papers (2022-11-01T20:23:49Z)
Detecting Generic Music Features with Single Layer Feedforward Network using Unsupervised Hebbian Computation [3.8707695363745223]
The authors extract information on such features from a popular open-source music corpus. They apply unsupervised Hebbian learning techniques on their single-layer neural network using the same dataset. The unsupervised training algorithm enhances their proposed neural network to achieve an accuracy of 90.36% for successful music feature detection.
arXiv Detail & Related papers (2020-08-31T13:57:31Z)

This list is automatically generated from the titles and abstracts of the papers on this site.