Towards Leveraging Contrastively Pretrained Neural Audio Embeddings for Recommender Tasks
- URL: http://arxiv.org/abs/2409.09026v1
- Date: Fri, 13 Sep 2024 17:53:06 GMT
- Title: Towards Leveraging Contrastively Pretrained Neural Audio Embeddings for Recommender Tasks
- Authors: Florian Grötschla, Luca Strässle, Luca A. Lanzendörfer, Roger Wattenhofer
- Abstract summary: Music recommender systems frequently utilize network-based models to capture relationships between music pieces, artists, and users.
New music pieces or artists often face the cold-start problem due to insufficient initial information.
To address this, one can extract content-based information directly from the music to enhance collaborative-filtering-based methods.
- Score: 18.95453617434051
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Music recommender systems frequently utilize network-based models to capture relationships between music pieces, artists, and users. Although these relationships provide valuable insights for predictions, new music pieces or artists often face the cold-start problem due to insufficient initial information. To address this, one can extract content-based information directly from the music to enhance collaborative-filtering-based methods. While previous approaches have relied on hand-crafted audio features for this purpose, we explore the use of contrastively pretrained neural audio embedding models, which offer a richer and more nuanced representation of music. Our experiments demonstrate that neural embeddings, particularly those generated with the Contrastive Language-Audio Pretraining (CLAP) model, present a promising approach to enhancing music recommendation tasks within graph-based frameworks.
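As an illustration of the content-based side of this approach, the sketch below extracts pretrained CLAP audio embeddings and relates a cold-start track to catalog tracks by cosine similarity. This is a minimal sketch, not the authors' exact pipeline: the laion_clap package and its CLAP_Module interface follow the LAION-CLAP project, and all file paths are placeholders.

```python
# Minimal sketch (assumed setup): embed audio with a pretrained CLAP model and
# find content-based neighbors for a cold-start track. File paths are placeholders.
import numpy as np
import laion_clap

# Load a pretrained CLAP checkpoint (downloads default weights on first use).
clap = laion_clap.CLAP_Module(enable_fusion=False)
clap.load_ckpt()

catalog_files = ["track_001.wav", "track_002.wav", "track_003.wav"]  # existing catalog items
new_files = ["cold_start_track.wav"]                                 # item with no interaction data

# Each row is one track's audio embedding.
catalog_emb = clap.get_audio_embedding_from_filelist(x=catalog_files, use_tensor=False)
new_emb = clap.get_audio_embedding_from_filelist(x=new_files, use_tensor=False)

def l2_normalize(x, eps=1e-9):
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

# Cosine similarity between the cold-start track and every catalog track.
sims = l2_normalize(new_emb) @ l2_normalize(catalog_emb).T  # shape (1, len(catalog_files))

# The nearest catalog tracks could seed content-based edges or initialize node
# features for the new item in a graph-based recommender.
top_k = np.argsort(-sims[0])[:2]
print("content-based neighbors:", [catalog_files[i] for i in top_k])
```

Hand-crafted audio features could be plugged in the same way; the neural embeddings simply replace them as richer node features or edge weights in the graph.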
Related papers
- MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models [57.47799823804519]
We are inspired by how musicians compose music not just from a movie script, but also through visualizations.
We propose MeLFusion, a model that can effectively use cues from a textual description and the corresponding image to synthesize music.
Our exhaustive experimental evaluation suggests that adding visual information to the music synthesis pipeline significantly improves the quality of generated music.
arXiv Detail & Related papers (2024-06-07T06:38:59Z) - MuPT: A Generative Symbolic Music Pretrained Transformer [56.09299510129221]
We explore the application of Large Language Models (LLMs) to the pre-training of music.
To address the challenges associated with misaligned measures from different tracks during generation, we propose a Synchronized Multi-Track ABC Notation (SMT-ABC Notation).
Our contributions include a series of models capable of handling up to 8192 tokens, covering 90% of the symbolic music data in our training set.
arXiv Detail & Related papers (2024-04-09T15:35:52Z) - A Survey of Music Generation in the Context of Interaction [3.6522809408725223]
Machine learning has been successfully used to compose and generate music, both melodies and polyphonic pieces.
However, most of these models are not suitable for human-machine co-creation through live interaction.
arXiv Detail & Related papers (2024-02-23T12:41:44Z) - MusicRL: Aligning Music Generation to Human Preferences [62.44903326718772]
MusicRL is the first music generation system finetuned from human feedback.
We deploy MusicLM to users and collect a substantial dataset comprising 300,000 pairwise preferences.
We train MusicRL-U, the first text-to-music model that incorporates human feedback at scale.
arXiv Detail & Related papers (2024-02-06T18:36:52Z) - Self-Supervised Contrastive Learning for Robust Audio-Sheet Music Retrieval Systems [3.997809845676912]
We show that self-supervised contrastive learning can mitigate the scarcity of annotated data from real music content.
We employ the snippet embeddings in the higher-level task of cross-modal piece identification.
In this work, we observe that the retrieval quality improves from 30% up to 100% when real music data is present.
arXiv Detail & Related papers (2023-09-21T14:54:48Z) - GETMusic: Generating Any Music Tracks with a Unified Representation and Diffusion Framework [58.64512825534638]
Symbolic music generation aims to create musical notes, which can help users compose music.
We introduce a framework known as GETMusic, with "GET" standing for "GEnerate music Tracks".
GETScore represents musical notes as tokens and organizes tokens in a 2D structure, with tracks stacked vertically and progressing horizontally over time.
Our proposed representation, coupled with the non-autoregressive generative model, empowers GETMusic to generate music with arbitrary source-target track combinations.
arXiv Detail & Related papers (2023-05-18T09:53:23Z) - Exploring the Efficacy of Pre-trained Checkpoints in Text-to-Music Generation Task [86.72661027591394]
We generate complete and semantically consistent symbolic music scores from text descriptions.
We explore the efficacy of using publicly available checkpoints for natural language processing in the task of text-to-music generation.
Our experimental results show that the improvement from using pre-trained checkpoints is statistically significant in terms of BLEU score and edit distance similarity.
arXiv Detail & Related papers (2022-11-21T07:19:17Z) - Comparision Of Adversarial And Non-Adversarial LSTM Music Generative Models [2.569647910019739]
This work implements and compares adversarial and non-adversarial training of recurrent neural network music composers on MIDI data.
The evaluation indicates that adversarial training produces more aesthetically pleasing music.
arXiv Detail & Related papers (2022-11-01T20:23:49Z) - MusCaps: Generating Captions for Music Audio [14.335950077921435]
We present the first music audio captioning model, MusCaps, consisting of an encoder-decoder with temporal attention.
Our method combines convolutional and recurrent neural network architectures to jointly process audio-text inputs.
Our model represents a shift away from classification-based music description and combines tasks requiring both auditory and linguistic understanding.
arXiv Detail & Related papers (2021-04-24T16:34:47Z) - Detecting Generic Music Features with Single Layer Feedforward Network using Unsupervised Hebbian Computation [3.8707695363745223]
The authors extract information on generic music features from a popular open-source music corpus.
They apply unsupervised Hebbian learning techniques on their single-layer neural network using the same dataset.
The unsupervised training algorithm enables the proposed single-layer network to achieve an accuracy of 90.36% in music feature detection.
arXiv Detail & Related papers (2020-08-31T13:57:31Z) - Music Gesture for Visual Sound Separation [121.36275456396075]
"Music Gesture" is a keypoint-based structured representation to explicitly model the body and finger movements of musicians when they perform music.
We first adopt a context-aware graph network to integrate visual semantic context with body dynamics, and then apply an audio-visual fusion model to associate body movements with the corresponding audio signals.
arXiv Detail & Related papers (2020-04-20T17:53:46Z)
This list is automatically generated from the titles and abstracts of the papers on this site.