Learning to rank music tracks using triplet loss
- URL: http://arxiv.org/abs/2005.12977v1
- Date: Mon, 18 May 2020 08:20:54 GMT
- Title: Learning to rank music tracks using triplet loss
- Authors: Laure Prétet, Gaël Richard, Geoffroy Peeters
- Abstract summary: We propose a method for direct recommendation based on the audio content without explicitly tagging the music tracks.
We train a Convolutional Neural Network to learn the similarity via triplet loss.
Results highlight the efficiency of our system, especially when associated with an Auto-pooling layer.
- Score: 6.43271391521664
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most music streaming services rely on automatic recommendation algorithms to
exploit their large music catalogs. These algorithms aim at retrieving a ranked
list of music tracks based on their similarity with a target music track. In
this work, we propose a method for direct recommendation based on the audio
content without explicitly tagging the music tracks. To that aim, we propose
several strategies to perform triplet mining from ranked lists. We train a
Convolutional Neural Network to learn the similarity via triplet loss. These
different strategies are compared and validated on a large-scale experiment
against an auto-tagging based approach. The results obtained highlight the
efficiency of our system, especially when associated with an Auto-pooling
layer.
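The pipeline described in the abstract — mining triplets from ranked lists, scoring them with a triplet margin loss, and pooling with an Auto-pool layer — can be illustrated with a minimal NumPy sketch. The function names, the margin value, and the top-k/bottom-k mining heuristic are illustrative assumptions, not the paper's exact strategies, and the CNN embedding network is omitted.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Triplet margin loss on embedding vectors (squared Euclidean).
    Encourages the positive to lie closer to the anchor than the
    negative by at least `margin` (margin value is an illustrative choice)."""
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return max(0.0, float(d_pos - d_neg + margin))

def mine_triplets(ranked_list, k=2):
    """One hypothetical mining heuristic over a ranked list of track ids:
    the query track is the anchor, the top-k results are positives,
    and the bottom-k results are negatives."""
    anchor = ranked_list[0]
    positives = ranked_list[1:1 + k]
    negatives = ranked_list[-k:]
    return [(anchor, p, n) for p in positives for n in negatives]

def auto_pool(x, alpha=1.0):
    """Auto-pool: softmax-weighted pooling over a 1-D array that
    interpolates between mean pooling (alpha = 0) and max pooling
    (alpha -> infinity); in the actual layer, alpha is learned."""
    w = np.exp(alpha * (x - x.max()))  # shift by max for numerical stability
    w /= w.sum()
    return float(np.sum(w * x))
```

For example, `mine_triplets([0, 1, 2, 3, 4, 5], k=2)` yields the four triplets pairing anchor 0 with positives {1, 2} and negatives {4, 5}, and `auto_pool` with `alpha=0` reduces to plain mean pooling.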
Related papers
- Enhancing Music Genre Classification through Multi-Algorithm Analysis and User-Friendly Visualization [0.0]
The aim of this study is to teach an algorithm how to recognize different types of music.
Since the algorithm has not heard these songs before, it needs to figure out what makes each song unique.
arXiv Detail & Related papers (2024-05-27T17:57:20Z)
- DeepSRGM -- Sequence Classification and Ranking in Indian Classical Music with Deep Learning [7.140656816182373]
Raga is a melodic framework for compositions and improvisations alike.
Raga Recognition is an important music information retrieval task in Indian Classical Music.
We propose a deep learning based approach to Raga recognition.
arXiv Detail & Related papers (2024-02-15T18:11:02Z)
- DITTO: Diffusion Inference-Time T-Optimization for Music Generation [49.90109850026932]
Diffusion Inference-Time T-Optimization (DITTO) is a framework for controlling pre-trained text-to-music diffusion models at inference time.
We demonstrate a surprisingly wide range of applications for music generation, including inpainting, outpainting, and looping, as well as intensity, melody, and musical structure control.
arXiv Detail & Related papers (2024-01-22T18:10:10Z)
- Self-Supervised Contrastive Learning for Robust Audio-Sheet Music Retrieval Systems [3.997809845676912]
We show that self-supervised contrastive learning can mitigate the scarcity of annotated data from real music content.
We employ the snippet embeddings in the higher-level task of cross-modal piece identification.
In this work, we observe that the retrieval quality improves from 30% up to 100% when real music data is present.
arXiv Detail & Related papers (2023-09-21T14:54:48Z)
- Fairness Through Domain Awareness: Mitigating Popularity Bias For Music Discovery [56.77435520571752]
We explore the intrinsic relationship between music discovery and popularity bias.
We propose a domain-aware, individual-fairness-based approach that addresses popularity bias in graph neural network (GNN)-based recommender systems.
Our approach uses individual fairness to reflect a ground truth listening experience, i.e., if two songs sound similar, this similarity should be reflected in their representations.
arXiv Detail & Related papers (2023-08-28T14:12:25Z)
- Video-to-Music Recommendation using Temporal Alignment of Segments [5.7235653928654235]
We study cross-modal recommendation of music tracks to be used as soundtracks for videos.
We build on a self-supervised system that learns a content association between music and video.
We propose a novel approach to significantly improve the system's performance using structure-aware recommendation.
arXiv Detail & Related papers (2023-06-12T15:40:31Z)
- Simple and Controllable Music Generation [94.61958781346176]
MusicGen is a single Language Model (LM) that operates over several streams of compressed discrete music representation, i.e., tokens.
Unlike prior work, MusicGen comprises a single-stage transformer LM together with efficient token interleaving patterns.
arXiv Detail & Related papers (2023-06-08T15:31:05Z)
- GETMusic: Generating Any Music Tracks with a Unified Representation and Diffusion Framework [58.64512825534638]
Symbolic music generation aims to create musical notes, which can help users compose music.
We introduce a framework known as GETMusic, with "GET" standing for "GEnerate music Tracks".
GETScore represents musical notes as tokens and organizes tokens in a 2D structure, with tracks stacked vertically and progressing horizontally over time.
Our proposed representation, coupled with the non-autoregressive generative model, empowers GETMusic to generate music with any arbitrary source-target track combinations.
arXiv Detail & Related papers (2023-05-18T09:53:23Z)
- Melody transcription via generative pre-training [86.08508957229348]
A key challenge in melody transcription is building methods that can handle broad audio containing any number of instrument ensembles and musical styles.
To confront this challenge, we leverage representations from Jukebox (Dhariwal et al. 2020), a generative model of broad music audio.
We derive a new dataset containing 50 hours of melody transcriptions from crowdsourced annotations of broad music.
arXiv Detail & Related papers (2022-12-04T18:09:23Z)
- Detecting Generic Music Features with Single Layer Feedforward Network using Unsupervised Hebbian Computation [3.8707695363745223]
The authors extract information on generic music features from a popular open-source music corpus.
They apply unsupervised Hebbian learning techniques on their single-layer neural network using the same dataset.
The unsupervised training algorithm enhances their proposed neural network to achieve an accuracy of 90.36% for successful music feature detection.
arXiv Detail & Related papers (2020-08-31T13:57:31Z)
- dMelodies: A Music Dataset for Disentanglement Learning [70.90415511736089]
We present a new symbolic music dataset that will help researchers demonstrate the efficacy of their algorithms on diverse domains.
This will also provide a means for evaluating algorithms specifically designed for music.
The dataset is large enough (approx. 1.3 million data points) to train and test deep networks for disentanglement learning.
arXiv Detail & Related papers (2020-07-29T19:20:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.