Contrastive Learning with Positive-Negative Frame Mask for Music
Representation
- URL: http://arxiv.org/abs/2203.09129v1
- Date: Thu, 17 Mar 2022 07:11:42 GMT
- Title: Contrastive Learning with Positive-Negative Frame Mask for Music
Representation
- Authors: Dong Yao, Zhou Zhao, Shengyu Zhang, Jieming Zhu, Yudong Zhu, Rui
Zhang, Xiuqiang He
- Abstract summary: This paper proposes a novel Positive-nEgative frame mask for Music Representation based on the contrastive learning framework, abbreviated as PEMR.
We devise a novel contrastive learning objective to accommodate both self-augmented positives/negatives sampled from the same music.
- Score: 91.44187939465948
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Self-supervised learning, especially contrastive learning, has made an
outstanding contribution to the development of many deep learning research
fields. Recently, researchers in the acoustic signal processing field noticed
its success and leveraged contrastive learning for better music representation.
Typically, existing approaches maximize the similarity between two distorted
audio segments sampled from the same music. In other words, they ensure a
semantic agreement at the music level. However, such coarse-grained methods
neglect inessential or noisy elements at the frame level, which can be
detrimental to learning an effective representation of music.
Towards this end, this paper proposes a novel Positive-nEgative frame mask for
Music Representation based on the contrastive learning framework, abbreviated
as PEMR. Concretely, PEMR incorporates a Positive-Negative Mask Generation
module, which leverages transformer blocks to generate frame masks on the
Log-Mel spectrogram. We can generate self-augmented negative and positive
samples by masking important components or inessential components,
respectively. We devise a novel contrastive learning objective to accommodate
both self-augmented positives/negatives sampled from the same music. We conduct
experiments on four public datasets. The experimental results on two
music-related downstream tasks, music classification and cover song
identification, demonstrate the generalization ability and transferability of
the music representations learned by PEMR.
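The abstract describes the mechanism in enough detail for a rough sketch. Below is a minimal PyTorch sketch, assuming a transformer-based per-frame importance scorer, a median quantile split between important and inessential frames, and an NT-Xent-style objective extended with the self-augmented positive and negative; the module names, hyperparameters, and exact loss form are illustrative assumptions, not the paper's verified design.
```python
# Minimal sketch of the PEMR idea from the abstract (PyTorch).
# ASSUMPTIONS: the scorer architecture, the 0.5 quantile split, and the
# loss form below are illustrative; see the paper for the real design.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PositiveNegativeMaskGenerator(nn.Module):
    """Scores each frame of a log-Mel spectrogram with transformer blocks,
    then masks inessential frames (-> positive) or important frames (-> negative)."""
    def __init__(self, n_mels: int = 128, n_layers: int = 2, n_heads: int = 4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=n_mels, nhead=n_heads,
                                           batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.score = nn.Linear(n_mels, 1)   # per-frame importance score

    def forward(self, spec):                # spec: (batch, frames, n_mels)
        w = torch.sigmoid(self.score(self.blocks(spec)))      # (B, T, 1)
        thresh = torch.quantile(w, 0.5, dim=1, keepdim=True)  # assumed split
        important = (w >= thresh).float()
        positive = spec * important         # inessential frames masked out
        negative = spec * (1 - important)   # important frames masked out
        return positive, negative

def pemr_style_loss(z_a, z_b, z_pos, z_neg, tau=0.1):
    """NT-Xent-style objective (assumed form): z_a and z_b embed two
    augmented segments of the same music, z_pos/z_neg embed the masked
    positive/negative. Pulls z_a toward z_b and z_pos; pushes it away
    from z_neg and from the other items in the batch."""
    z_a, z_b, z_pos, z_neg = (F.normalize(z, dim=-1)
                              for z in (z_a, z_b, z_pos, z_neg))
    pos = (torch.exp((z_a * z_b).sum(-1) / tau)
           + torch.exp((z_a * z_pos).sum(-1) / tau))
    in_batch = torch.exp(z_a @ z_b.t() / tau).sum(-1)   # in-batch negatives
    self_neg = torch.exp((z_a * z_neg).sum(-1) / tau)   # masked negative
    return (-torch.log(pos / (in_batch + self_neg))).mean()
```
In training, spec would be the log-Mel spectrogram of an audio clip, and the z_* vectors would be the encoder outputs for the corresponding views.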
Related papers
- Semi-Supervised Self-Learning Enhanced Music Emotion Recognition [6.315220462630698]
Music emotion recognition (MER) aims to identify the emotions conveyed in a given musical piece.
Currently, the available public datasets have limited sample sizes.
We propose a semi-supervised self-learning (SSSL) method, which can differentiate between samples with correct and incorrect labels in a self-learning manner.
arXiv Detail & Related papers (2024-10-29T09:42:07Z)
- MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models [57.47799823804519]
We are inspired by how musicians compose music not just from a movie script, but also through visualizations.
We propose MeLFusion, a model that can effectively use cues from a textual description and the corresponding image to synthesize music.
Our exhaustive experimental evaluation suggests that adding visual information to the music synthesis pipeline significantly improves the quality of generated music.
arXiv Detail & Related papers (2024-06-07T06:38:59Z)
- Impact of time and note duration tokenizations on deep learning symbolic music modeling [0.0]
We analyze the common tokenization methods and experiment with time and note duration representations.
We demonstrate that explicit time and note duration information leads to better results, depending on the task.
arXiv Detail & Related papers (2023-10-12T16:56:37Z)
- Towards Contrastive Learning in Music Video Domain [46.29203572184694]
We create a dual encoder for the audio and video modalities and train it using a bidirectional contrastive loss (a minimal sketch of such a loss appears after this list).
For the experiments, we use an industry dataset containing 550,000 music videos as well as the public Million Song Dataset.
Our results indicate that pre-trained networks without contrastive fine-tuning outperform our contrastive learning approach when evaluated on both tasks.
arXiv Detail & Related papers (2023-09-01T09:08:21Z)
- Simple and Controllable Music Generation [94.61958781346176]
MusicGen is a single Language Model (LM) that operates over several streams of compressed discrete music representation, i.e., tokens.
Unlike prior work, MusicGen comprises a single-stage transformer LM together with efficient token interleaving patterns.
arXiv Detail & Related papers (2023-06-08T15:31:05Z)
- Audio-Visual Instance Discrimination with Cross-Modal Agreement [90.95132499006498]
We present a self-supervised learning approach to learn audio-visual representations from video and audio.
We show that optimizing for cross-modal discrimination, rather than within-modal discrimination, is important to learn good representations from video and audio.
arXiv Detail & Related papers (2020-04-27T16:59:49Z)
- Music Gesture for Visual Sound Separation [121.36275456396075]
"Music Gesture" is a keypoint-based structured representation to explicitly model the body and finger movements of musicians when they perform music.
We first adopt a context-aware graph network to integrate visual semantic context with body dynamics, and then apply an audio-visual fusion model to associate body movements with the corresponding audio signals.
arXiv Detail & Related papers (2020-04-20T17:53:46Z)
- Multi-Modal Music Information Retrieval: Augmenting Audio-Analysis with Visual Computing for Improved Music Video Analysis [91.3755431537592]
This thesis combines audio-analysis with computer vision to approach Music Information Retrieval (MIR) tasks from a multi-modal perspective.
The main hypothesis of this work is based on the observation that certain expressive categories such as genre or theme can be recognized on the basis of the visual content alone.
The experiments are conducted for three MIR tasks: Artist Identification, Music Genre Classification, and Cross-Genre Classification.
arXiv Detail & Related papers (2020-02-01T17:57:14Z)
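For the bidirectional contrastive loss mentioned in the music-video entry above, here is a minimal sketch under assumed embedding shapes and temperature; the dual encoders themselves are omitted, and all parameter values are illustrative.
```python
# Symmetric (bidirectional) contrastive loss over paired audio/video
# embeddings, in the spirit of the dual-encoder entry above.
# ASSUMPTIONS: embedding shapes and the temperature value are illustrative.
import torch
import torch.nn.functional as F

def bidirectional_contrastive_loss(audio_emb, video_emb, tau=0.07):
    """audio_emb, video_emb: (B, D) embeddings of paired music-video clips.
    Matched pairs lie on the diagonal of the similarity matrix; the loss is
    cross-entropy in both the audio->video and video->audio directions."""
    a = F.normalize(audio_emb, dim=-1)
    v = F.normalize(video_emb, dim=-1)
    logits = a @ v.t() / tau                    # (B, B) similarity matrix
    targets = torch.arange(a.size(0), device=a.device)
    return 0.5 * (F.cross_entropy(logits, targets)        # audio -> video
                  + F.cross_entropy(logits.t(), targets)) # video -> audio
```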