Supervised Chorus Detection for Popular Music Using Convolutional Neural
Network and Multi-task Learning
- URL: http://arxiv.org/abs/2103.14253v1
- Date: Fri, 26 Mar 2021 04:32:08 GMT
- Title: Supervised Chorus Detection for Popular Music Using Convolutional Neural
Network and Multi-task Learning
- Authors: Ju-Chiang Wang, Jordan B.L. Smith, Jitong Chen, Xuchen Song, Yuxuan
Wang
- Abstract summary: This paper presents a novel supervised approach to detecting the chorus segments in popular music.
We propose a convolutional neural network with a multi-task learning objective, which simultaneously fits two temporal activation curves.
We also propose a post-processing method that jointly takes into account the chorus and boundary predictions to produce binary output.
- Score: 10.160205869706965
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents a novel supervised approach to detecting the chorus
segments in popular music. Traditional approaches to this task are mostly
unsupervised, with pipelines designed to target some quality that is assumed to
define "chorusness," which usually means seeking the loudest or most frequently
repeated sections. We propose to use a convolutional neural network with a
multi-task learning objective, which simultaneously fits two temporal
activation curves: one indicating "chorusness" as a function of time, and the
other the location of the boundaries. We also propose a post-processing method
that jointly takes into account the chorus and boundary predictions to produce
binary output. In experiments using three datasets, we compare our system to a
set of public implementations of other segmentation and chorus-detection
algorithms, and find our approach performs significantly better.
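The abstract describes two ideas that a small example can make concrete: a training objective that fits a chorus curve and a boundary curve at the same time, and a post-processing step that combines both curves into a binary output. The sketch below is only an illustration under stated assumptions, not the authors' implementation. The abstract does not specify the loss or the decision rule, so the binary cross-entropy, the `boundary_weight` value, the simple peak picking, both thresholds, and all function names are hypothetical choices.

```python
import numpy as np


def binary_cross_entropy(pred, target, eps=1e-7):
    """Mean binary cross-entropy between two per-frame curves in [0, 1]."""
    pred = np.clip(np.asarray(pred, dtype=float), eps, 1.0 - eps)
    target = np.asarray(target, dtype=float)
    return float(-np.mean(target * np.log(pred)
                          + (1.0 - target) * np.log(1.0 - pred)))


def multitask_loss(chorus_pred, chorus_true, boundary_pred, boundary_true,
                   boundary_weight=0.5):
    """Weighted sum of the two task losses (the weight is an assumed value)."""
    return (binary_cross_entropy(chorus_pred, chorus_true)
            + boundary_weight * binary_cross_entropy(boundary_pred, boundary_true))


def pick_boundaries(boundary_curve, threshold=0.5):
    """Indices of local maxima of the boundary curve at or above the threshold."""
    b = np.asarray(boundary_curve, dtype=float)
    return [i for i in range(1, len(b) - 1)
            if b[i] >= threshold and b[i] > b[i - 1] and b[i] >= b[i + 1]]


def binarize_chorus(chorus_curve, boundary_curve,
                    boundary_threshold=0.5, chorus_threshold=0.5):
    """Label each segment between picked boundaries as chorus (1) or not (0).

    Assumed rule: a segment is chorus if its mean chorus activation
    reaches chorus_threshold.
    """
    c = np.asarray(chorus_curve, dtype=float)
    edges = [0] + pick_boundaries(boundary_curve, boundary_threshold) + [len(c)]
    labels = np.zeros(len(c), dtype=int)
    for start, end in zip(edges[:-1], edges[1:]):
        if end > start and c[start:end].mean() >= chorus_threshold:
            labels[start:end] = 1
    return labels


# Toy curves with ten frames; the values are invented for illustration.
chorus = [0.1, 0.2, 0.1, 0.8, 0.9, 0.4, 0.9, 0.2, 0.1, 0.1]
boundary = [0.0, 0.1, 0.2, 0.9, 0.1, 0.0, 0.1, 0.8, 0.1, 0.0]
print(binarize_chorus(chorus, boundary).tolist())
# → [0, 0, 0, 1, 1, 1, 1, 0, 0, 0]
```

In this toy run, frame 5 has a chorus activation of only 0.4 but is still labeled as chorus, because the decision is made per segment rather than per frame. This is one way in which using the boundary predictions can smooth the binary output.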
Related papers
- Carnatic Raga Identification System using Rigorous Time-Delay Neural Network [0.0]
Large-scale machine learning-based Raga identification remains a nontrivial problem in the computational analysis of Carnatic music.
In this paper, the input sound is analyzed in several steps, including a discrete Fourier transform and triangular filtering to create custom bins of possible notes.
The goal of this program is to be able to effectively and efficiently label a much wider range of audio clips in more shrutis, ragas, and with more background noise.
arXiv Detail & Related papers (2024-05-25T01:31:58Z)
- CoverHunter: Cover Song Identification with Refined Attention and Alignments [19.173689175634106]
Cover song identification (CSI) focuses on finding different versions of the same music among reference anchors, given a query track.
We propose a novel system named CoverHunter that overcomes the shortcomings of existing detection schemes.
arXiv Detail & Related papers (2023-06-15T10:34:20Z)
- Improving Time Series Encoding with Noise-Aware Self-Supervised Learning and an Efficient Encoder [15.39384259348351]
We propose an innovative training strategy that promotes consistent representation learning, accounting for the presence of noise-prone signals in natural time series.
We also propose an encoder architecture that incorporates dilated convolution within the Inception block, resulting in a scalable and robust network with a wide receptive field.
arXiv Detail & Related papers (2023-06-11T04:00:11Z)
- A Multi-Task Deep Learning Approach for Sensor-based Human Activity Recognition and Segmentation [4.987833356397567]
We propose a new deep neural network to solve the two tasks simultaneously.
The proposed network adopts selective convolution and features multiscale windows to segment activities of long or short time durations.
Our proposed method outperforms the state-of-the-art methods both for activity recognition and segmentation.
arXiv Detail & Related papers (2023-03-20T13:34:28Z)
- Generalizing Face Forgery Detection with High-frequency Features [63.33397573649408]
Current CNN-based detectors tend to overfit to method-specific color textures and thus fail to generalize.
We propose to use high-frequency noise for face forgery detection.
The first is the multi-scale high-frequency feature extraction module, which extracts high-frequency noise at multiple scales.
The second is the residual-guided spatial attention module that guides the low-level RGB feature extractor to concentrate more on forgery traces from a new perspective.
arXiv Detail & Related papers (2021-03-23T08:19:21Z)
- Fast accuracy estimation of deep learning based multi-class musical source separation [79.10962538141445]
We propose a method to evaluate the separability of instruments in any dataset without training and tuning a neural network.
Based on the oracle principle with an ideal ratio mask, our approach is an excellent proxy to estimate the separation performances of state-of-the-art deep learning approaches.
arXiv Detail & Related papers (2020-10-19T13:05:08Z)
- Collaborative Distillation in the Parameter and Spectrum Domains for Video Action Recognition [79.60708268515293]
This paper explores how to train small and efficient networks for action recognition.
We propose two distillation strategies in the frequency domain, namely feature spectrum distillation and parameter distribution distillation.
Our method can achieve higher performance than state-of-the-art methods with the same backbone.
arXiv Detail & Related papers (2020-09-15T07:29:57Z)
- Detecting Generic Music Features with Single Layer Feedforward Network using Unsupervised Hebbian Computation [3.8707695363745223]
The authors extract information on such features from a popular open-source music corpus.
They apply unsupervised Hebbian learning techniques on their single-layer neural network using the same dataset.
The unsupervised training algorithm enhances their proposed neural network to achieve an accuracy of 90.36% for successful music feature detection.
arXiv Detail & Related papers (2020-08-31T13:57:31Z)
- dMelodies: A Music Dataset for Disentanglement Learning [70.90415511736089]
We present a new symbolic music dataset that will help researchers demonstrate the efficacy of their algorithms on diverse domains.
This will also provide a means for evaluating algorithms specifically designed for music.
The dataset is large enough (approx. 1.3 million data points) to train and test deep networks for disentanglement learning.
arXiv Detail & Related papers (2020-07-29T19:20:07Z)
- MetricUNet: Synergistic Image- and Voxel-Level Learning for Precise CT Prostate Segmentation via Online Sampling [66.01558025094333]
We propose a two-stage framework, with the first stage to quickly localize the prostate region and the second stage to precisely segment the prostate.
We introduce a novel online metric learning module through voxel-wise sampling in the multi-task network.
Our method can effectively learn more representative voxel-level features compared with the conventional learning methods with cross-entropy or Dice loss.
arXiv Detail & Related papers (2020-05-15T10:37:02Z)
- Learning multiview 3D point cloud registration [74.39499501822682]
We present a novel, end-to-end learnable, multiview 3D point cloud registration algorithm.
Our approach outperforms the state-of-the-art by a significant margin, while being end-to-end trainable and computationally less costly.
arXiv Detail & Related papers (2020-01-15T03:42:14Z)