Fast accuracy estimation of deep learning based multi-class musical
source separation
- URL: http://arxiv.org/abs/2010.09453v3
- Date: Wed, 1 Dec 2021 07:55:09 GMT
- Title: Fast accuracy estimation of deep learning based multi-class musical
source separation
- Authors: Alexandru Mocanu, Benjamin Ricaud, Milos Cernak
- Abstract summary: We propose a method to evaluate the separability of instruments in any dataset without training and tuning a neural network.
Based on the oracle principle with an ideal ratio mask, our approach is an excellent proxy to estimate the separation performances of state-of-the-art deep learning approaches.
- Score: 79.10962538141445
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Music source separation represents the task of extracting all the instruments
from a given song. Recent breakthroughs on this challenge have gravitated
around a single dataset, MUSDB, which is limited to four instrument classes.
Collecting larger datasets with more instruments and training deep neural
networks (DNNs) on them is costly and time-consuming. In this work, we propose a fast
method to evaluate the separability of instruments in any dataset without
training and tuning a DNN. This separability measure helps to select
appropriate samples for the efficient training of neural networks. Based on the
oracle principle with an ideal ratio mask, our approach is an excellent proxy
to estimate the separation performances of state-of-the-art deep learning
approaches such as TasNet or Open-Unmix. Our results contribute to revealing
two essential points for audio source separation: 1) the ideal ratio mask,
although light and straightforward, provides an accurate measure of the audio
separability performance of recent neural nets, and 2) new end-to-end learning
methods such as TasNet, which operate directly on waveforms, in fact
internally build a Time-Frequency (TF) representation, so they
encounter the same limitations as TF-based methods when separating audio
patterns that overlap in the TF plane.
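The oracle idea in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names, the crude STFT built from non-overlapping rectangular frames, and the plain energy-ratio SDR are all assumptions chosen to keep the example short. The sketch builds an ideal ratio mask from the true sources, applies it to the mixture, and scores how well each source is recovered, with no network trained.

```python
import numpy as np

def stft_frames(x, frame=256):
    """Crude STFT: non-overlapping rectangular frames (illustrative simplification)."""
    n = len(x) // frame * frame
    return np.fft.rfft(np.asarray(x[:n], dtype=float).reshape(-1, frame), axis=1)

def istft_frames(X, frame=256):
    """Inverse of stft_frames: invert each frame and concatenate."""
    return np.fft.irfft(X, n=frame, axis=1).reshape(-1)

def ideal_ratio_masks(source_specs, eps=1e-12):
    """One mask per source: |S_i| / sum_j |S_j|, so values lie in [0, 1]."""
    mags = np.abs(np.stack(source_specs))
    return mags / (mags.sum(axis=0, keepdims=True) + eps)

def sdr_db(reference, estimate, eps=1e-12):
    """Plain energy-ratio SDR in dB (an assumption, not the BSS Eval definition)."""
    noise = reference - estimate
    return 10 * np.log10((np.sum(reference ** 2) + eps) / (np.sum(noise ** 2) + eps))

def oracle_sdr(sources, frame=256):
    """Mask the mixture with the oracle masks and score each recovered source."""
    n = min(len(s) for s in sources) // frame * frame
    sources = [np.asarray(s[:n], dtype=float) for s in sources]
    mixture = np.sum(sources, axis=0)
    masks = ideal_ratio_masks([stft_frames(s, frame) for s in sources])
    mix_spec = stft_frames(mixture, frame)
    return [sdr_db(s, istft_frames(m * mix_spec, frame))
            for s, m in zip(sources, masks)]

# Toy signals: two tones far apart in frequency, and two that nearly coincide.
t = np.arange(16384) / 16000.0
low = np.sin(2 * np.pi * 220 * t)
high = np.sin(2 * np.pi * 3000 * t)
near = np.sin(2 * np.pi * 230 * t)

print("well separated:", oracle_sdr([low, high]))
print("overlapping:   ", oracle_sdr([low, near]))
```

In this toy setup the tones that overlap in the TF plane score a lower oracle SDR than the well-separated pair, which is the kind of separability signal the abstract describes. A real evaluation would use a windowed, overlapping STFT and a standard separation metric.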
Related papers
- Transform Once: Efficient Operator Learning in Frequency Domain [69.74509540521397]
We study deep neural networks designed to harness the structure in frequency domain for efficient learning of long-range correlations in space or time.
This work introduces a blueprint for frequency domain learning through a single transform: transform once (T1).
arXiv Detail & Related papers (2022-11-26T01:56:05Z)
- Learning from Data with Noisy Labels Using Temporal Self-Ensemble [11.245833546360386]
Deep neural networks (DNNs) have an enormous capacity to memorize noisy labels.
Current state-of-the-art methods present a co-training scheme that trains dual networks using samples associated with small losses.
We propose a simple yet effective robust training scheme that operates by training only a single network.
arXiv Detail & Related papers (2022-07-21T08:16:31Z)
- Unsupervised Audio Source Separation Using Differentiable Parametric Source Models [8.80867379881193]
We propose an unsupervised model-based deep learning approach to musical source separation.
A neural network is trained to reconstruct the observed mixture as a sum of the sources.
The experimental evaluation on a vocal ensemble separation task shows that the proposed method outperforms learning-free methods.
arXiv Detail & Related papers (2022-01-24T11:05:30Z)
- Self-supervised Audiovisual Representation Learning for Remote Sensing Data [96.23611272637943]
We propose a self-supervised approach for pre-training deep neural networks in remote sensing.
By exploiting the correspondence between geo-tagged audio recordings and remote sensing, this is done in a completely label-free manner.
We show that our approach outperforms existing pre-training strategies for remote sensing imagery.
arXiv Detail & Related papers (2021-08-02T07:50:50Z)
- Training a Deep Neural Network via Policy Gradients for Blind Source Separation in Polyphonic Music Recordings [1.933681537640272]
We propose a method for the blind separation of sounds of musical instruments in audio signals.
We describe the individual tones via a parametric model, training a dictionary to capture the relative amplitudes of the harmonics.
Our algorithm yields high-quality results with particularly low interference on a variety of different audio samples.
arXiv Detail & Related papers (2021-07-09T06:17:04Z)
- Broadcasted Residual Learning for Efficient Keyword Spotting [7.335747584353902]
We present a broadcasted residual learning method to achieve high accuracy with small model size and computational load.
We also propose a novel network architecture, Broadcasting-residual network (BC-ResNet), based on broadcasted residual learning.
BC-ResNets achieve state-of-the-art 98.0% and 98.7% top-1 accuracy on Google speech command datasets v1 and v2, respectively.
arXiv Detail & Related papers (2021-06-08T06:55:39Z)
- Learning Neural Network Subspaces [74.44457651546728]
Recent observations have advanced our understanding of the neural network optimization landscape.
With a similar computational cost as training one model, we learn lines, curves, and simplexes of high-accuracy neural networks.
arXiv Detail & Related papers (2021-02-20T23:26:58Z)
- Deep Convolutional and Recurrent Networks for Polyphonic Instrument Classification from Monophonic Raw Audio Waveforms [30.3491261167433]
Sound Event Detection and Audio Classification tasks are traditionally addressed through time-frequency representations of audio signals such as spectrograms.
The use of deep neural networks as efficient feature extractors has enabled the direct use of audio signals for classification purposes.
We attempt to recognize musical instruments in polyphonic audio by only feeding their raw waveforms into deep learning models.
arXiv Detail & Related papers (2021-02-13T13:44:46Z)
- Solving Mixed Integer Programs Using Neural Networks [57.683491412480635]
This paper applies learning to the two key sub-tasks of a MIP solver, generating a high-quality joint variable assignment, and bounding the gap in objective value between that assignment and an optimal one.
Our approach constructs two corresponding neural network-based components, Neural Diving and Neural Branching, to use in a base MIP solver such as SCIP.
We evaluate our approach on six diverse real-world datasets, including two Google production datasets and MIPLIB, by training separate neural networks on each.
arXiv Detail & Related papers (2020-12-23T09:33:11Z)
- MetricUNet: Synergistic Image- and Voxel-Level Learning for Precise CT Prostate Segmentation via Online Sampling [66.01558025094333]
We propose a two-stage framework, with the first stage to quickly localize the prostate region and the second stage to precisely segment the prostate.
We introduce a novel online metric learning module through voxel-wise sampling in the multi-task network.
Our method can effectively learn more representative voxel-level features compared with the conventional learning methods with cross-entropy or Dice loss.
arXiv Detail & Related papers (2020-05-15T10:37:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.