Depthwise Separable Convolutions Versus Recurrent Neural Networks for
Monaural Singing Voice Separation
- URL: http://arxiv.org/abs/2007.02683v1
- Date: Mon, 6 Jul 2020 12:32:34 GMT
- Title: Depthwise Separable Convolutions Versus Recurrent Neural Networks for
Monaural Singing Voice Separation
- Authors: Pyry Pyykkönen and Stylianos I. Mimilakis and Konstantinos Drossos
and Tuomas Virtanen
- Abstract summary: We focus on singing voice separation, employing an RNN architecture, and we replace the RNNs with DWS convolutions (DWS-CNNs).
We conduct an ablation study and examine the effect of the number of channels and layers of DWS-CNNs on the source separation performance.
Our results show that replacing RNNs with DWS-CNNs yields improvements of 1.20, 0.06, and 0.37 dB in signal-to-artifacts, signal-to-interference, and signal-to-distortion ratio, respectively, while using only 20.57% of the number of parameters of the RNN architecture.
- Score: 17.358040670413505
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent approaches for music source separation are almost exclusively based on
deep neural networks, mostly employing recurrent neural networks (RNNs).
Although RNNs are in many cases superior to other types of deep neural
networks for sequence processing, they are known to have specific difficulties
in training and parallelization, especially for the typically long sequences
encountered in music source separation. In this paper we present a use-case of
replacing RNNs with depth-wise separable (DWS) convolutions, which are a
lightweight and faster variant of the typical convolutions. We focus on singing
voice separation, employing an RNN architecture, and we replace the RNNs with
DWS convolutions (DWS-CNNs). We conduct an ablation study and examine the
effect of the number of channels and layers of DWS-CNNs on the source
separation performance, by utilizing the standard metrics of
signal-to-artifacts, signal-to-interference, and signal-to-distortion ratio.
Our results show that replacing RNNs with DWS-CNNs yields improvements of
1.20, 0.06, and 0.37 dB, respectively, while using only 20.57% of the number of
parameters of the RNN architecture.
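For intuition, a depthwise separable convolution factorizes a standard convolution into a per-channel spatial (depthwise) filter followed by a 1x1 (pointwise) channel mixer. A minimal PyTorch sketch, with channel counts chosen for illustration rather than taken from the paper:

    import torch.nn as nn

    in_ch, out_ch, k = 64, 128, 3

    # standard convolution: in_ch * out_ch * k * k weights
    standard = nn.Conv2d(in_ch, out_ch, kernel_size=k, padding=1)

    # depthwise separable: per-channel spatial filter (groups=in_ch),
    # then a 1x1 pointwise convolution that mixes channels
    dws = nn.Sequential(
        nn.Conv2d(in_ch, in_ch, kernel_size=k, padding=1, groups=in_ch),
        nn.Conv2d(in_ch, out_ch, kernel_size=1),
    )

    count = lambda m: sum(p.numel() for p in m.parameters())
    print(count(standard), count(dws))  # 73856 vs 8960 (with biases)

The large reduction in this toy comparison is against a standard convolution; the paper's 20.57% figure instead compares the DWS-CNN against the RNN layers it replaces.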
Related papers
- Investigating Sparsity in Recurrent Neural Networks [0.0]
This thesis investigates the effects of pruning and of Sparse Recurrent Neural Networks on the performance of RNNs.
We first describe the pruning of RNNs, its impact on the performance of RNNs, and the number of training epochs required to regain accuracy after the pruning is performed.
Next, we continue with the creation and training of Sparse Recurrent Neural Networks and identify the relation between the performance and the graph properties of its underlying arbitrary structure.
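As a hedged illustration of the kind of pruning such studies perform, the smallest-magnitude recurrent weights of an RNN can be zeroed out with PyTorch's pruning utilities (the 50% sparsity level here is arbitrary, not taken from the thesis):

    import torch.nn as nn
    import torch.nn.utils.prune as prune

    rnn = nn.RNN(input_size=32, hidden_size=64, num_layers=1)

    # zero out the 50% smallest-magnitude recurrent weights
    prune.l1_unstructured(rnn, name="weight_hh_l0", amount=0.5)

    # bake the sparsity in, removing the mask reparameterization
    prune.remove(rnn, "weight_hh_l0")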
arXiv Detail & Related papers (2024-07-30T07:24:58Z)
- Accurate Mapping of RNNs on Neuromorphic Hardware with Adaptive Spiking Neurons [2.9410174624086025]
We present a $\Sigma\Delta$-low-pass RNN (lpRNN) for mapping rate-based RNNs to spiking neural networks (SNNs).
An adaptive spiking neuron model encodes signals using $\Sigma\Delta$-modulation and enables precise mapping.
We demonstrate the implementation of the lpRNN on Intel's neuromorphic research chip Loihi.
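As a rough sketch of the $\Sigma\Delta$-modulation idea, not the paper's adaptive spiking neuron model, a first-order sigma-delta modulator turns a real-valued signal into a +/-1 pulse stream whose local average tracks the input:

    import numpy as np

    def sigma_delta_encode(x):
        # integrate the error between the input and the fed-back
        # 1-bit output, then quantize with a sign function
        integ, fb = 0.0, 0.0
        out = np.empty_like(x)
        for t, xt in enumerate(x):
            integ += xt - fb
            fb = 1.0 if integ >= 0.0 else -1.0
            out[t] = fb
        return out

    x = 0.8 * np.sin(np.linspace(0, 2 * np.pi, 200))
    pulses = sigma_delta_encode(x)  # low-pass filtering recovers x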
arXiv Detail & Related papers (2024-07-18T14:06:07Z)
- Spiking Neural Network Decision Feedback Equalization [70.3497683558609]
We propose an SNN-based equalizer with a feedback structure akin to the decision feedback equalizer (DFE).
We show that our approach clearly outperforms conventional linear equalizers for three different exemplary channels.
The proposed SNN with a decision feedback structure enables the path to competitive energy-efficient transceivers.
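For reference, the classical structure the SNN mimics: a decision feedback equalizer filters received samples with feedforward taps and cancels post-cursor intersymbol interference using past symbol decisions. A hedged LMS-adapted sketch for BPSK, with arbitrary tap counts:

    import numpy as np

    def dfe(rx, n_ff=5, n_fb=3, mu=0.01):
        w_ff = np.zeros(n_ff); w_ff[0] = 1.0   # feedforward taps
        w_fb = np.zeros(n_fb)                  # feedback taps
        past = np.zeros(n_fb)                  # past decisions
        out = []
        rx = np.concatenate([np.zeros(n_ff - 1), rx])
        for t in range(len(rx) - n_ff + 1):
            window = rx[t:t + n_ff][::-1]
            y = w_ff @ window - w_fb @ past
            d = 1.0 if y >= 0 else -1.0        # BPSK slicer
            e = d - y                          # decision-directed error
            w_ff += mu * e * window            # LMS tap updates
            w_fb -= mu * e * past
            past = np.roll(past, 1); past[0] = d
            out.append(d)
        return np.array(out)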
arXiv Detail & Related papers (2022-11-09T09:19:15Z)
- Training High-Performance Low-Latency Spiking Neural Networks by Differentiation on Spike Representation [70.75043144299168]
Spiking Neural Network (SNN) is a promising energy-efficient AI model when implemented on neuromorphic hardware.
It is a challenge to efficiently train SNNs due to their non-differentiability.
We propose the Differentiation on Spike Representation (DSR) method, which could achieve high performance.
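The non-differentiability comes from the binary spike, whose true derivative is zero almost everywhere. A common workaround, shown here as a generic surrogate gradient rather than the paper's DSR method, substitutes a smooth derivative in the backward pass:

    import torch

    class SurrogateSpike(torch.autograd.Function):
        @staticmethod
        def forward(ctx, v):
            ctx.save_for_backward(v)
            return (v > 0).float()              # Heaviside spike

        @staticmethod
        def backward(ctx, grad_out):
            (v,) = ctx.saved_tensors
            # rectangular surrogate: let gradients pass only when the
            # membrane potential is near the firing threshold
            return grad_out * (v.abs() < 0.5).float()

    spike = SurrogateSpike.apply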
arXiv Detail & Related papers (2022-05-01T12:44:49Z)
- Exploiting Low-Rank Tensor-Train Deep Neural Networks Based on Riemannian Gradient Descent With Illustrations of Speech Processing [74.31472195046099]
We exploit a low-rank tensor-train deep neural network (TT-DNN) to build an end-to-end deep learning pipeline, namely LR-TT-DNN.
A hybrid model combining LR-TT-DNN with a convolutional neural network (CNN) is set up to boost the performance.
Our empirical evidence demonstrates that the LR-TT-DNN and CNN+(LR-TT-DNN) models with fewer model parameters can outperform the TT-DNN and CNN+(TT-DNN) counterparts.
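To make the parameter saving concrete, a tensor-train (TT) layer factorizes a large weight matrix into small cores contracted along a shared rank. A minimal two-core sketch with illustrative dimensions and rank, not the paper's configuration:

    import torch

    m1, m2, n1, n2, r = 8, 16, 8, 16, 4   # 128x128 layer, TT-rank 4
    g1 = torch.randn(m1, n1, r)           # first TT-core
    g2 = torch.randn(r, m2, n2)           # second TT-core

    def tt_linear(x):
        # x: (batch, n1*n2) -> y: (batch, m1*m2); equivalent to a full
        # matrix W[(i,l),(j,k)] = sum_r g1[i,j,r] * g2[r,l,k]
        b = x.shape[0]
        x = x.reshape(b, n1, n2)
        y = torch.einsum("bjk,ijr,rlk->bil", x, g1, g2)
        return y.reshape(b, m1 * m2)

    # the cores hold 8*8*4 + 4*16*16 = 1280 weights vs 128*128 = 16384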
arXiv Detail & Related papers (2022-03-11T15:55:34Z)
- Optimal Conversion of Conventional Artificial Neural Networks to Spiking Neural Networks [0.0]
Spiking neural networks (SNNs) are biology-inspired artificial neural networks (ANNs).
We propose a novel strategic pipeline that transfers the weights to the target SNN by combining threshold balance and soft-reset mechanisms.
Our method is promising to get implanted onto embedded platforms with better support of SNNs with limited energy and memory.
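A hedged sketch of the data-based threshold balancing such conversion pipelines typically use: record each layer's maximum ReLU activation on calibration data and use it as that spiking layer's firing threshold (the layer list and calibration batch are placeholders):

    import torch

    @torch.no_grad()
    def balance_thresholds(ann_layers, x_calib):
        # one firing threshold per layer: the largest activation seen
        # on calibration data, keeping spike rates within range
        thresholds, acts = [], x_calib
        for layer in ann_layers:
            acts = torch.relu(layer(acts))
            thresholds.append(acts.max().item())
        return thresholds

With a soft reset, a converted neuron then subtracts its threshold on firing rather than resetting to zero, a standard way to reduce conversion error.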
arXiv Detail & Related papers (2021-02-28T12:04:22Z)
- Deep Networks for Direction-of-Arrival Estimation in Low SNR [89.45026632977456]
We introduce a Convolutional Neural Network (CNN) that is trained from multi-channel data of the true array manifold matrix.
We train a CNN in the low-SNR regime to predict DoAs across all SNRs.
Our robust solution can be applied in several fields, ranging from wireless array sensors to acoustic microphones or sonars.
arXiv Detail & Related papers (2020-11-17T12:52:18Z)
- Deep Time Delay Neural Network for Speech Enhancement with Full Data Learning [60.20150317299749]
This paper proposes a deep time delay neural network (TDNN) for speech enhancement with full data learning.
To make full use of the training data, we propose a full data learning method for speech enhancement.
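A time delay neural network is, in modern terms, a stack of 1-D convolutions over time, where growing dilation widens the temporal context each frame sees. A minimal sketch with assumed feature sizes, not the paper's architecture:

    import torch.nn as nn

    # input: (batch, 40 spectral features, time frames)
    tdnn = nn.Sequential(
        nn.Conv1d(40, 256, kernel_size=5, dilation=1, padding=2),
        nn.ReLU(),
        nn.Conv1d(256, 256, kernel_size=3, dilation=2, padding=2),
        nn.ReLU(),
        nn.Conv1d(256, 40, kernel_size=3, dilation=4, padding=4),
    )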
arXiv Detail & Related papers (2020-11-11T06:32:37Z)
- Progressive Tandem Learning for Pattern Recognition with Deep Spiking Neural Networks [80.15411508088522]
Spiking neural networks (SNNs) have shown advantages over traditional artificial neural networks (ANNs) for low latency and high computational efficiency.
We propose a novel ANN-to-SNN conversion and layer-wise learning framework for rapid and efficient pattern recognition.
arXiv Detail & Related papers (2020-07-02T15:38:44Z)
- Exploring Pre-training with Alignments for RNN Transducer based End-to-End Speech Recognition [39.497407288772386]
The recurrent neural network transducer (RNN-T) architecture has become an emerging trend in end-to-end automatic speech recognition research.
In this work, we leverage external alignments to seed the RNN-T model.
Two different pre-training solutions are explored, referred to as encoder pre-training and whole-network pre-training, respectively.
arXiv Detail & Related papers (2020-05-01T19:00:57Z)
- Sound Event Detection with Depthwise Separable and Dilated Convolutions [23.104644393058123]
State-of-the-art sound event detection (SED) methods usually employ a series of convolutional neural networks (CNNs) to extract useful features from the input audio signal.
We propose the replacement of the CNNs with depthwise separable convolutions and the replacement of the RNNs with dilated convolutions.
We achieve a reduction of the number of parameters by 85% and of the average training time per epoch by 78%, an increase of the average frame-wise F1 score by 4.6%, and a reduction of the average error rate by 3.8%.
arXiv Detail & Related papers (2020-02-02T19:50:51Z)