Related papers: Evolving Multi-Resolution Pooling CNN for Monaural Singing Voice Separation

URL: http://arxiv.org/abs/2008.00816v1
Date: Mon, 3 Aug 2020 12:09:42 GMT
Title: Evolving Multi-Resolution Pooling CNN for Monaural Singing Voice Separation
Authors: Weitao Yuan, Bofei Dong, Shengbei Wang, Masashi Unoki, and Wenwu Wang
Abstract summary: Monaural Singing Voice Separation (MSVS) is a challenging task and has been studied for decades. Deep neural networks (DNNs) are the current state-of-the-art methods for MSVS. We introduce a Neural Architecture Search (NAS) method to the structure design of DNNs for MSVS.
Score: 40.170868770930774
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Monaural Singing Voice Separation (MSVS) is a challenging task and has been studied for decades. Deep neural networks (DNNs) are the current state-of-the-art methods for MSVS. However, the existing DNNs are often designed manually, which is time-consuming and error-prone. In addition, the network architectures are usually pre-defined, and not adapted to the training data. To address these issues, we introduce a Neural Architecture Search (NAS) method to the structure design of DNNs for MSVS. Specifically, we propose a new multi-resolution Convolutional Neural Network (CNN) framework for MSVS namely Multi-Resolution Pooling CNN (MRP-CNN), which uses various-size pooling operators to extract multi-resolution features. Based on the NAS, we then develop an evolving framework namely Evolving MRP-CNN (E-MRP-CNN), by automatically searching the effective MRP-CNN structures using genetic algorithms, optimized in terms of a single-objective considering only separation performance, or multi-objective considering both the separation performance and the model complexity. The multi-objective E-MRP-CNN gives a set of Pareto-optimal solutions, each providing a trade-off between separation performance and model complexity. Quantitative and qualitative evaluations on the MIR-1K and DSD100 datasets are used to demonstrate the advantages of the proposed framework over several recent baselines.
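Illustration (not from the paper): the abstract says the multi-objective search returns a set of Pareto-optimal solutions trading separation performance against model complexity. A minimal sketch of how such a set could be selected from already-evaluated candidates is given below. The candidate names, the scores, and the helpers `dominates` and `pareto_front` are hypothetical, and the paper's actual genetic algorithm and metrics may differ.

```python
from typing import List, Tuple

# Hypothetical candidate: (name, separation score, complexity).
# Assumption: a higher score is better and a lower complexity is better.
Candidate = Tuple[str, float, float]


def dominates(a: Candidate, b: Candidate) -> bool:
    """Return True if a is no worse than b on both objectives and better on one."""
    no_worse = a[1] >= b[1] and a[2] <= b[2]
    strictly_better = a[1] > b[1] or a[2] < b[2]
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep every candidate that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Made-up numbers, for illustration only.
candidates = [
    ("net_a", 10.0, 5.0),
    ("net_b", 9.0, 2.0),
    ("net_c", 8.5, 3.0),   # dominated by net_b: lower score, higher complexity
    ("net_d", 10.5, 9.0),
]
front = pareto_front(candidates)
print([name for name, _, _ in front])
```

Each architecture left in `front` represents a different trade-off: none of the others beats it on both objectives at once.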

Related papers

Heterogeneous Resource Allocation with Multi-task Learning for Wireless Networks [22.52809431518314]
We propose a multi-task learning (MTL) framework to enable a single deep neural network (DNN) to jointly solve a range of diverse optimization problems. In this framework, optimization problems with varying dimensionality values, objectives, and constraints are treated as distinct tasks. Numerical results demonstrate the efficiency of the proposed MTL approach in solving diverse optimization problems.
arXiv Detail & Related papers (2025-02-14T09:13:33Z)
Multiway Multislice PHATE: Visualizing Hidden Dynamics of RNNs through Training [6.326396282553267]
Recurrent neural networks (RNNs) are a widely used tool for sequential data analysis; however, they are still often seen as black boxes of computation. Here, we present Multiway Multislice PHATE (MM-PHATE), a novel method for visualizing the evolution of RNNs' hidden states.
arXiv Detail & Related papers (2024-06-04T05:05:27Z)
Multi-Objective Evolutionary Neural Architecture Search for Recurrent Neural Networks [0.0]
This paper proposes a multi-objective evolutionary algorithm-based RNN architecture search method. The proposed method relies on approximate network morphisms for RNN architecture complexity optimisation during evolution.
arXiv Detail & Related papers (2024-03-17T11:19:45Z)
Applications of Spiking Neural Networks in Visual Place Recognition [19.577433371468533]
Spiking Neural Networks (SNNs) are increasingly recognized for their potential energy efficiency and low latency. This paper highlights three advancements for SNNs in Visual Place Recognition (VPR). Firstly, we propose Modular SNNs, where each SNN represents a set of non-overlapping geographically distinct places. Secondly, we present Ensembles of Modular SNNs, where multiple networks represent the same place. Lastly, we investigate the role of sequence matching in SNN-based VPR, a technique where consecutive images are used to refine place recognition.
arXiv Detail & Related papers (2023-11-22T06:26:24Z)
Disentangling Structured Components: Towards Adaptive, Interpretable and Scalable Time Series Forecasting [52.47493322446537]
We develop an adaptive, interpretable and scalable forecasting framework, which seeks to individually model each component of the spatial-temporal patterns. SCNN works with a pre-defined generative process of MTS, which arithmetically characterizes the latent structure of the spatial-temporal patterns. Extensive experiments are conducted to demonstrate that SCNN can achieve superior performance over state-of-the-art models on three real-world datasets.
arXiv Detail & Related papers (2023-05-22T13:39:44Z)
Multi-scale Evolutionary Neural Architecture Search for Deep Spiking Neural Networks [7.271032282434803]
We propose a Multi-Scale Evolutionary Neural Architecture Search (MSE-NAS) for Spiking Neural Networks (SNNs). MSE-NAS evolves individual neuron operation, self-organized integration of multiple circuit motifs, and global connectivity across motifs through a brain-inspired indirect evaluation function, Representational Dissimilarity Matrices (RDMs). The proposed algorithm achieves state-of-the-art (SOTA) performance with shorter simulation steps on static datasets and neuromorphic datasets.
arXiv Detail & Related papers (2023-04-21T05:36:37Z)
Split-Et-Impera: A Framework for the Design of Distributed Deep Learning Applications [8.434224141580758]
Split-Et-Impera determines the set of the best-split points of a neural network based on deep network interpretability principles. It performs a communication-aware simulation for the rapid evaluation of different neural network rearrangements. It suggests the best match between the quality of service requirements of the application and the performance in terms of accuracy and latency time.
arXiv Detail & Related papers (2023-03-22T13:00:00Z)
Batch-Ensemble Stochastic Neural Networks for Out-of-Distribution Detection [55.028065567756066]
Out-of-distribution (OOD) detection has recently received much attention from the machine learning community due to its importance in deploying machine learning models in real-world applications. In this paper we propose an uncertainty quantification approach by modelling the distribution of features. We incorporate an efficient ensemble mechanism, namely batch-ensemble, to construct the batch-ensemble neural networks (BE-SNNs) and overcome the feature collapse problem. We show that BE-SNNs yield superior performance on several OOD benchmarks, such as the Two-Moons dataset, the FashionMNIST vs MNIST dataset, FashionM
arXiv Detail & Related papers (2022-06-26T16:00:22Z)
Towards a General Purpose CNN for Long Range Dependencies in $\mathrm{N}$D [49.57261544331683]
We propose a single CNN architecture equipped with continuous convolutional kernels for tasks on arbitrary resolution, dimensionality and length without structural changes. We show the generality of our approach by applying the same CCNN to a wide set of tasks on sequential (1$\mathrm{D}$) and visual data (2$\mathrm{D}$). Our CCNN performs competitively and often outperforms the current state-of-the-art across all tasks considered.
arXiv Detail & Related papers (2022-06-07T15:48:02Z)
Deep Multi-Task Learning for Cooperative NOMA: System Design and Principles [52.79089414630366]
We develop a novel deep cooperative NOMA scheme, drawing upon the recent advances in deep learning (DL). We develop a novel hybrid-cascaded deep neural network (DNN) architecture such that the entire system can be optimized in a holistic manner.
arXiv Detail & Related papers (2020-07-27T12:38:37Z)
Multi-Tones' Phase Coding (MTPC) of Interaural Time Difference by Spiking Neural Network [68.43026108936029]
We propose a pure spiking neural network (SNN) based computational model for precise sound localization in the noisy real-world environment. We implement this algorithm in a real-time robotic system with a microphone array. The experiment results show a mean error azimuth of 13 degrees, which surpasses the accuracy of the other biologically plausible neuromorphic approach for sound source localization.
arXiv Detail & Related papers (2020-07-07T08:22:56Z)

This list is automatically generated from the titles and abstracts of the papers in this site.