Music Source Separation with Band-split RNN
- URL: http://arxiv.org/abs/2209.15174v1
- Date: Fri, 30 Sep 2022 01:49:52 GMT
- Title: Music Source Separation with Band-split RNN
- Authors: Yi Luo, Jianwei Yu
- Abstract summary: We propose a frequency-domain model that splits the spectrogram of the mixture into subbands and performs interleaved band-level and sequence-level modeling.
The bandwidths of the subbands can be chosen based on a priori or expert knowledge of the characteristics of the target source.
Experimental results show that BSRNN trained only on the MUSDB18-HQ dataset significantly outperforms several top-ranking models from the Music Demixing (MDX) Challenge 2021.
- Score: 25.578400006180527
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The performance of music source separation (MSS) models has been greatly
improved in recent years thanks to the development of novel neural network
architectures and training pipelines. However, recent model designs for MSS
have mainly been motivated by other audio processing tasks or other research
fields, while the intrinsic characteristics and patterns of music signals have
not been fully explored. In this paper, we propose the band-split RNN (BSRNN),
a frequency-domain model that explicitly splits the spectrogram of the mixture
into subbands and performs interleaved band-level and sequence-level modeling.
The bandwidths of the subbands can be chosen based on a priori or expert
knowledge of the characteristics of the target source in order to optimize
performance on a certain type of target musical instrument. To make better use
of unlabeled data, we also describe a semi-supervised model finetuning pipeline
that can further improve the performance of the model. Experimental results
show that BSRNN trained only on the MUSDB18-HQ dataset significantly
outperforms several top-ranking models from the Music Demixing (MDX) Challenge
2021, and that the semi-supervised finetuning stage further improves
performance on all four instrument tracks.
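As a rough illustration of the band-split idea, the sketch below splits a complex mixture spectrogram into subbands, projects each band to a shared feature space, and interleaves a BLSTM over time with a BLSTM over bands. This is a minimal sketch, not the authors' implementation: the band boundaries, layer sizes, residual connections, and real/imaginary feature packing are all assumptions made for demonstration.

```python
# Illustrative sketch of band-split + interleaved band/sequence modeling.
# NOT the authors' implementation: band boundaries, layer sizes, and the
# real/imag feature packing below are assumptions for demonstration only.
import torch
import torch.nn as nn

class BandSplitSketch(nn.Module):
    def __init__(self, band_widths, feature_dim=64):
        super().__init__()
        self.band_widths = band_widths
        # One linear projection per subband: (2 * width) -> feature_dim,
        # where the factor 2 packs real and imaginary spectrogram parts.
        self.band_fc = nn.ModuleList(
            nn.Linear(2 * w, feature_dim) for w in band_widths
        )
        # Interleaved modeling: one BLSTM across time (sequence level)
        # and one BLSTM across bands (band level), each mapped back to
        # feature_dim by a linear layer.
        self.seq_rnn = nn.LSTM(feature_dim, feature_dim, batch_first=True,
                               bidirectional=True)
        self.seq_fc = nn.Linear(2 * feature_dim, feature_dim)
        self.band_rnn = nn.LSTM(feature_dim, feature_dim, batch_first=True,
                                bidirectional=True)
        self.band_fc2 = nn.Linear(2 * feature_dim, feature_dim)

    def forward(self, spec):
        # spec: complex STFT of the mixture, shape (batch, freq, time)
        B, F, T = spec.shape
        feats, start = [], 0
        for w, fc in zip(self.band_widths, self.band_fc):
            band = spec[:, start:start + w, :]            # (B, w, T)
            start += w
            x = torch.cat([band.real, band.imag], dim=1)  # (B, 2w, T)
            feats.append(fc(x.transpose(1, 2)))           # (B, T, D)
        z = torch.stack(feats, dim=1)                     # (B, K, T, D)
        K, D = z.shape[1], z.shape[3]

        # Sequence-level modeling: BLSTM over time, shared across bands.
        s = z.reshape(B * K, T, D)
        s = self.seq_fc(self.seq_rnn(s)[0]) + s           # residual
        z = s.reshape(B, K, T, D)

        # Band-level modeling: BLSTM over the band axis, shared across time.
        b = z.permute(0, 2, 1, 3).reshape(B * T, K, D)
        b = self.band_fc2(self.band_rnn(b)[0]) + b        # residual
        return b.reshape(B, T, K, D).permute(0, 2, 1, 3)  # (B, K, T, D)

# Hypothetical bandwidth scheme: finer resolution at low frequencies, where
# vocal and bass harmonics are dense; the paper's actual band splits differ.
model = BandSplitSketch(band_widths=[16] * 8 + [32] * 8 + [64] * 6)
mix_spec = torch.randn(1, 16 * 8 + 32 * 8 + 64 * 6, 100, dtype=torch.cfloat)
out = model(mix_spec)   # (1, 22, 100, 64) band-level features
```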
Related papers
- Learning with Noisy Foundation Models [95.50968225050012]
This paper is the first work to comprehensively understand and analyze the nature of noise in pre-training datasets.
We propose a tuning method (NMTune) that applies affine transformations to the feature space to mitigate the malignant effect of noise and improve generalization.
arXiv Detail & Related papers (2024-03-11T16:22:41Z)
- Cross-modal Prompts: Adapting Large Pre-trained Models for Audio-Visual Downstream Tasks [55.36987468073152]
This paper proposes a novel Dual-Guided Spatial-Channel-Temporal (DG-SCT) attention mechanism.
The DG-SCT module incorporates trainable cross-modal interaction layers into pre-trained audio-visual encoders.
Our proposed model achieves state-of-the-art results across multiple downstream tasks, including AVE, AVVP, AVS, and AVQA.
arXiv Detail & Related papers (2023-11-09T05:24:20Z)
- MBTFNet: Multi-Band Temporal-Frequency Neural Network For Singing Voice Enhancement [8.782080886602145]
We propose MBTFNet, a novel multi-band temporal-frequency neural network for singing voice enhancement.
MBTFNet removes background music, noise, and even backing vocals from singing recordings.
Experiments show that our proposed model significantly outperforms several state-of-the-art speech enhancement (SE) and MSS models.
arXiv Detail & Related papers (2023-10-06T16:44:47Z)
- Low-Resource Music Genre Classification with Cross-Modal Neural Model Reprogramming [129.4950757742912]
We introduce a novel method for leveraging pre-trained models for low-resource (music) classification based on the concept of Neural Model Reprogramming (NMR).
NMR aims at re-purposing a pre-trained model from a source domain to a target domain by modifying the input of a frozen pre-trained model.
Experimental results suggest that a neural model pre-trained on large-scale datasets can successfully perform music genre classification using this reprogramming method (a minimal sketch of the idea follows this entry).
arXiv Detail & Related papers (2022-11-02T17:38:33Z)
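The following is a minimal sketch of the input-reprogramming idea described above: a trainable perturbation is added to every input of a frozen pre-trained model, and only that perturbation is learned. The frozen model, input length, and additive form of the reprogramming are hypothetical stand-ins, not details from the paper.

```python
# Sketch of neural model reprogramming (NMR): only a trainable input
# perturbation is learned; the pre-trained model itself stays frozen.
# The model and shapes here are hypothetical stand-ins.
import torch
import torch.nn as nn

class InputReprogramming(nn.Module):
    def __init__(self, pretrained: nn.Module, input_len: int):
        super().__init__()
        self.pretrained = pretrained
        for p in self.pretrained.parameters():
            p.requires_grad = False          # freeze the source-domain model
        # Trainable additive perturbation applied to every input waveform.
        self.delta = nn.Parameter(torch.zeros(input_len))

    def forward(self, x):                    # x: (batch, input_len)
        return self.pretrained(x + self.delta)

# Hypothetical frozen source model: a tiny waveform classifier.
frozen = nn.Sequential(nn.Linear(16000, 128), nn.ReLU(), nn.Linear(128, 10))
model = InputReprogramming(frozen, input_len=16000)
logits = model(torch.randn(4, 16000))        # only model.delta receives grads
```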
- Neural Waveshaping Synthesis [0.0]
We present a novel, lightweight, fully causal approach to neural audio synthesis.
The Neural Waveshaping Unit (NEWT) operates directly in the waveform domain.
It produces complex timbral evolutions by simple affine transformations of its input and output signals (a minimal sketch follows this entry).
arXiv Detail & Related papers (2021-07-11T13:50:59Z)
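The sketch below illustrates waveshaping with time-varying affine transforms of a shaper's input and output, which is the general mechanism the blurb above describes. The fixed tanh shaper, sample rate, and gain sweep are assumptions; NEWT itself learns its shaping functions.

```python
# Sketch of a waveshaping unit in the spirit of NEWT: a fixed nonlinear
# shaper with time-varying affine transforms on its input and output.
# NEWT learns its shaping functions; the tanh shaper here is an assumption.
import numpy as np

def waveshape(x, a_in, b_in, a_out, b_out):
    """x and the affine parameters broadcast per-sample; timbre evolves
    over time as the affine parameters change."""
    return a_out * np.tanh(a_in * x + b_in) + b_out

sr = 16000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 220.0 * t)             # exciter: a 220 Hz sine
drive = np.linspace(1.0, 8.0, sr)             # input gain sweep -> brighter
y = waveshape(x, a_in=drive, b_in=0.0, a_out=1.0 / np.tanh(drive), b_out=0.0)
```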
- Fast accuracy estimation of deep learning based multi-class musical source separation [79.10962538141445]
We propose a method to evaluate the separability of instruments in any dataset without training and tuning a neural network.
Based on the oracle principle with an ideal ratio mask, our approach is an excellent proxy for estimating the separation performance of state-of-the-art deep learning approaches (a minimal sketch of the oracle follows this entry).
arXiv Detail & Related papers (2020-10-19T13:05:08Z)
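Below is a minimal sketch of the ideal-ratio-mask oracle referenced above: given ground-truth sources, the mixture spectrogram is masked by each source's share of the total magnitude, giving an upper-bound separation estimate without training any network. The STFT parameters are assumptions.

```python
# Sketch of the ideal ratio mask (IRM) oracle: with access to the ground-truth
# sources, mask the mixture spectrogram and measure how well each source can
# be recovered. STFT parameters are assumptions.
import numpy as np
from scipy.signal import stft, istft

def irm_oracle(sources, fs=44100, nperseg=4096, eps=1e-8):
    """sources: list of time-domain source signals that sum to the mixture.
    Returns the masked (oracle-separated) estimate of each source."""
    specs = [stft(s, fs=fs, nperseg=nperseg)[2] for s in sources]
    mix = sum(specs)                              # mixture spectrogram
    total = sum(np.abs(S) for S in specs) + eps   # total magnitude per bin
    estimates = []
    for S in specs:
        mask = np.abs(S) / total                  # ideal ratio mask in [0, 1]
        estimates.append(istft(mask * mix, fs=fs, nperseg=nperseg)[1])
    return estimates
```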
- Score-informed Networks for Music Performance Assessment [64.12728872707446]
Deep neural network-based methods incorporating score information into music performance assessment (MPA) models have not yet been investigated.
We introduce three different models capable of score-informed performance assessment.
arXiv Detail & Related papers (2020-08-01T07:46:24Z)
- Exploring Quality and Generalizability in Parameterized Neural Audio Effects [0.0]
Deep neural networks have shown promise for music audio signal processing applications.
Results to date have tended to be constrained by low sample rates, noise, narrow domains of signal types, and/or lack of parameterized controls.
This work expands on prior research published on modeling nonlinear time-dependent signal processing effects.
arXiv Detail & Related papers (2020-06-10T00:52:08Z)
- RNN-T Models Fail to Generalize to Out-of-Domain Audio: Causes and Solutions [73.45995446500312]
We analyze the generalization properties of streaming and non-streaming recurrent neural network transducer (RNN-T) based end-to-end models.
We propose two solutions: combining multiple regularization techniques during training, and using dynamic overlapping inference.
arXiv Detail & Related papers (2020-05-07T06:24:47Z)
- Modeling Musical Structure with Artificial Neural Networks [0.0]
I explore the application of artificial neural networks to different aspects of musical structure modeling.
I show how a connectionist model, the Gated Autoencoder (GAE), can be employed to learn transformations between musical fragments.
I propose a special predictive training of the GAE, which yields a representation of polyphonic music as a sequence of intervals (a minimal sketch of the GAE follows this entry).
arXiv Detail & Related papers (2020-01-06T18:35:57Z)
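As a minimal sketch of the Gated Autoencoder idea, the code below relates two fragments through multiplicative interactions of factor projections, so the mapping units encode the transformation between fragments rather than their content. The dimensions and tied-weight decoding follow the standard factored GAE formulation; the thesis' exact architecture and predictive training are not reproduced.

```python
# Sketch of a factored Gated Autoencoder (GAE) relating two musical
# fragments x and y. Sizes are assumptions for demonstration only.
import torch
import torch.nn as nn

class GatedAutoencoder(nn.Module):
    def __init__(self, in_dim=128, n_factors=256, n_maps=64):
        super().__init__()
        self.Wx = nn.Linear(in_dim, n_factors, bias=False)
        self.Wy = nn.Linear(in_dim, n_factors, bias=False)
        self.Wm = nn.Linear(n_factors, n_maps, bias=False)

    def encode(self, x, y):
        # Mapping units see the product of the two factor projections, so
        # they respond to the *relation* (e.g. a transposition) between the
        # fragments rather than to the fragments' content.
        return torch.sigmoid(self.Wm(self.Wx(x) * self.Wy(y)))

    def decode(self, x, m):
        # Reconstruct y from x and the inferred transformation m,
        # using the tied (transposed) decoder weights.
        factors = self.Wx(x) * (m @ self.Wm.weight)
        return factors @ self.Wy.weight

gae = GatedAutoencoder()
x, y = torch.rand(8, 128), torch.rand(8, 128)
m = gae.encode(x, y)              # transformation code
y_hat = gae.decode(x, m)          # reconstruction of y given x and m
loss = ((y_hat - y) ** 2).mean()  # train by minimizing reconstruction error
```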