Sudo rm -rf: Efficient Networks for Universal Audio Source Separation
- URL: http://arxiv.org/abs/2007.06833v1
- Date: Tue, 14 Jul 2020 05:46:38 GMT
- Title: Sudo rm -rf: Efficient Networks for Universal Audio Source Separation
- Authors: Efthymios Tzinis, Zhepei Wang and Paris Smaragdis
- Abstract summary: We present an efficient neural network for end-to-end general purpose audio source separation.
The backbone structure of this network is the SUccessive DOwnsampling and Resampling of Multi-Resolution Features (SuDoRMRF)
- Score: 32.851407723043806
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we present an efficient neural network for end-to-end general
purpose audio source separation. Specifically, the backbone structure of this
convolutional network is the SUccessive DOwnsampling and Resampling of
Multi-Resolution Features (SuDoRMRF) as well as their aggregation which is
performed through simple one-dimensional convolutions. In this way, we are able
to obtain high quality audio source separation with limited number of floating
point operations, memory requirements, number of parameters and latency. Our
experiments on both speech and environmental sound separation datasets show
that SuDoRMRF performs comparably and even surpasses various state-of-the-art
approaches with significantly higher computational resource requirements.
Related papers
- Accelerated Multi-Contrast MRI Reconstruction via Frequency and Spatial Mutual Learning [50.74383395813782]
We propose a novel Frequency and Spatial Mutual Learning Network (FSMNet) to explore global dependencies across different modalities.
The proposed FSMNet achieves state-of-the-art performance for the Multi-Contrast MR Reconstruction task with different acceleration factors.
arXiv Detail & Related papers (2024-09-21T12:02:47Z) - ASMR: Activation-sharing Multi-resolution Coordinate Networks For Efficient Inference [6.005712471509875]
Coordinate network or implicit neural representation (INR) is a fast-emerging method for encoding natural signals.
We propose the Activation-Sharing Multi-Resolution (ASMR) coordinate network that combines multi-resolution coordinate decomposition with hierarchical modulations.
We show that ASMR can reduce the MAC of a vanilla SIREN model by up to 500x while achieving an even higher reconstruction quality than its SIREN baseline.
arXiv Detail & Related papers (2024-05-20T22:35:34Z) - Frequency-Assisted Mamba for Remote Sensing Image Super-Resolution [49.902047563260496]
We develop the first attempt to integrate the Vision State Space Model (Mamba) for remote sensing image (RSI) super-resolution.
To achieve better SR reconstruction, building upon Mamba, we devise a Frequency-assisted Mamba framework, dubbed FMSR.
Our FMSR features a multi-level fusion architecture equipped with the Frequency Selection Module (FSM), Vision State Space Module (VSSM), and Hybrid Gate Module (HGM)
arXiv Detail & Related papers (2024-05-08T11:09:24Z) - RTFS-Net: Recurrent Time-Frequency Modelling for Efficient Audio-Visual Speech Separation [18.93255531121519]
We present a novel time-frequency domain audio-visual speech separation method.
RTFS-Net applies its algorithms on the complex time-frequency bins yielded by the Short-Time Fourier Transform.
This is the first time-frequency domain audio-visual speech separation method to outperform all contemporary time-domain counterparts.
arXiv Detail & Related papers (2023-09-29T12:38:00Z) - A Generalized Bandsplit Neural Network for Cinematic Audio Source
Separation [39.45425155123186]
We develop a model generalizing the Bandsplit RNN for any complete or overcomplete partitions of the frequency axis.
A loss function motivated by the signal-to-noise ratio and the sparsity-promoting property of the 1-norm was proposed.
Our best model sets the state of the art on the Divide and Remaster dataset with performance above the ideal ratio mask for the dialogue stem.
arXiv Detail & Related papers (2023-09-05T19:19:22Z) - On Neural Architectures for Deep Learning-based Source Separation of
Co-Channel OFDM Signals [104.11663769306566]
We study the single-channel source separation problem involving frequency-division multiplexing (OFDM) signals.
We propose critical domain-informed modifications to the network parameterization, based on insights from OFDM structures.
arXiv Detail & Related papers (2023-03-11T16:29:13Z) - Neural Vocoder is All You Need for Speech Super-resolution [56.84715616516612]
Speech super-resolution (SR) is a task to increase speech sampling rate by generating high-frequency components.
Existing speech SR methods are trained in constrained experimental settings, such as a fixed upsampling ratio.
We propose a neural vocoder based speech super-resolution method (NVSR) that can handle a variety of input resolution and upsampling ratios.
arXiv Detail & Related papers (2022-03-28T17:51:00Z) - Learning Frequency-aware Dynamic Network for Efficient Super-Resolution [56.98668484450857]
This paper explores a novel frequency-aware dynamic network for dividing the input into multiple parts according to its coefficients in the discrete cosine transform (DCT) domain.
In practice, the high-frequency part will be processed using expensive operations and the lower-frequency part is assigned with cheap operations to relieve the computation burden.
Experiments conducted on benchmark SISR models and datasets show that the frequency-aware dynamic network can be employed for various SISR neural architectures.
arXiv Detail & Related papers (2021-03-15T12:54:26Z) - Compute and memory efficient universal sound source separation [23.152611264259225]
We provide a family of efficient neural network architectures for general purpose audio source separation.
The backbone structure of this convolutional network is the SUccessive DOwnsampling and Resampling of Multi-Resolution Features (SuDoRM-RF)
Our experiments show that SuDoRM-RF models perform comparably and even surpass several state-of-the-art benchmarks.
arXiv Detail & Related papers (2021-03-03T19:16:53Z) - Conditioning Trick for Training Stable GANs [70.15099665710336]
We propose a conditioning trick, called difference departure from normality, applied on the generator network in response to instability issues during GAN training.
We force the generator to get closer to the departure from normality function of real samples computed in the spectral domain of Schur decomposition.
arXiv Detail & Related papers (2020-10-12T16:50:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.