Compute and memory efficient universal sound source separation
- URL: http://arxiv.org/abs/2103.02644v1
- Date: Wed, 3 Mar 2021 19:16:53 GMT
- Title: Compute and memory efficient universal sound source separation
- Authors: Efthymios Tzinis, Zhepei Wang, Xilin Jiang and Paris Smaragdis
- Abstract summary: We provide a family of efficient neural network architectures for general purpose audio source separation.
The backbone structure of this convolutional network is the SUccessive DOwnsampling and Resampling of Multi-Resolution Features (SuDoRM-RF).
Our experiments show that SuDoRM-RF models perform comparably to, and even surpass, several state-of-the-art benchmarks.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent progress in audio source separation led by deep learning has enabled
many neural network models to provide robust solutions to this fundamental
estimation problem. In this study, we provide a family of efficient neural
network architectures for general purpose audio source separation while
focusing on multiple computational aspects that hinder the application of
neural networks in real-world scenarios. The backbone structure of this
convolutional network is the SUccessive DOwnsampling and Resampling of
Multi-Resolution Features (SuDoRM-RF) as well as their aggregation which is
performed through simple one-dimensional convolutions. This mechanism enables
our models to obtain high fidelity signal separation in a wide variety of
settings where a variable number of sources is present and with limited
computational resources (e.g. floating point operations, memory footprint,
number of parameters and latency). Our experiments show that SuDoRM-RF models
perform comparably to, and even surpass, several state-of-the-art benchmarks that have
significantly higher computational resource requirements. The causal variation
of SuDoRM-RF is able to obtain competitive performance in real-time speech
separation of around 10 dB scale-invariant signal-to-distortion ratio
improvement (SI-SDRi) while remaining up to 20 times faster than real-time on a
laptop device.
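The SI-SDRi figure quoted in the abstract can be made concrete with a small sketch. The code below is not taken from the paper; it follows the commonly used definition of scale-invariant SDR (project the estimate onto the target, then compare the energy of the projection with the energy of the residual) and reports the improvement as the gain over the unprocessed mixture. The function names `si_sdr` and `si_sdr_improvement`, the mean removal, and the small `eps` stabiliser are assumptions made for illustration.

```python
import math


def si_sdr(estimate, target, eps=1e-12):
    """Scale-invariant SDR in dB between two equal-length sequences.

    Assumption: both signals are mean-centred first, a common convention
    that the abstract itself does not specify.
    """
    if len(estimate) != len(target):
        raise ValueError("estimate and target must have the same length")
    n = len(target)
    mean_e = sum(estimate) / n
    mean_t = sum(target) / n
    e = [x - mean_e for x in estimate]
    t = [x - mean_t for x in target]

    # Optimal scaling of the target that best explains the estimate.
    alpha = sum(a * b for a, b in zip(e, t)) / (sum(b * b for b in t) + eps)
    projection = [alpha * b for b in t]
    residual = [a - p for a, p in zip(e, projection)]

    signal_energy = sum(p * p for p in projection)
    noise_energy = sum(r * r for r in residual)
    return 10.0 * math.log10((signal_energy + eps) / (noise_energy + eps))


def si_sdr_improvement(estimate, mixture, target):
    """SI-SDRi: SI-SDR of the estimate minus SI-SDR of the raw mixture."""
    return si_sdr(estimate, target) - si_sdr(mixture, target)


if __name__ == "__main__":
    n = 1000
    target = [math.sin(2 * math.pi * 5 * i / n) for i in range(n)]
    interference = [math.cos(2 * math.pi * 13 * i / n) for i in range(n)]
    mixture = [s + v for s, v in zip(target, interference)]
    # Hypothetical separator output: interference attenuated by a factor of 10.
    estimate = [s + 0.1 * v for s, v in zip(target, interference)]
    print(round(si_sdr_improvement(estimate, mixture, target), 2))
```

Because the estimate is rescaled against the target before the ratio is taken, multiplying the estimate by any non-zero constant leaves the score essentially unchanged, which is what makes the metric scale-invariant.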
Related papers
- Neuromorphic Wireless Split Computing with Multi-Level Spikes [69.73249913506042]
In neuromorphic computing, spiking neural networks (SNNs) perform inference tasks, offering significant efficiency gains for workloads involving sequential data.
Recent advances in hardware and software have demonstrated that embedding a few bits of payload in each spike exchanged between the spiking neurons can further enhance inference accuracy.
This paper investigates a wireless neuromorphic split computing architecture employing multi-level SNNs.
arXiv Detail & Related papers (2024-11-07T14:08:35Z)
- Accelerated Multi-Contrast MRI Reconstruction via Frequency and Spatial Mutual Learning [50.74383395813782]
We propose a novel Frequency and Spatial Mutual Learning Network (FSMNet) to explore global dependencies across different modalities.
The proposed FSMNet achieves state-of-the-art performance for the Multi-Contrast MR Reconstruction task with different acceleration factors.
arXiv Detail & Related papers (2024-09-21T12:02:47Z)
- ASMR: Activation-sharing Multi-resolution Coordinate Networks For Efficient Inference [6.005712471509875]
Coordinate network or implicit neural representation (INR) is a fast-emerging method for encoding natural signals.
We propose the Activation-Sharing Multi-Resolution (ASMR) coordinate network that combines multi-resolution coordinate decomposition with hierarchical modulations.
We show that ASMR can reduce the MAC of a vanilla SIREN model by up to 500x while achieving an even higher reconstruction quality than its SIREN baseline.
arXiv Detail & Related papers (2024-05-20T22:35:34Z)
- Frequency-Assisted Mamba for Remote Sensing Image Super-Resolution [49.902047563260496]
We make the first attempt to integrate the Vision State Space Model (Mamba) into remote sensing image (RSI) super-resolution.
To achieve better SR reconstruction, building upon Mamba, we devise a Frequency-assisted Mamba framework, dubbed FMSR.
Our FMSR features a multi-level fusion architecture equipped with the Frequency Selection Module (FSM), Vision State Space Module (VSSM), and Hybrid Gate Module (HGM).
arXiv Detail & Related papers (2024-05-08T11:09:24Z)
- Heterogenous Memory Augmented Neural Networks [84.29338268789684]
We introduce a novel heterogeneous memory augmentation approach for neural networks.
By introducing learnable memory tokens with an attention mechanism, we can effectively boost performance without huge computational overhead.
We show our approach on various image and graph-based tasks under both in-distribution (ID) and out-of-distribution (OOD) conditions.
arXiv Detail & Related papers (2023-10-17T01:05:28Z)
- On Neural Architectures for Deep Learning-based Source Separation of Co-Channel OFDM Signals [104.11663769306566]
We study the single-channel source separation problem involving orthogonal frequency-division multiplexing (OFDM) signals.
We propose critical domain-informed modifications to the network parameterization, based on insights from OFDM structures.
arXiv Detail & Related papers (2023-03-11T16:29:13Z)
- Neural Calibration for Scalable Beamforming in FDD Massive MIMO with Implicit Channel Estimation [10.775558382613077]
Channel estimation and beamforming play critical roles in frequency-division duplexing (FDD) massive multiple-input multiple-output (MIMO) systems.
We propose a deep learning-based approach that directly optimizes the beamformers at the base station according to the received uplink pilots.
A neural calibration method is proposed to improve the scalability of the end-to-end design.
arXiv Detail & Related papers (2021-08-03T14:26:14Z)
- Learning Frequency-aware Dynamic Network for Efficient Super-Resolution [56.98668484450857]
This paper explores a novel frequency-aware dynamic network for dividing the input into multiple parts according to its coefficients in the discrete cosine transform (DCT) domain.
In practice, the high-frequency part is processed using expensive operations, while the lower-frequency part is assigned cheap operations to relieve the computational burden.
Experiments conducted on benchmark SISR models and datasets show that the frequency-aware dynamic network can be employed for various SISR neural architectures.
arXiv Detail & Related papers (2021-03-15T12:54:26Z)
- Sudo rm -rf: Efficient Networks for Universal Audio Source Separation [32.851407723043806]
We present an efficient neural network for end-to-end general purpose audio source separation.
The backbone structure of this network is the SUccessive DOwnsampling and Resampling of Multi-Resolution Features (SuDoRMRF).
arXiv Detail & Related papers (2020-07-14T05:46:38Z)
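The successive downsampling and resampling of multi-resolution features named in the SuDoRM-RF abstracts can be pictured with a toy, single-channel sketch. It is not the authors' implementation: fixed pairwise averaging stands in for learned strided convolutions, nearest-neighbour repetition stands in for the resampling step, and plain addition stands in for the learned aggregation. The names `downsample`, `upsample`, `multi_resolution_block`, and the `depth` parameter are hypothetical.

```python
def downsample(x):
    """Halve the temporal resolution by averaging adjacent pairs (stride 2).

    Assumption: a fixed average replaces the learned strided convolution.
    """
    return [(x[i] + x[i + 1]) / 2 for i in range(0, len(x) - 1, 2)]


def upsample(x, length):
    """Nearest-neighbour upsampling by 2, padded or cut to `length`."""
    out = [v for v in x for _ in range(2)]
    while len(out) < length:
        out.append(out[-1])  # repeat the last value when the finer level is odd
    return out[:length]


def multi_resolution_block(x, depth=3):
    """Extract features at `depth` coarser resolutions and aggregate them.

    Returns a sequence with the same length as the input.
    """
    if len(x) < 2 ** depth:
        raise ValueError("input is too short for the requested depth")

    # Successive downsampling: levels[0] is the finest, levels[-1] the coarsest.
    levels = [list(x)]
    for _ in range(depth):
        levels.append(downsample(levels[-1]))

    # Resampling: walk back up, adding each coarser summary to the finer level.
    aggregated = levels[-1]
    for finer in reversed(levels[:-1]):
        resampled = upsample(aggregated, len(finer))
        aggregated = [a + b for a, b in zip(resampled, finer)]
    return aggregated


if __name__ == "__main__":
    print(multi_resolution_block([1.0] * 8, depth=3))
```

Each output sample ends up combining information from progressively wider time windows while the expensive work happens on shorter sequences, which is the intuition behind the efficiency claims in the abstracts.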