LaSAFT: Latent Source Attentive Frequency Transformation for Conditioned Source Separation
- URL: http://arxiv.org/abs/2010.11631v2
- Date: Wed, 14 Apr 2021 05:31:12 GMT
- Title: LaSAFT: Latent Source Attentive Frequency Transformation for Conditioned Source Separation
- Authors: Woosung Choi and Minseok Kim and Jaehwa Chung and Soonyoung Jung
- Abstract summary: We propose the Latent Source Attentive Frequency Transformation (LaSAFT) block to capture source-dependent frequency patterns.
We also propose the Gated Point-wise Convolutional Modulation (GPoCM) to modulate internal features.
- Score: 7.002478301291264
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent deep-learning approaches have shown that Frequency Transformation (FT)
blocks can significantly improve spectrogram-based single-source separation
models by capturing frequency patterns. The goal of this paper is to extend the
FT block to fit the multi-source task. We propose the Latent Source Attentive
Frequency Transformation (LaSAFT) block to capture source-dependent frequency
patterns. We also propose the Gated Point-wise Convolutional Modulation
(GPoCM), an extension of Feature-wise Linear Modulation (FiLM), to modulate
internal features. By employing these two novel methods, we extend the
Conditioned-U-Net (CUNet) for multi-source separation, and the experimental
results indicate that our LaSAFT and GPoCM can improve the CUNet's performance,
achieving state-of-the-art SDR performance on several MUSDB18 source separation
tasks.
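The two conditioning mechanisms above can be pictured with a short PyTorch sketch. FiLM is the standard feature-wise scale-and-shift baseline, GPoCM replaces it with a condition-generated point-wise (1x1) convolution whose sigmoid output gates the features, and LaSAFT mixes several frequency-transformation branches with attention weights derived from the condition embedding. Tensor shapes, layer sizes, and names such as num_latent_sources are illustrative assumptions, not the authors' exact configuration.

import torch
import torch.nn as nn
import torch.nn.functional as F

class FiLM(nn.Module):
    """Feature-wise Linear Modulation: per-channel scale and shift
    predicted from a condition embedding."""
    def __init__(self, cond_dim, channels):
        super().__init__()
        self.to_gamma_beta = nn.Linear(cond_dim, 2 * channels)

    def forward(self, x, cond):                 # x: (B, C, F, T), cond: (B, cond_dim)
        gamma, beta = self.to_gamma_beta(cond).chunk(2, dim=-1)
        return gamma[:, :, None, None] * x + beta[:, :, None, None]

class GPoCM(nn.Module):
    """Gated Point-wise Convolutional Modulation (sketch): the condition
    embedding generates per-sample 1x1 convolution weights, and the sigmoid
    of that convolution's output gates the input features."""
    def __init__(self, cond_dim, channels):
        super().__init__()
        self.to_weight = nn.Linear(cond_dim, channels * channels)
        self.to_bias = nn.Linear(cond_dim, channels)

    def forward(self, x, cond):                 # x: (B, C, F, T)
        B, C, Fr, T = x.shape
        w = self.to_weight(cond).view(B * C, C, 1, 1)
        b = self.to_bias(cond).reshape(B * C)
        # grouped conv applies each sample's own 1x1 kernel (inter-channel mixing)
        y = F.conv2d(x.reshape(1, B * C, Fr, T), w, b, groups=B).view(B, C, Fr, T)
        return torch.sigmoid(y) * x             # gating

class LaSAFT(nn.Module):
    """Latent Source Attentive Frequency Transformation (sketch): one
    frequency-transformation branch (fully connected along the frequency
    axis) per latent source, mixed by attention weights whose query comes
    from the condition embedding and whose keys are learned."""
    def __init__(self, cond_dim, num_freq_bins, num_latent_sources=6, key_dim=32):
        super().__init__()
        self.fts = nn.ModuleList(
            [nn.Linear(num_freq_bins, num_freq_bins) for _ in range(num_latent_sources)]
        )
        self.keys = nn.Parameter(torch.randn(num_latent_sources, key_dim))
        self.to_query = nn.Linear(cond_dim, key_dim)

    def forward(self, x, cond):                 # x: (B, C, F, T)
        q = self.to_query(cond)                                                # (B, key_dim)
        attn = torch.softmax(q @ self.keys.t() / self.keys.shape[1] ** 0.5, dim=-1)  # (B, I)
        xt = x.transpose(2, 3)                                                 # (B, C, T, F)
        branches = torch.stack([ft(xt) for ft in self.fts], dim=1)             # (B, I, C, T, F)
        mixed = (attn[:, :, None, None, None] * branches).sum(dim=1)           # attend over latent sources
        return mixed.transpose(2, 3)                                           # (B, C, F, T)

# usage: condition internal U-Net features on a target-instrument embedding
x = torch.randn(2, 16, 128, 64)     # (batch, channels, freq bins, time frames)
cond = torch.randn(2, 8)            # condition embedding, e.g. the target source
x = LaSAFT(cond_dim=8, num_freq_bins=128)(x, cond)
x = GPoCM(cond_dim=8, channels=16)(x, cond)
# FiLM(cond_dim=8, channels=16)(x, cond) would be the plain feature-wise baseline

The grouped-convolution trick in GPoCM applies a different condition-generated 1x1 kernel to every sample in the batch, giving it the inter-channel mixing that FiLM's purely per-channel scaling lacks.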
Related papers
- Accelerated Multi-Contrast MRI Reconstruction via Frequency and Spatial Mutual Learning [50.74383395813782]
We propose a novel Frequency and Spatial Mutual Learning Network (FSMNet) to explore global dependencies across different modalities.
The proposed FSMNet achieves state-of-the-art performance for the Multi-Contrast MR Reconstruction task with different acceleration factors.
arXiv Detail & Related papers (2024-09-21T12:02:47Z)
- FANet: Feature Amplification Network for Semantic Segmentation in Cluttered Background [9.970265640589966]
Existing deep learning approaches leave out semantic cues that are crucial for semantic segmentation in complex scenarios.
We propose a feature amplification network (FANet) as a backbone network that incorporates semantic information using a novel feature enhancement module at multiple stages.
Our experimental results demonstrate state-of-the-art performance compared to existing methods.
arXiv Detail & Related papers (2024-07-12T15:57:52Z)
- Score-based Source Separation with Applications to Digital Communication Signals [72.6570125649502]
We propose a new method for separating superimposed sources using diffusion-based generative models.
Motivated by applications in radio-frequency (RF) systems, we are interested in sources with an underlying discrete nature.
Our method can be viewed as a multi-source extension to the recently proposed score distillation sampling scheme.
arXiv Detail & Related papers (2023-06-26T04:12:40Z)
- Transform Once: Efficient Operator Learning in Frequency Domain [69.74509540521397]
We study deep neural networks designed to harness the structure of the frequency domain for efficient learning of long-range correlations in space or time.
This work introduces a blueprint for frequency-domain learning through a single transform: transform once (T1); a minimal sketch of this transform-once pattern appears after this list.
arXiv Detail & Related papers (2022-11-26T01:56:05Z)
- Adaptive Frequency Learning in Two-branch Face Forgery Detection [66.91715092251258]
We propose to adaptively learn frequency information in a two-branch detection framework, dubbed AFD.
We liberate our network from fixed frequency transforms and achieve better performance with our data- and task-dependent transform layers.
arXiv Detail & Related papers (2022-03-27T14:25:52Z)
- FAMLP: A Frequency-Aware MLP-Like Architecture For Domain Generalization [73.41395947275473]
We propose a novel frequency-aware architecture, in which the domain-specific features are filtered out in the transformed frequency domain.
Experiments on three benchmarks demonstrate significant performance gains, outperforming state-of-the-art methods by margins of 3%, 4%, and 9%, respectively.
arXiv Detail & Related papers (2022-03-24T07:26:29Z)
- Deep Frequency Filtering for Domain Generalization [55.66498461438285]
Deep Neural Networks (DNNs) have preferences for some frequency components in the learning process.
We propose Deep Frequency Filtering (DFF) for learning domain-generalizable features.
We show that applying our proposed DFF on a plain baseline outperforms the state-of-the-art methods on different domain generalization tasks.
arXiv Detail & Related papers (2022-03-23T05:19:06Z)
- Compute and memory efficient universal sound source separation [23.152611264259225]
We provide a family of efficient neural network architectures for general purpose audio source separation.
The backbone structure of this convolutional network is the SUccessive DOwnsampling and Resampling of Multi-Resolution Features (SuDoRM-RF).
Our experiments show that SuDoRM-RF models perform comparably to, and even surpass, several state-of-the-art benchmarks.
arXiv Detail & Related papers (2021-03-03T19:16:53Z)
- Sparse Multi-Family Deep Scattering Network [14.932318540666543]
We propose a novel architecture exploiting the interpretability of the Deep Scattering Network (DSN).
The SMF-DSN enhances the DSN by (i) increasing the diversity of the scattering coefficients and (ii) improving its robustness with respect to non-stationary noise.
arXiv Detail & Related papers (2020-12-14T16:06:14Z)
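Referring back to the "Transform Once: Efficient Operator Learning in Frequency Domain" entry above, its single-transform blueprint can be sketched as follows: one rFFT at the input, learned layers acting directly on the retained frequency modes, and one inverse transform at the output. The mode truncation, the MLP over stacked real and imaginary parts, and all layer sizes are assumptions made for illustration, not that paper's actual architecture.

import torch
import torch.nn as nn

class TransformOnce1D(nn.Module):
    """Transform-once sketch: a single forward FFT, learning entirely in the
    frequency domain, and a single inverse FFT at the end."""
    def __init__(self, n_points, n_modes=32, hidden=128):
        super().__init__()
        self.n_points = n_points
        self.n_modes = min(n_modes, n_points // 2 + 1)
        # learn on the real/imag parts of the kept modes, stacked as one vector
        self.net = nn.Sequential(
            nn.Linear(2 * self.n_modes, hidden),
            nn.GELU(),
            nn.Linear(hidden, 2 * self.n_modes),
        )

    def forward(self, x):                          # x: (batch, n_points)
        X = torch.fft.rfft(x, dim=-1)              # single forward transform
        Xk = X[..., : self.n_modes]                # keep low-frequency modes
        z = torch.cat([Xk.real, Xk.imag], dim=-1)
        z = self.net(z)                            # all learning happens in frequency domain
        re, im = z.chunk(2, dim=-1)
        Y = torch.zeros_like(X)
        Y[..., : self.n_modes] = torch.complex(re, im)
        return torch.fft.irfft(Y, n=self.n_points, dim=-1)   # single inverse transform

# usage: map one 1-D signal to another with a single FFT/iFFT pair
model = TransformOnce1D(n_points=256)
y = model(torch.randn(4, 256))                     # (4, 256)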
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.