LaSAFT: Latent Source Attentive Frequency Transformation for Conditioned Source Separation
- URL: http://arxiv.org/abs/2010.11631v2
- Date: Wed, 14 Apr 2021 05:31:12 GMT
- Title: LaSAFT: Latent Source Attentive Frequency Transformation for Conditioned Source Separation
- Authors: Woosung Choi and Minseok Kim and Jaehwa Chung and Soonyoung Jung
- Abstract summary: We propose the Latent Source Attentive Frequency Transformation (LaSAFT) block to capture source-dependent frequency patterns.
We also propose the Gated Point-wise Convolutional Modulation (GPoCM) to modulate internal features.
- Score: 7.002478301291264
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent deep-learning approaches have shown that Frequency Transformation (FT)
blocks can significantly improve spectrogram-based single-source separation
models by capturing frequency patterns. The goal of this paper is to extend the
FT block to fit the multi-source task. We propose the Latent Source Attentive
Frequency Transformation (LaSAFT) block to capture source-dependent frequency
patterns. We also propose the Gated Point-wise Convolutional Modulation
(GPoCM), an extension of Feature-wise Linear Modulation (FiLM), to modulate
internal features. By employing these two novel methods, we extend the
Conditioned-U-Net (CUNet) for multi-source separation, and the experimental
results indicate that our LaSAFT and GPoCM can improve the CUNet's performance,
achieving state-of-the-art SDR performance on several MUSDB18 source separation
tasks.
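The two conditioning mechanisms above can be pictured with a short PyTorch sketch. FiLM is the standard feature-wise scale-and-shift baseline, GPoCM replaces it with a condition-generated point-wise (1x1) convolution whose sigmoid output gates the features, and LaSAFT mixes several frequency-transformation branches with attention weights derived from the condition embedding. Tensor shapes, layer sizes, and names such as num_latent_sources are illustrative assumptions, not the authors' exact configuration.

import torch
import torch.nn as nn
import torch.nn.functional as F

class FiLM(nn.Module):
    """Feature-wise Linear Modulation: per-channel scale and shift
    predicted from a condition embedding."""
    def __init__(self, cond_dim, channels):
        super().__init__()
        self.to_gamma_beta = nn.Linear(cond_dim, 2 * channels)

    def forward(self, x, cond):                 # x: (B, C, F, T), cond: (B, cond_dim)
        gamma, beta = self.to_gamma_beta(cond).chunk(2, dim=-1)
        return gamma[:, :, None, None] * x + beta[:, :, None, None]

class GPoCM(nn.Module):
    """Gated Point-wise Convolutional Modulation (sketch): the condition
    embedding generates per-sample 1x1 convolution weights, and the sigmoid
    of that convolution's output gates the input features."""
    def __init__(self, cond_dim, channels):
        super().__init__()
        self.to_weight = nn.Linear(cond_dim, channels * channels)
        self.to_bias = nn.Linear(cond_dim, channels)

    def forward(self, x, cond):                 # x: (B, C, F, T)
        B, C, Fr, T = x.shape
        w = self.to_weight(cond).view(B * C, C, 1, 1)
        b = self.to_bias(cond).reshape(B * C)
        # grouped conv applies each sample's own 1x1 kernel (inter-channel mixing)
        y = F.conv2d(x.reshape(1, B * C, Fr, T), w, b, groups=B).view(B, C, Fr, T)
        return torch.sigmoid(y) * x             # gating

class LaSAFT(nn.Module):
    """Latent Source Attentive Frequency Transformation (sketch): one
    frequency-transformation branch (fully connected along the frequency
    axis) per latent source, mixed by attention weights whose query comes
    from the condition embedding and whose keys are learned."""
    def __init__(self, cond_dim, num_freq_bins, num_latent_sources=6, key_dim=32):
        super().__init__()
        self.fts = nn.ModuleList(
            [nn.Linear(num_freq_bins, num_freq_bins) for _ in range(num_latent_sources)]
        )
        self.keys = nn.Parameter(torch.randn(num_latent_sources, key_dim))
        self.to_query = nn.Linear(cond_dim, key_dim)

    def forward(self, x, cond):                 # x: (B, C, F, T)
        q = self.to_query(cond)                                                # (B, key_dim)
        attn = torch.softmax(q @ self.keys.t() / self.keys.shape[1] ** 0.5, dim=-1)  # (B, I)
        xt = x.transpose(2, 3)                                                 # (B, C, T, F)
        branches = torch.stack([ft(xt) for ft in self.fts], dim=1)             # (B, I, C, T, F)
        mixed = (attn[:, :, None, None, None] * branches).sum(dim=1)           # attend over latent sources
        return mixed.transpose(2, 3)                                           # (B, C, F, T)

# usage: condition internal U-Net features on a target-instrument embedding
x = torch.randn(2, 16, 128, 64)     # (batch, channels, freq bins, time frames)
cond = torch.randn(2, 8)            # condition embedding, e.g. the target source
x = LaSAFT(cond_dim=8, num_freq_bins=128)(x, cond)
x = GPoCM(cond_dim=8, channels=16)(x, cond)
# FiLM(cond_dim=8, channels=16)(x, cond) would be the plain feature-wise baseline

The grouped-convolution trick in GPoCM applies a different condition-generated 1x1 kernel to every sample in the batch, giving it the inter-channel mixing that FiLM's purely per-channel scaling lacks.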
Related papers
- Accelerated Multi-Contrast MRI Reconstruction via Frequency and Spatial Mutual Learning [50.74383395813782]
We propose a novel Frequency and Spatial Mutual Learning Network (FSMNet) to explore global dependencies across different modalities.
The proposed FSMNet achieves state-of-the-art performance for the Multi-Contrast MR Reconstruction task with different acceleration factors.
arXiv Detail & Related papers (2024-09-21T12:02:47Z)
- FANet: Feature Amplification Network for Semantic Segmentation in Cluttered Background [9.970265640589966]
Existing deep learning approaches leave out semantic cues that are crucial for semantic segmentation in complex scenarios.
We propose a feature amplification network (FANet) as a backbone network that incorporates semantic information using a novel feature enhancement module at multiple stages.
Our experimental results demonstrate state-of-the-art performance compared to existing methods.
arXiv Detail & Related papers (2024-07-12T15:57:52Z)
- Score-based Source Separation with Applications to Digital Communication Signals [72.6570125649502]
We propose a new method for separating superimposed sources using diffusion-based generative models.
Motivated by applications in radio-frequency (RF) systems, we are interested in sources with an underlying discrete nature.
Our method can be viewed as a multi-source extension to the recently proposed score distillation sampling scheme.
arXiv Detail & Related papers (2023-06-26T04:12:40Z)
- Transform Once: Efficient Operator Learning in Frequency Domain [69.74509540521397]
We study deep neural networks designed to harness the structure of the frequency domain for efficient learning of long-range correlations in space or time.
This work introduces a blueprint for frequency-domain learning through a single transform: transform once (T1); a minimal sketch of this transform-once pattern appears after this list.
arXiv Detail & Related papers (2022-11-26T01:56:05Z)
- Adaptive Frequency Learning in Two-branch Face Forgery Detection [66.91715092251258]
We propose to adaptively learn frequency information in a two-branch detection framework, dubbed AFD.
We liberate our network from fixed frequency transforms and achieve better performance with our data- and task-dependent transform layers.
arXiv Detail & Related papers (2022-03-27T14:25:52Z)
- FAMLP: A Frequency-Aware MLP-Like Architecture For Domain Generalization [73.41395947275473]
We propose a novel frequency-aware architecture, in which the domain-specific features are filtered out in the transformed frequency domain.
Experiments on three benchmarks demonstrate significant performance gains, outperforming state-of-the-art methods by margins of 3%, 4%, and 9%, respectively.
arXiv Detail & Related papers (2022-03-24T07:26:29Z)
- Deep Frequency Filtering for Domain Generalization [55.66498461438285]
Deep Neural Networks (DNNs) have preferences for some frequency components in the learning process.
We propose Deep Frequency Filtering (DFF) for learning domain-generalizable features.
We show that applying our proposed DFF on a plain baseline outperforms the state-of-the-art methods on different domain generalization tasks.
arXiv Detail & Related papers (2022-03-23T05:19:06Z)
- Compute and memory efficient universal sound source separation [23.152611264259225]
We provide a family of efficient neural network architectures for general purpose audio source separation.
The backbone structure of this convolutional network is the SUccessive DOwnsampling and Resampling of Multi-Resolution Features (SuDoRM-RF).
Our experiments show that SuDoRM-RF models perform comparably to, and even surpass, several state-of-the-art benchmarks.
arXiv Detail & Related papers (2021-03-03T19:16:53Z)
- Sparse Multi-Family Deep Scattering Network [14.932318540666543]
We propose a novel architecture exploiting the interpretability of the Deep Scattering Network (DSN).
The SMF-DSN enhances the DSN by (i) increasing the diversity of the scattering coefficients and (ii) improving its robustness with respect to non-stationary noise.
arXiv Detail & Related papers (2020-12-14T16:06:14Z)
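Referring back to the "Transform Once: Efficient Operator Learning in Frequency Domain" entry above, its single-transform blueprint can be sketched as follows: one rFFT at the input, learned layers acting directly on the retained frequency modes, and one inverse transform at the output. The mode truncation, the MLP over stacked real and imaginary parts, and all layer sizes are assumptions made for illustration, not that paper's actual architecture.

import torch
import torch.nn as nn

class TransformOnce1D(nn.Module):
    """Transform-once sketch: a single forward FFT, learning entirely in the
    frequency domain, and a single inverse FFT at the end."""
    def __init__(self, n_points, n_modes=32, hidden=128):
        super().__init__()
        self.n_points = n_points
        self.n_modes = min(n_modes, n_points // 2 + 1)
        # learn on the real/imag parts of the kept modes, stacked as one vector
        self.net = nn.Sequential(
            nn.Linear(2 * self.n_modes, hidden),
            nn.GELU(),
            nn.Linear(hidden, 2 * self.n_modes),
        )

    def forward(self, x):                          # x: (batch, n_points)
        X = torch.fft.rfft(x, dim=-1)              # single forward transform
        Xk = X[..., : self.n_modes]                # keep low-frequency modes
        z = torch.cat([Xk.real, Xk.imag], dim=-1)
        z = self.net(z)                            # all learning happens in frequency domain
        re, im = z.chunk(2, dim=-1)
        Y = torch.zeros_like(X)
        Y[..., : self.n_modes] = torch.complex(re, im)
        return torch.fft.irfft(Y, n=self.n_points, dim=-1)   # single inverse transform

# usage: map one 1-D signal to another with a single FFT/iFFT pair
model = TransformOnce1D(n_points=256)
y = model(torch.randn(4, 256))                     # (4, 256)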
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.