Time-Domain Audio Source Separation Based on Wave-U-Net Combined with Discrete Wavelet Transform
- URL: http://arxiv.org/abs/2001.10190v1
- Date: Tue, 28 Jan 2020 06:43:21 GMT
- Title: Time-Domain Audio Source Separation Based on Wave-U-Net Combined with Discrete Wavelet Transform
- Authors: Tomohiko Nakamura and Hiroshi Saruwatari
- Abstract summary: We propose a time-domain audio source separation method based on a discrete wavelet transform (DWT). The proposed method builds on Wave-U-Net, one of the state-of-the-art deep neural networks.
- Score: 34.05660769694652
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We propose a time-domain audio source separation method using down-sampling
(DS) and up-sampling (US) layers based on a discrete wavelet transform (DWT).
The proposed method is based on one of the state-of-the-art deep neural
networks, Wave-U-Net, which successively down-samples and up-samples feature
maps. We find that this architecture resembles that of multiresolution
analysis, and reveal that the DS layers of Wave-U-Net cause aliasing and may
discard information useful for the separation. Although the effects of these
problems may be reduced by training, to achieve a more reliable source
separation method, we should design DS layers capable of overcoming the
problems. With this belief, focusing on the fact that the DWT has an
anti-aliasing filter and the perfect reconstruction property, we design the
proposed layers. Experiments on music source separation show the efficacy of
the proposed method and the importance of simultaneously considering the
anti-aliasing filters and the perfect reconstruction property.
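As a minimal illustration of the property the abstract relies on (a sketch only, not the paper's actual layers, which operate on multi-channel feature maps inside Wave-U-Net; the Haar wavelet is chosen here for simplicity), a single-level Haar DWT shows how a down-sampling step can both band-limit the signal and remain exactly invertible:

```python
import numpy as np

def haar_dwt_ds(x):
    """Haar-DWT down-sampling: split a signal into half-rate
    low-pass (approximation) and high-pass (detail) bands."""
    even, odd = x[0::2], x[1::2]
    low = (even + odd) / np.sqrt(2.0)   # anti-aliased, down-sampled band
    high = (even - odd) / np.sqrt(2.0)  # detail band, kept for reconstruction
    return low, high

def haar_dwt_us(low, high):
    """Inverse Haar DWT: perfectly reconstruct the original signal."""
    x = np.empty(2 * low.size)
    x[0::2] = (low + high) / np.sqrt(2.0)
    x[1::2] = (low - high) / np.sqrt(2.0)
    return x

rng = np.random.default_rng(0)
x = rng.standard_normal(16)
low, high = haar_dwt_ds(x)
x_rec = haar_dwt_us(low, high)
print(np.allclose(x, x_rec))  # → True (perfect reconstruction property)
```

The low-pass band plays the role of the down-sampled feature map; the high-pass band carries exactly the information that a plain decimating DS layer would alias or discard.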
Related papers
- Data-Driven Room Acoustic Modeling Via Differentiable Feedback Delay Networks With Learnable Delay Lines [46.2770645198924]
We introduce a novel method for finding the parameters of a Feedback Delay Network (FDN)
The proposed approach involves the implementation of a differentiable FDN with trainable delay lines.
We show that the proposed method yields time-invariant frequency-independent FDNs capable of closely matching the desired acoustical characteristics.
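The time-invariant FDN structure summarized above can be sketched as follows. This is a hypothetical toy implementation, not the paper's differentiable, trainable version: the scalar loop gain `g`, the rotation feedback matrix `A`, the delay lengths, and the unit input/output gains `b`, `c` are all illustrative assumptions.

```python
import numpy as np

def fdn_process(x, delays, A, b, c, g=0.7):
    """Toy Feedback Delay Network: N parallel delay lines mixed through a
    feedback matrix A, scaled by a global gain g < 1 (with A orthogonal,
    this keeps the feedback loop stable)."""
    N = len(delays)
    bufs = [np.zeros(d) for d in delays]   # circular delay-line buffers
    idx = [0] * N                          # read/write positions
    y = np.zeros(len(x))
    for n, s in enumerate(x):
        outs = np.array([bufs[i][idx[i]] for i in range(N)])  # delay-line outputs
        y[n] = c @ outs                    # mix delay lines to the output
        fb = g * (A @ outs) + b * s        # feedback mix plus new input sample
        for i in range(N):
            bufs[i][idx[i]] = fb[i]
            idx[i] = (idx[i] + 1) % delays[i]
    return y

# Impulse response of a 2-line FDN with an orthogonal (rotation) feedback matrix.
theta = 0.4
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
x = np.zeros(64)
x[0] = 1.0
y = fdn_process(x, delays=[3, 5], A=A, b=np.ones(2), c=np.ones(2))
```

Making `delays`, `A`, `b`, and `c` trainable parameters (and the loop differentiable) is the step the paper contributes; this sketch only shows the fixed-parameter signal flow.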
arXiv Detail & Related papers (2024-03-29T10:48:32Z)
- DGNet: Dynamic Gradient-Guided Network for Water-Related Optics Image Enhancement [77.0360085530701]
Underwater image enhancement (UIE) is a challenging task due to the complex degradation caused by underwater environments.
Previous methods often idealize the degradation process, and neglect the impact of medium noise and object motion on the distribution of image features.
Our approach utilizes predicted images to dynamically update pseudo-labels, adding a dynamic gradient to optimize the network's gradient space.
arXiv Detail & Related papers (2023-12-12T06:07:21Z)
- Spatial-Frequency U-Net for Denoising Diffusion Probabilistic Models [89.76587063609806]
We study the denoising diffusion probabilistic model (DDPM) in wavelet space, instead of pixel space, for visual synthesis.
By explicitly modeling the wavelet signals, we find our model is able to generate images with higher quality on several datasets.
arXiv Detail & Related papers (2023-07-27T06:53:16Z)
- Degradation-Noise-Aware Deep Unfolding Transformer for Hyperspectral Image Denoising [9.119226249676501]
Hyperspectral images (HSIs) are often quite noisy because of narrow band spectral filtering.
To reduce the noise in HSI data cubes, both model-driven and learning-based denoising algorithms have been proposed.
This paper proposes a Degradation-Noise-Aware Unfolding Network (DNA-Net) that addresses these issues.
arXiv Detail & Related papers (2023-05-06T13:28:20Z)
- Multiscale Representation for Real-Time Anti-Aliasing Neural Rendering [84.37776381343662]
Mip-NeRF proposes a multiscale representation as a conical frustum to encode scale information.
We propose mip voxel grids (Mip-VoG), an explicit multiscale representation for real-time anti-aliasing rendering.
Our approach is the first to offer multiscale training and real-time anti-aliasing rendering simultaneously.
arXiv Detail & Related papers (2023-04-20T04:05:22Z)
- On Neural Architectures for Deep Learning-based Source Separation of Co-Channel OFDM Signals [104.11663769306566]
We study the single-channel source separation problem involving orthogonal frequency-division multiplexing (OFDM) signals.
We propose critical domain-informed modifications to the network parameterization, based on insights from OFDM structures.
arXiv Detail & Related papers (2023-03-11T16:29:13Z)
- A hybrid approach to seismic deblending: when physics meets self-supervision [0.0]
We introduce a new concept that consists of embedding a self-supervised denoising network into the Plug-and-Play framework.
A novel network is introduced whose design extends the blind-spot network architecture of [28] for partially correlated noise.
The network is then trained directly on the noisy input data at each step of the Plug-and-Play algorithm.
arXiv Detail & Related papers (2022-05-30T19:24:21Z)
- Subpixel object segmentation using wavelets and multi resolution analysis [4.970364068620608]
We propose a novel deep learning framework for fast prediction of boundaries of two-dimensional simply connected domains.
The boundaries are modelled as (piecewise) smooth closed curves using wavelets and the so-called Pyramid Algorithm.
Our model demonstrates up to 5x faster inference speed compared to the U-Net, while maintaining similar performance in terms of Dice score and Hausdorff distance.
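The Pyramid Algorithm mentioned above can be illustrated with a minimal multi-level wavelet decomposition; the curve parameterization and the network itself are omitted, and the Haar filter is an illustrative assumption rather than the paper's choice.

```python
import numpy as np

def haar_pyramid(x, levels):
    """Mallat's Pyramid Algorithm with the Haar filter: recursively split
    the low-pass (approximation) band, keeping the high-pass (detail)
    coefficients produced at every scale."""
    approx = np.asarray(x, dtype=float)
    details = []
    for _ in range(levels):
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2.0))  # detail at this scale
        approx = (even + odd) / np.sqrt(2.0)         # coarser approximation
    return approx, details

x = np.arange(16, dtype=float)
approx, details = haar_pyramid(x, levels=2)
# Orthonormality: total energy is preserved across all bands.
print(np.isclose(approx @ approx + sum(d @ d for d in details), x @ x))  # → True
```

Each level halves the resolution while keeping the discarded detail, which is the same multiresolution structure the main abstract identifies in Wave-U-Net's successive down-sampling.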
arXiv Detail & Related papers (2021-10-28T15:43:21Z)
- Hierarchical Timbre-Painting and Articulation Generation [92.59388372914265]
We present a fast and high-fidelity method for music generation, based on specified f0 and loudness.
The synthesized audio mimics the timbre and articulation of a target instrument.
arXiv Detail & Related papers (2020-08-30T05:27:39Z)
- MuCAN: Multi-Correspondence Aggregation Network for Video Super-Resolution [63.02785017714131]
Video super-resolution (VSR) aims to utilize multiple low-resolution frames to generate a high-resolution prediction for each frame.
Inter- and intra-frames are the key sources for exploiting temporal and spatial information.
We build an effective multi-correspondence aggregation network (MuCAN) for VSR.
arXiv Detail & Related papers (2020-07-23T05:41:27Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.