Time-Domain Audio Source Separation Based on Wave-U-Net Combined with
Discrete Wavelet Transform
- URL: http://arxiv.org/abs/2001.10190v1
- Date: Tue, 28 Jan 2020 06:43:21 GMT
- Title: Time-Domain Audio Source Separation Based on Wave-U-Net Combined with
Discrete Wavelet Transform
- Authors: Tomohiko Nakamura and Hiroshi Saruwatari
- Abstract summary: We propose a time-domain audio source separation method based on a discrete wavelet transform (DWT)
The proposed method is based on one of the state-of-the-art deep neural networks, Wave-U-Net.
- Score: 34.05660769694652
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We propose a time-domain audio source separation method using down-sampling
(DS) and up-sampling (US) layers based on a discrete wavelet transform (DWT).
The proposed method is based on one of the state-of-the-art deep neural
networks, Wave-U-Net, which successively down-samples and up-samples feature
maps. We find that this architecture resembles that of multiresolution
analysis, and reveal that the DS layers of Wave-U-Net cause aliasing and may
discard information useful for the separation. Although the effects of these
problems may be reduced by training, to achieve a more reliable source
separation method, we should design DS layers capable of overcoming the
problems. With this belief, focusing on the fact that the DWT has an
anti-aliasing filter and the perfect reconstruction property, we design the
proposed layers. Experiments on music source separation show the efficacy of
the proposed method and the importance of simultaneously considering the
anti-aliasing filters and the perfect reconstruction property.
Related papers
- Enhanced Wavelet Scattering Network for image inpainting detection [0.0]
This paper proposes several innovative ideas for detecting inpainting forgeries based on low level noise analysis.
It combines Dual-Tree Complex Wavelet Transform (DT-CWT) for feature extraction with convolutional neural networks (CNN) for forged area detection and localization.
Our approach was benchmarked against state-of-the-art methods and demonstrated superior performance over all cited alternatives.
arXiv Detail & Related papers (2024-09-25T15:27:05Z) - Spectral U-Net: Enhancing Medical Image Segmentation via Spectral Decomposition [14.450329809640422]
This paper introduces Spectral U-Net, a novel deep learning network based on spectral decomposition.
We exploit Dual Tree Complex Wavelet Transform (DTCWT) for down-sampling and inverse Dual Tree Complex Wavelet Transform (iDTCWT) for up-sampling.
We devise the corresponding Wave-Block and iWave-Block, integrated into the U-Net architecture, aiming at mitigating information loss during down-sampling and enhancing detail reconstruction during up-sampling.
arXiv Detail & Related papers (2024-09-13T22:10:14Z) - Ground-roll Separation From Land Seismic Records Based on Convolutional Neural Network [9.579207147600247]
Ground-roll wave is a common coherent noise in land field seismic data.
This paper proposes a novel way to separate ground-roll from reflections using convolutional neural network (CNN) model based method.
arXiv Detail & Related papers (2024-09-05T19:34:21Z) - DGNet: Dynamic Gradient-Guided Network for Water-Related Optics Image
Enhancement [77.0360085530701]
Underwater image enhancement (UIE) is a challenging task due to the complex degradation caused by underwater environments.
Previous methods often idealize the degradation process, and neglect the impact of medium noise and object motion on the distribution of image features.
Our approach utilizes predicted images to dynamically update pseudo-labels, adding a dynamic gradient to optimize the network's gradient space.
arXiv Detail & Related papers (2023-12-12T06:07:21Z) - Spatial-Frequency U-Net for Denoising Diffusion Probabilistic Models [89.76587063609806]
We study the denoising diffusion probabilistic model (DDPM) in wavelet space, instead of pixel space, for visual synthesis.
By explicitly modeling the wavelet signals, we find our model is able to generate images with higher quality on several datasets.
arXiv Detail & Related papers (2023-07-27T06:53:16Z) - Degradation-Noise-Aware Deep Unfolding Transformer for Hyperspectral
Image Denoising [9.119226249676501]
Hyperspectral images (HSIs) are often quite noisy because of narrow band spectral filtering.
To reduce the noise in HSI data cubes, both model-driven and learning-based denoising algorithms have been proposed.
This paper proposes a Degradation-Noise-Aware Unfolding Network (DNA-Net) that addresses these issues.
arXiv Detail & Related papers (2023-05-06T13:28:20Z) - Multiscale Representation for Real-Time Anti-Aliasing Neural Rendering [84.37776381343662]
Mip-NeRF proposes a multiscale representation as a conical frustum to encode scale information.
We propose mip voxel grids (Mip-VoG), an explicit multiscale representation for real-time anti-aliasing rendering.
Our approach is the first to offer multiscale training and real-time anti-aliasing rendering simultaneously.
arXiv Detail & Related papers (2023-04-20T04:05:22Z) - On Neural Architectures for Deep Learning-based Source Separation of
Co-Channel OFDM Signals [104.11663769306566]
We study the single-channel source separation problem involving frequency-division multiplexing (OFDM) signals.
We propose critical domain-informed modifications to the network parameterization, based on insights from OFDM structures.
arXiv Detail & Related papers (2023-03-11T16:29:13Z) - Hierarchical Timbre-Painting and Articulation Generation [92.59388372914265]
We present a fast and high-fidelity method for music generation, based on specified f0 and loudness.
The synthesized audio mimics the timbre and articulation of a target instrument.
arXiv Detail & Related papers (2020-08-30T05:27:39Z) - MuCAN: Multi-Correspondence Aggregation Network for Video
Super-Resolution [63.02785017714131]
Video super-resolution (VSR) aims to utilize multiple low-resolution frames to generate a high-resolution prediction for each frame.
Inter- and intra-frames are the key sources for exploiting temporal and spatial information.
We build an effective multi-correspondence aggregation network (MuCAN) for VSR.
arXiv Detail & Related papers (2020-07-23T05:41:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.