Time-Domain Audio Source Separation Based on Wave-U-Net Combined with Discrete Wavelet Transform
- URL: http://arxiv.org/abs/2001.10190v1
- Date: Tue, 28 Jan 2020 06:43:21 GMT
- Title: Time-Domain Audio Source Separation Based on Wave-U-Net Combined with Discrete Wavelet Transform
- Authors: Tomohiko Nakamura and Hiroshi Saruwatari
- Abstract summary: We propose a time-domain audio source separation method based on a discrete wavelet transform (DWT). The proposed method builds on Wave-U-Net, one of the state-of-the-art deep neural networks.
- Score: 34.05660769694652
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We propose a time-domain audio source separation method using down-sampling
(DS) and up-sampling (US) layers based on a discrete wavelet transform (DWT).
The proposed method is based on one of the state-of-the-art deep neural
networks, Wave-U-Net, which successively down-samples and up-samples feature
maps. We find that this architecture resembles that of multiresolution
analysis, and reveal that the DS layers of Wave-U-Net cause aliasing and may
discard information useful for the separation. Although the effects of these
problems may be reduced by training, to achieve a more reliable source
separation method, we should design DS layers capable of overcoming the
problems. With this belief, focusing on the fact that the DWT has an
anti-aliasing filter and the perfect reconstruction property, we design the
proposed layers. Experiments on music source separation show the efficacy of
the proposed method and the importance of simultaneously considering the
anti-aliasing filters and the perfect reconstruction property.
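As a minimal illustration of the property the abstract relies on (a sketch only, not the paper's actual layers, which operate on multi-channel feature maps inside Wave-U-Net; the Haar wavelet is chosen here for simplicity), a single-level Haar DWT shows how a down-sampling step can both band-limit the signal and remain exactly invertible:

```python
import numpy as np

def haar_dwt_ds(x):
    """Haar-DWT down-sampling: split a signal into half-rate
    low-pass (approximation) and high-pass (detail) bands."""
    even, odd = x[0::2], x[1::2]
    low = (even + odd) / np.sqrt(2.0)   # anti-aliased, down-sampled band
    high = (even - odd) / np.sqrt(2.0)  # detail band, kept for reconstruction
    return low, high

def haar_dwt_us(low, high):
    """Inverse Haar DWT: perfectly reconstruct the original signal."""
    x = np.empty(2 * low.size)
    x[0::2] = (low + high) / np.sqrt(2.0)
    x[1::2] = (low - high) / np.sqrt(2.0)
    return x

rng = np.random.default_rng(0)
x = rng.standard_normal(16)
low, high = haar_dwt_ds(x)
x_rec = haar_dwt_us(low, high)
print(np.allclose(x, x_rec))  # → True (perfect reconstruction property)
```

The low-pass band plays the role of the down-sampled feature map; the high-pass band carries exactly the information that a plain decimating DS layer would alias or discard.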
Related papers
- Data-Driven Room Acoustic Modeling Via Differentiable Feedback Delay Networks With Learnable Delay Lines [46.2770645198924]
We introduce a novel method for finding the parameters of a Feedback Delay Network (FDN)
The proposed approach involves the implementation of a differentiable FDN with trainable delay lines.
We show that the proposed method yields time-invariant frequency-independent FDNs capable of closely matching the desired acoustical characteristics.
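The time-invariant FDN structure summarized above can be sketched as follows. This is a hypothetical toy implementation, not the paper's differentiable, trainable version: the scalar loop gain `g`, the rotation feedback matrix `A`, the delay lengths, and the unit input/output gains `b`, `c` are all illustrative assumptions.

```python
import numpy as np

def fdn_process(x, delays, A, b, c, g=0.7):
    """Toy Feedback Delay Network: N parallel delay lines mixed through a
    feedback matrix A, scaled by a global gain g < 1 (with A orthogonal,
    this keeps the feedback loop stable)."""
    N = len(delays)
    bufs = [np.zeros(d) for d in delays]   # circular delay-line buffers
    idx = [0] * N                          # read/write positions
    y = np.zeros(len(x))
    for n, s in enumerate(x):
        outs = np.array([bufs[i][idx[i]] for i in range(N)])  # delay-line outputs
        y[n] = c @ outs                    # mix delay lines to the output
        fb = g * (A @ outs) + b * s        # feedback mix plus new input sample
        for i in range(N):
            bufs[i][idx[i]] = fb[i]
            idx[i] = (idx[i] + 1) % delays[i]
    return y

# Impulse response of a 2-line FDN with an orthogonal (rotation) feedback matrix.
theta = 0.4
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
x = np.zeros(64)
x[0] = 1.0
y = fdn_process(x, delays=[3, 5], A=A, b=np.ones(2), c=np.ones(2))
```

Making `delays`, `A`, `b`, and `c` trainable parameters (and the loop differentiable) is the step the paper contributes; this sketch only shows the fixed-parameter signal flow.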
arXiv Detail & Related papers (2024-03-29T10:48:32Z)
- DGNet: Dynamic Gradient-Guided Network for Water-Related Optics Image Enhancement [77.0360085530701]
Underwater image enhancement (UIE) is a challenging task due to the complex degradation caused by underwater environments.
Previous methods often idealize the degradation process, and neglect the impact of medium noise and object motion on the distribution of image features.
Our approach utilizes predicted images to dynamically update pseudo-labels, adding a dynamic gradient to optimize the network's gradient space.
arXiv Detail & Related papers (2023-12-12T06:07:21Z)
- Spatial-Frequency U-Net for Denoising Diffusion Probabilistic Models [89.76587063609806]
We study the denoising diffusion probabilistic model (DDPM) in wavelet space, instead of pixel space, for visual synthesis.
By explicitly modeling the wavelet signals, we find our model is able to generate images with higher quality on several datasets.
arXiv Detail & Related papers (2023-07-27T06:53:16Z)
- Degradation-Noise-Aware Deep Unfolding Transformer for Hyperspectral Image Denoising [9.119226249676501]
Hyperspectral images (HSIs) are often quite noisy because of narrow band spectral filtering.
To reduce the noise in HSI data cubes, both model-driven and learning-based denoising algorithms have been proposed.
This paper proposes a Degradation-Noise-Aware Unfolding Network (DNA-Net) that addresses these issues.
arXiv Detail & Related papers (2023-05-06T13:28:20Z)
- Multiscale Representation for Real-Time Anti-Aliasing Neural Rendering [84.37776381343662]
Mip-NeRF proposes a multiscale representation as a conical frustum to encode scale information.
We propose mip voxel grids (Mip-VoG), an explicit multiscale representation for real-time anti-aliasing rendering.
Our approach is the first to offer multiscale training and real-time anti-aliasing rendering simultaneously.
arXiv Detail & Related papers (2023-04-20T04:05:22Z)
- On Neural Architectures for Deep Learning-based Source Separation of Co-Channel OFDM Signals [104.11663769306566]
We study the single-channel source separation problem involving orthogonal frequency-division multiplexing (OFDM) signals.
We propose critical domain-informed modifications to the network parameterization, based on insights from OFDM structures.
arXiv Detail & Related papers (2023-03-11T16:29:13Z)
- A hybrid approach to seismic deblending: when physics meets self-supervision [0.0]
We introduce a new concept that consists of embedding a self-supervised denoising network into the Plug-and-Play framework.
A novel network is introduced whose design extends the blind-spot network architecture of [28] for partially correlated noise.
The network is then trained directly on the noisy input data at each step of the Plug-and-Play algorithm.
arXiv Detail & Related papers (2022-05-30T19:24:21Z)
- Subpixel object segmentation using wavelets and multi resolution analysis [4.970364068620608]
We propose a novel deep learning framework for fast prediction of boundaries of two-dimensional simply connected domains.
The boundaries are modelled as (piecewise) smooth closed curves using wavelets and the so-called Pyramid Algorithm.
Our model demonstrates up to 5x faster inference speed compared to the U-Net, while maintaining similar performance in terms of Dice score and Hausdorff distance.
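The Pyramid Algorithm mentioned above can be illustrated with a minimal multi-level wavelet decomposition; the curve parameterization and the network itself are omitted, and the Haar filter is an illustrative assumption rather than the paper's choice.

```python
import numpy as np

def haar_pyramid(x, levels):
    """Mallat's Pyramid Algorithm with the Haar filter: recursively split
    the low-pass (approximation) band, keeping the high-pass (detail)
    coefficients produced at every scale."""
    approx = np.asarray(x, dtype=float)
    details = []
    for _ in range(levels):
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2.0))  # detail at this scale
        approx = (even + odd) / np.sqrt(2.0)         # coarser approximation
    return approx, details

x = np.arange(16, dtype=float)
approx, details = haar_pyramid(x, levels=2)
# Orthonormality: total energy is preserved across all bands.
print(np.isclose(approx @ approx + sum(d @ d for d in details), x @ x))  # → True
```

Each level halves the resolution while keeping the discarded detail, which is the same multiresolution structure the main abstract identifies in Wave-U-Net's successive down-sampling.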
arXiv Detail & Related papers (2021-10-28T15:43:21Z)
- Hierarchical Timbre-Painting and Articulation Generation [92.59388372914265]
We present a fast and high-fidelity method for music generation, based on specified f0 and loudness.
The synthesized audio mimics the timbre and articulation of a target instrument.
arXiv Detail & Related papers (2020-08-30T05:27:39Z)
- MuCAN: Multi-Correspondence Aggregation Network for Video Super-Resolution [63.02785017714131]
Video super-resolution (VSR) aims to utilize multiple low-resolution frames to generate a high-resolution prediction for each frame.
Inter- and intra-frames are the key sources for exploiting temporal and spatial information.
We build an effective multi-correspondence aggregation network (MuCAN) for VSR.
arXiv Detail & Related papers (2020-07-23T05:41:27Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.