Upsampling layers for music source separation
- URL: http://arxiv.org/abs/2111.11773v1
- Date: Tue, 23 Nov 2021 10:36:28 GMT
- Title: Upsampling layers for music source separation
- Authors: Jordi Pons, Joan Serrà, Santiago Pascual, Giulio Cengarle, Daniel Arteaga, Davide Scaini
- Abstract summary: Upsampling artifacts can be either tonal artifacts (additive high-frequency noise) or filtering artifacts (subtractive, attenuating some bands).
We study how different artifacts interact and assess their impact on the models' performance.
Our results show that filtering artifacts, associated with interpolation upsamplers, are perceptually preferable, even if they tend to achieve worse objective scores.
- Score: 12.982998040587665
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Upsampling artifacts are caused both by problematic upsampling layers and by spectral replicas that emerge while upsampling. Depending on the upsampling layer used, such artifacts can be either tonal artifacts (additive high-frequency noise) or filtering artifacts (subtractive, attenuating some bands). In this work we investigate the practical implications of upsampling artifacts in the resulting audio by studying how different artifacts interact and assessing their impact on the models' performance. To that end, we benchmark a large set of upsampling layers for music source separation: different transposed and subpixel convolution setups, different interpolation upsamplers (including two novel layers based on stretch and sinc interpolation), and different wavelet-based upsamplers (including a novel learnable wavelet layer). Our results show that filtering artifacts, associated with interpolation upsamplers, are perceptually preferable, even if they tend to achieve worse objective scores.
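To make the two artifact families concrete, the following is a minimal PyTorch sketch, not the authors' implementation: the module names and hyperparameters are illustrative. It contrasts a transposed convolution, which can introduce tonal (additive) artifacts, with an interpolation upsampler (nearest-neighbor interpolation followed by a convolution), which instead tends to produce filtering (subtractive) artifacts.

```python
# Illustrative sketch only (class names and hyperparameters are hypothetical,
# not taken from the paper). Both modules upsample a (batch, channels, time)
# feature map by a factor of 2.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TransposedUpsampler(nn.Module):
    """Learned transposed-convolution upsampler; prone to tonal (additive)
    artifacts, especially with poorly chosen kernel/stride combinations."""
    def __init__(self, channels, stride=2, kernel_size=4):
        super().__init__()
        self.up = nn.ConvTranspose1d(channels, channels, kernel_size,
                                     stride=stride,
                                     padding=(kernel_size - stride) // 2)

    def forward(self, x):
        return self.up(x)

class InterpolationUpsampler(nn.Module):
    """Interpolation (nearest/linear) followed by a convolution; tends to
    produce filtering (subtractive) artifacts rather than tonal ones."""
    def __init__(self, channels, scale=2, mode="nearest", kernel_size=9):
        super().__init__()
        self.scale, self.mode = scale, mode
        self.conv = nn.Conv1d(channels, channels, kernel_size,
                              padding=kernel_size // 2)

    def forward(self, x):
        x = F.interpolate(x, scale_factor=self.scale, mode=self.mode)
        return self.conv(x)

x = torch.randn(1, 16, 1024)                 # (batch, channels, time)
print(TransposedUpsampler(16)(x).shape)      # torch.Size([1, 16, 2048])
print(InterpolationUpsampler(16)(x).shape)   # torch.Size([1, 16, 2048])
```

A commonly cited heuristic is to choose a transposed-convolution kernel size divisible by the stride, which reduces (but does not eliminate) checkerboard-like tonal patterns; interpolation-based upsamplers avoid them by construction, at the cost of a low-pass, filtering-like behaviour.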
Related papers
- When Semantic Segmentation Meets Frequency Aliasing [14.066404173580864]
We conduct a comprehensive analysis of hard pixel errors, categorizing them into three types: false responses, merging mistakes, and displacements.
Our findings reveal a quantitative association between hard pixels and aliasing, which is distortion caused by the overlapping of frequency components in the Fourier domain during downsampling.
Here, we propose two novel modules, a de-aliasing filter (DAF) and a frequency mixing (FreqMix) module, to alleviate aliasing by accurately removing or adjusting frequencies higher than the Nyquist frequency.
arXiv Detail & Related papers (2024-03-14T03:12:02Z) - Improving Feature Stability during Upsampling -- Spectral Artifacts and the Importance of Spatial Context [15.351461000403074]
Pixel-wise predictions are required in a wide variety of tasks such as image restoration, image segmentation, or disparity estimation.
Previous works have shown that resampling operations are subject to artifacts such as aliasing.
We show that the availability of large spatial context during upsampling makes it possible to provide stable, high-quality pixel-wise predictions.
arXiv Detail & Related papers (2023-11-29T10:53:05Z) - From Discrete Tokens to High-Fidelity Audio Using Multi-Band Diffusion [84.138804145918]
Deep generative models can generate high-fidelity audio conditioned on various types of representations.
These models are prone to generate audible artifacts when the conditioning is flawed or imperfect.
We propose a high-fidelity multi-band diffusion-based framework that generates any type of audio modality from low-bitrate discrete representations.
arXiv Detail & Related papers (2023-08-02T22:14:29Z) - BIMS-PU: Bi-Directional and Multi-Scale Point Cloud Upsampling [60.257912103351394]
We develop a new point cloud upsampling pipeline called BIMS-PU.
We decompose the up/downsampling procedure into several up/downsampling sub-steps by breaking the target sampling factor into smaller factors.
We show that our method achieves superior results to state-of-the-art approaches.
arXiv Detail & Related papers (2022-06-25T13:13:37Z) - Learning Spatio-Temporal Downsampling for Effective Video Upscaling [20.07194339353278]
In this paper, we aim to solve the space-time aliasing problem by learning spatio-temporal downsampling and upsampling.
Our framework enables a variety of applications, including arbitrary video resampling, blurry frame reconstruction, and efficient video storage.
arXiv Detail & Related papers (2022-03-15T17:59:00Z) - On the Frequency Bias of Generative Models [61.60834513380388]
We analyze proposed measures against high-frequency artifacts in state-of-the-art GAN training.
We find that none of the existing approaches can fully resolve spectral artifacts yet.
Our results suggest that there is great potential in improving the discriminator.
arXiv Detail & Related papers (2021-11-03T18:12:11Z) - Designing a Practical Degradation Model for Deep Blind Image Super-Resolution [134.9023380383406]
Single image super-resolution (SISR) methods would not perform well if the assumed degradation model deviates from the degradations found in real images.
This paper proposes to design a more complex but practical degradation model that consists of randomly shuffled blur, downsampling and noise degradations.
arXiv Detail & Related papers (2021-03-25T17:40:53Z) - Learning Affinity-Aware Upsampling for Deep Image Matting [83.02806488958399]
We show that learning affinity in upsampling provides an effective and efficient approach to exploit pairwise interactions in deep networks.
In particular, results on the Composition-1k matting dataset show that A2U achieves a 14% relative improvement in the SAD metric against a strong baseline.
Compared with the state-of-the-art matting network, we achieve 8% higher performance with only 40% model complexity.
arXiv Detail & Related papers (2020-11-29T05:09:43Z) - Weakly- and Semi-Supervised Probabilistic Segmentation and Quantification of Ultrasound Needle-Reverberation Artifacts to Allow Better AI Understanding of Tissue Beneath Needles [0.0]
We propose a probabilistic needle-and-reverberation-artifact segmentation algorithm to separate desired tissue-based pixel values from superimposed artifacts.
Our method matches state-of-the-art artifact segmentation performance and sets a new standard in estimating the per-pixel contributions of artifact vs underlying anatomy.
arXiv Detail & Related papers (2020-11-24T08:34:38Z) - Upsampling artifacts in neural audio synthesis [24.409899861477427]
Upsampling artifacts have been studied in computer vision, but have been overlooked in audio processing.
The main sources of upsampling artifacts are: (i) the tonal and filtering artifacts introduced by problematic upsampling operators, and (ii) the spectral replicas that emerge while upsampling (see the spectral-replica sketch after this list).
We show that nearest neighbor upsamplers can be an alternative to the problematic (but state-of-the-art) transposed and subpixel convolutions, which are prone to introducing tonal artifacts.
arXiv Detail & Related papers (2020-10-27T15:09:28Z) - Hierarchical Timbre-Painting and Articulation Generation [92.59388372914265]
We present a fast and high-fidelity method for music generation, based on specified f0 and loudness.
The synthesized audio mimics the timbre and articulation of a target instrument.
arXiv Detail & Related papers (2020-08-30T05:27:39Z)
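As a complement to the abstract and to the "Upsampling artifacts in neural audio synthesis" entry above, here is a small, self-contained NumPy sketch (illustrative only, not taken from any of the papers) of the spectral replicas that emerge while upsampling: zero-insertion upsampling of a 1 kHz tone produces an image of the tone at 7 kHz, which a windowed-sinc low-pass filter then attenuates.

```python
# Illustrative sketch: spectral replicas from naive (zero-insertion) upsampling.
# Sample rates, tone frequency, and filter length are arbitrary choices.
import numpy as np

fs = 8000                                    # original sample rate (Hz)
t = np.arange(4096) / fs
x = np.sin(2 * np.pi * 1000 * t)             # 1 kHz test tone

# Naive x2 upsampling by zero insertion: at the new 16 kHz rate, the spectrum
# contains the original tone at 1 kHz plus a replica at 8 kHz - 1 kHz = 7 kHz.
up = np.zeros(2 * len(x))
up[::2] = x

def dominant_freqs(sig, rate, k=2):
    """Return the k strongest frequency components of `sig`, in Hz."""
    spec = np.abs(np.fft.rfft(sig))
    freqs = np.fft.rfftfreq(len(sig), d=1.0 / rate)
    return sorted(freqs[np.argsort(spec)[-k:]])

print(dominant_freqs(up, 2 * fs))            # ~[1000.0, 7000.0]: replica present

# Windowed-sinc low-pass with cutoff at the original Nyquist (4 kHz at the
# new 16 kHz rate), i.e. the ideal interpolation filter for a factor of 2.
n = np.arange(-64, 65)
h = 0.5 * np.sinc(0.5 * n) * np.hamming(len(n))
y = np.convolve(up, h, mode="same")

print(dominant_freqs(y, 2 * fs))             # the 7 kHz replica is strongly attenuated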