Upsampling artifacts in neural audio synthesis
- URL: http://arxiv.org/abs/2010.14356v2
- Date: Tue, 9 Feb 2021 17:21:05 GMT
- Title: Upsampling artifacts in neural audio synthesis
- Authors: Jordi Pons, Santiago Pascual, Giulio Cengarle, Joan Serrà
- Abstract summary: Upsampling artifacts have been studied in computer vision, but have been overlooked in audio processing.
Main sources of upsampling artifacts are: (i) the tonal and filtering artifacts introduced by problematic upsampling operators, and (ii) the spectral replicas that emerge while upsampling.
We show that nearest neighbor upsamplers can be an alternative to the problematic (but state-of-the-art) transposed and subpixel convolutions which are prone to introduce tonal artifacts.
- Score: 24.409899861477427
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A number of recent advances in neural audio synthesis rely on upsampling
layers, which can introduce undesired artifacts. In computer vision, upsampling
artifacts have been studied and are known as checkerboard artifacts (due to
their characteristic visual pattern). However, their effect has been overlooked
so far in audio processing. Here, we address this gap by studying this problem
from the audio signal processing perspective. We first show that the main
sources of upsampling artifacts are: (i) the tonal and filtering artifacts
introduced by problematic upsampling operators, and (ii) the spectral replicas
that emerge while upsampling. We then compare different upsampling layers,
showing that nearest neighbor upsamplers can be an alternative to the
problematic (but state-of-the-art) transposed and subpixel convolutions which
are prone to introduce tonal artifacts.
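The mechanism behind these tonal artifacts can be illustrated with a small NumPy-only sketch (not the paper's code; function names are hypothetical). A 1-D transposed convolution is equivalent to zero-stuffing followed by convolution, so with a stride-2 upsampler and a 3-tap kernel, even and odd output samples see different kernel taps: even a constant input yields a periodic pattern at half the output rate. Nearest-neighbor upsampling followed by the same convolution stays flat.

```python
import numpy as np

def transposed_conv1d(x, kernel, stride=2):
    """Zero-stuff by `stride`, then convolve: a 1-D transposed convolution."""
    up = np.zeros(len(x) * stride)
    up[::stride] = x
    return np.convolve(up, kernel, mode="same")

def nn_upsample_conv1d(x, kernel, stride=2):
    """Nearest-neighbor upsample by `stride`, then the same convolution."""
    return np.convolve(np.repeat(x, stride), kernel, mode="same")

x = np.ones(64)                # constant (DC) input signal
k = np.array([0.2, 0.3, 0.5])  # arbitrary 3-tap kernel

y_t = transposed_conv1d(x, k)
y_n = nn_upsample_conv1d(x, k)

# Interior samples of the transposed convolution alternate between
# k[1] and k[0]+k[2] (a tonal artifact at half the output sample rate);
# the nearest-neighbor path outputs a constant k.sum() everywhere.
print(np.ptp(y_t[4:-4]))  # noticeably > 0: periodic pattern
print(np.ptp(y_n[4:-4]))  # 0: flat output
```

The alternation vanishes only for kernels whose even and odd taps happen to sum to the same value, which is why learned transposed-convolution kernels are prone to introduce tonal artifacts in practice.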
Related papers
- Analyzing the Impact of Splicing Artifacts in Partially Fake Speech Signals [15.595136769477614]
We analyze spliced audio tracks resulting from signal concatenation, investigate their artifacts and assess whether such artifacts introduce any bias in existing datasets.
Our findings reveal that by analyzing splicing artifacts, we can achieve a detection EER of 6.16% and 7.36% on PartialSpoof and HAD datasets, respectively.
arXiv Detail & Related papers (2024-08-25T09:28:04Z)
- The Crystal Ball Hypothesis in diffusion models: Anticipating object positions from initial noise [92.53724347718173]
Diffusion models have achieved remarkable success in text-to-image generation tasks.
We identify specific regions within the initial noise image, termed trigger patches, that play a key role for object generation in the resulting images.
arXiv Detail & Related papers (2024-06-04T05:06:00Z)
- Rethinking the Up-Sampling Operations in CNN-based Generative Network for Generalizable Deepfake Detection [86.97062579515833]
We introduce the concept of Neighboring Pixel Relationships (NPR) as a means to capture and characterize the generalized structural artifacts stemming from up-sampling operations.
A comprehensive analysis is conducted on an open-world dataset, comprising samples generated by 28 distinct generative models.
This analysis culminates in the establishment of a novel state-of-the-art performance, showcasing a remarkable 11.6% improvement over existing methods.
arXiv Detail & Related papers (2023-12-16T14:27:06Z)
- Improving Feature Stability during Upsampling -- Spectral Artifacts and the Importance of Spatial Context [15.351461000403074]
Pixel-wise predictions are required in a wide variety of tasks such as image restoration, image segmentation, or disparity estimation.
Previous works have shown that resampling operations are subject to artifacts such as aliasing.
We show that the availability of large spatial context during upsampling allows for stable, high-quality pixel-wise predictions.
arXiv Detail & Related papers (2023-11-29T10:53:05Z)
- Upsampling layers for music source separation [12.982998040587665]
Upsampling artifacts can either be tonal artifacts (additive high-frequency noise) or filtering artifacts (subtractive, attenuating some frequency bands).
We study how different artifacts interact and assess their impact on the models' performance.
Our results show that filtering artifacts, associated with upsamplers, are perceptually preferable, even if they tend to achieve worse objective scores.
arXiv Detail & Related papers (2021-11-23T10:36:28Z)
- On the Frequency Bias of Generative Models [61.60834513380388]
We analyze proposed measures against high-frequency artifacts in state-of-the-art GAN training.
We find that none of the existing approaches can fully resolve spectral artifacts yet.
Our results suggest that there is great potential in improving the discriminator.
arXiv Detail & Related papers (2021-11-03T18:12:11Z)
- Ensembling with Deep Generative Views [72.70801582346344]
Generative models can synthesize "views" of artificial images that mimic real-world variations, such as changes in color or pose.
Here, we investigate whether such views can be applied to real images to benefit downstream analysis tasks such as image classification.
We use StyleGAN2 as the source of generative augmentations and investigate this setup on classification tasks involving facial attributes, cat faces, and cars.
arXiv Detail & Related papers (2021-04-29T17:58:35Z)
- AD-NeRF: Audio Driven Neural Radiance Fields for Talking Head Synthesis [55.24336227884039]
We present a novel framework to generate high-fidelity talking head video.
We use neural scene representation networks to bridge the gap between audio input and video output.
Our framework can (1) produce high-fidelity and natural results, and (2) support free adjustment of audio signals, viewing directions, and background images.
arXiv Detail & Related papers (2021-03-20T02:58:13Z)
- Weakly- and Semi-Supervised Probabilistic Segmentation and Quantification of Ultrasound Needle-Reverberation Artifacts to Allow Better AI Understanding of Tissue Beneath Needles [0.0]
We propose a probabilistic needle-and-reverberation-artifact segmentation algorithm to separate desired tissue-based pixel values from superimposed artifacts.
Our method matches state-of-the-art artifact segmentation performance and sets a new standard in estimating the per-pixel contributions of artifact vs underlying anatomy.
arXiv Detail & Related papers (2020-11-24T08:34:38Z)
- BBAND Index: A No-Reference Banding Artifact Predictor [55.42929350861115]
Banding artifact, or false contouring, is a common video compression impairment.
We propose a new distortion-specific no-reference video quality model for predicting banding artifacts, called the Blind BANding Detector (BBAND index).
arXiv Detail & Related papers (2020-02-27T03:05:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.