Short-Time Fourier Transform for deblurring Variational Autoencoders
- URL: http://arxiv.org/abs/2401.03166v1
- Date: Sat, 6 Jan 2024 08:57:11 GMT
- Title: Short-Time Fourier Transform for deblurring Variational Autoencoders
- Authors: Vibhu Dalal
- Abstract summary: Variational Autoencoders (VAEs) are powerful generative models.
Their generated samples are known to suffer from a characteristic blurriness compared to the outputs of alternative generative techniques.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Variational Autoencoders (VAEs) are powerful generative models; however,
their generated samples are known to suffer from a characteristic blurriness
compared to the outputs of alternative generative techniques. Extensive research
efforts have been made to tackle this problem, and several works have focused on
modifying the reconstruction term of the evidence lower bound (ELBO). In
particular, many have experimented with augmenting the reconstruction loss with
losses in the frequency domain. Such loss functions usually employ the Fourier
transform to explicitly penalise the lack of higher-frequency components in the
generated samples, which are responsible for sharp visual features. In this
paper, we explore aspects of such previous approaches that are not well
understood, and we propose an augmentation to the reconstruction term in
response. Our reasoning leads us to use the short-time Fourier transform and to
emphasise local phase coherence between the input and output samples. We
illustrate the potential of our proposed loss on the MNIST dataset with both
qualitative and quantitative results.
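As a rough illustration of the idea in the abstract, the sketch below computes a short-time Fourier transform of a 1D signal and penalises local phase differences between an input and its reconstruction, weighting each bin by the target magnitude. This is a minimal pure-Python sketch with a naive DFT; the function names, windowing, and weighting scheme are assumptions for illustration, not the paper's actual loss.

```python
import cmath
import math

def dft(frame):
    # Naive O(N^2) discrete Fourier transform of a real-valued frame.
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def stft(signal, win_len=8, hop=4):
    # Hann-windowed short-time Fourier transform: one spectrum per frame.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * t / win_len) for t in range(win_len)]
    return [dft([signal[s + t] * window[t] for t in range(win_len)])
            for s in range(0, len(signal) - win_len + 1, hop)]

def phase_coherence_loss(x, y, win_len=8, hop=4, eps=1e-8):
    # Penalise local phase differences between input x and reconstruction y,
    # weighting each time-frequency bin by the target magnitude so that
    # strong components dominate and near-silent bins are ignored.
    num, den = 0.0, 0.0
    for fx, fy in zip(stft(x, win_len, hop), stft(y, win_len, hop)):
        for cx, cy in zip(fx, fy):
            w = abs(cx)
            if w > eps:
                num += w * (1.0 - math.cos(cmath.phase(cx) - cmath.phase(cy)))
                den += w
    return num / max(den, eps)
```

In a VAE training loop, such a term would presumably be added to the usual ELBO reconstruction loss with a weighting hyperparameter.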
Related papers
- Frequency-Aware Deepfake Detection: Improving Generalizability through
Frequency Space Learning [81.98675881423131]
This research addresses the challenge of developing a universal deepfake detector that can effectively identify unseen deepfake images.
Existing frequency-based paradigms have relied on frequency-level artifacts introduced during the up-sampling in GAN pipelines to detect forgeries.
We introduce a novel frequency-aware approach called FreqNet, centered around frequency domain learning, specifically designed to enhance the generalizability of deepfake detectors.
arXiv Detail & Related papers (2024-03-12T01:28:00Z) - FFAD: A Novel Metric for Assessing Generated Time Series Data Utilizing
Fourier Transform and Auto-encoder [9.103662085683304]
The Fréchet Inception Distance (FID) serves as the standard metric for evaluating generative models in image synthesis.
This work proposes a novel solution leveraging the Fourier transform and Auto-encoder, termed the Fréchet Fourier-transform Auto-encoder Distance (FFAD).
Through our experimental results, we showcase the potential of FFAD for effectively distinguishing samples from different classes.
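The FFAD summary above suggests comparing frequency-domain features of two sets of time series via a Fréchet distance. The sketch below follows that reading with naive DFT magnitudes standing in for the learned auto-encoder features the paper actually uses, and a univariate Gaussian fit; all names and the feature choice are assumptions.

```python
import cmath
import math

def magnitude_spectrum(series):
    # Naive DFT magnitudes of a real-valued series (positive frequencies only),
    # standing in for FFAD's learned auto-encoder features.
    n = len(series)
    return [abs(sum(series[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2)]

def frechet_1d(a, b):
    # Fréchet distance between univariate Gaussians fitted to two feature sets:
    # d^2 = (mu_a - mu_b)^2 + (sigma_a - sigma_b)^2.
    mu_a, mu_b = sum(a) / len(a), sum(b) / len(b)
    sd_a = math.sqrt(sum((v - mu_a) ** 2 for v in a) / len(a))
    sd_b = math.sqrt(sum((v - mu_b) ** 2 for v in b) / len(b))
    return (mu_a - mu_b) ** 2 + (sd_a - sd_b) ** 2
```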
arXiv Detail & Related papers (2024-03-11T10:26:04Z) - Diagnostic Spatio-temporal Transformer with Faithful Encoding [54.02712048973161]
This paper addresses the task of anomaly diagnosis when the underlying data generation process has a complex spatio-temporal (ST) dependency.
We formalize the problem as supervised dependency discovery, where the ST dependency is learned as a side product of time-series classification.
We show that the temporal positional encoding used in existing ST transformer works has a serious limitation in capturing higher frequencies (short time scales).
We also propose a new ST dependency discovery framework, which can provide readily consumable diagnostic information in both spatial and temporal directions.
arXiv Detail & Related papers (2023-05-26T05:31:23Z) - Image Deblurring by Exploring In-depth Properties of Transformer [86.7039249037193]
We leverage deep features extracted from a pretrained vision transformer (ViT) to encourage recovered images to be sharp without sacrificing the performance measured by the quantitative metrics.
By comparing the transformer features of the recovered image with those of the target, the pretrained transformer provides high-resolution, blur-sensitive semantic information.
One approach regards the features as vectors and computes, in Euclidean space, the discrepancy between representations extracted from the recovered image and the target.
arXiv Detail & Related papers (2023-03-24T14:14:25Z) - Robust representations of oil wells' intervals via sparse attention
mechanism [2.604557228169423]
We introduce the class of efficient Transformers named Regularized Transformers (Reguformers)
The focus in our experiments is on oil & gas data, namely well logs.
To evaluate our models for such problems, we work with an industry-scale open dataset consisting of well logs of more than 20 wells.
arXiv Detail & Related papers (2022-12-29T09:56:33Z) - Transform Once: Efficient Operator Learning in Frequency Domain [69.74509540521397]
We study deep neural networks designed to harness the structure in frequency domain for efficient learning of long-range correlations in space or time.
This work introduces a blueprint for frequency domain learning through a single transform: transform once (T1)
arXiv Detail & Related papers (2022-11-26T01:56:05Z) - Simpler is better: spectral regularization and up-sampling techniques
for variational autoencoders [1.2234742322758418]
Characterization of the spectral behavior of generative models based on neural networks remains an open issue.
Recent research has focused heavily on generative adversarial networks and the high-frequency discrepancies between real and generated images.
We propose a simple 2D Fourier transform-based spectral regularization loss for Variational Autoencoders (VAEs).
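The entry above describes a 2D Fourier transform-based spectral regularization loss. A minimal sketch in that spirit, using a naive 2D DFT and a log-magnitude comparison; the exact spectral statistic the paper regularizes may differ, so treat the formulation below as an assumption.

```python
import cmath
import math

def dft2(img):
    # Naive 2D DFT of a small grayscale image given as a list of rows.
    h, w = len(img), len(img[0])
    return [[sum(img[y][x] * cmath.exp(-2j * math.pi * (u * y / h + v * x / w))
                 for y in range(h) for x in range(w))
             for v in range(w)] for u in range(h)]

def spectral_loss(target, recon):
    # Mean squared difference between the log-magnitude spectra of the target
    # image and its reconstruction; log1p compresses the dynamic range so
    # high-frequency bins are not drowned out by the DC term.
    ft, fr = dft2(target), dft2(recon)
    h, w = len(target), len(target[0])
    return sum((math.log1p(abs(ft[u][v])) - math.log1p(abs(fr[u][v]))) ** 2
               for u in range(h) for v in range(w)) / (h * w)
```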
arXiv Detail & Related papers (2022-01-19T11:49:57Z) - A novel Time-frequency Transformer and its Application in Fault
Diagnosis of Rolling Bearings [0.24214594180459362]
We propose a novel time-frequency Transformer (TFT) model inspired by the massive success of the standard Transformer in sequence processing.
A new end-to-end fault diagnosis framework based on TFT is presented in this paper.
arXiv Detail & Related papers (2021-04-19T06:53:31Z) - Focal Frequency Loss for Image Reconstruction and Synthesis [125.7135706352493]
We show that narrowing gaps in the frequency domain can ameliorate image reconstruction and synthesis quality further.
We propose a novel focal frequency loss, which allows a model to adaptively focus on frequency components that are hard to synthesize.
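In the spirit of the focal frequency loss described above, the sketch below computes a per-bin spectral error and re-weights it so that bins with large error (hard to synthesise) dominate. The normalisation by the peak distance and the exponent `alpha` are assumptions for illustration; the paper's exact weighting scheme differs in detail.

```python
import cmath
import math

def dft2(img):
    # Naive 2D DFT of a small grayscale image given as a list of rows.
    h, w = len(img), len(img[0])
    return [[sum(img[y][x] * cmath.exp(-2j * math.pi * (u * y / h + v * x / w))
                 for y in range(h) for x in range(w))
             for v in range(w)] for u in range(h)]

def focal_frequency_loss(target, recon, alpha=1.0):
    # Per-bin frequency distance, re-weighted so bins that are hard to
    # synthesise (large spectral error) contribute more to the loss.
    ft, fr = dft2(target), dft2(recon)
    h, w = len(target), len(target[0])
    dists = [[abs(ft[u][v] - fr[u][v]) for v in range(w)] for u in range(h)]
    peak = max(max(row) for row in dists) or 1.0  # avoid 0/0 on a perfect match
    return sum(((dists[u][v] / peak) ** alpha) * dists[u][v] ** 2
               for u in range(h) for v in range(w)) / (h * w)
```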
arXiv Detail & Related papers (2020-12-23T17:32:04Z) - On Long-Tailed Phenomena in Neural Machine Translation [50.65273145888896]
State-of-the-art Neural Machine Translation (NMT) models struggle with generating low-frequency tokens.
We propose a new loss function, the Anti-Focal loss, to better adapt model training to the structural dependencies of conditional text generation.
We show the efficacy of the proposed technique on a number of Machine Translation (MT) datasets, demonstrating that it leads to significant gains over cross-entropy.
arXiv Detail & Related papers (2020-10-10T07:00:57Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.