Enhancing variational generation through self-decomposition
- URL: http://arxiv.org/abs/2202.02738v1
- Date: Sun, 6 Feb 2022 08:49:21 GMT
- Title: Enhancing variational generation through self-decomposition
- Authors: Andrea Asperti, Laura Bugo, Daniele Filippini
- Abstract summary: We introduce the notion of Split Variational Autoencoder (SVAE)
The network is trained as a standard Variational Autoencoder with a negative log-likelihood loss between training and reconstructed images.
According to the FID metric, our technique, tested on typical datasets such as MNIST, CIFAR-10 and CelebA, allows us to outperform all previous purely variational architectures.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this article we introduce the notion of Split Variational Autoencoder
(SVAE), whose output $\hat{x}$ is obtained as a weighted sum $\sigma \odot
\hat{x_1} + (1-\sigma) \odot \hat{x_2}$ of two generated images
$\hat{x_1},\hat{x_2}$, where $\sigma$ is a learned compositional map. The network
is trained as a standard Variational Autoencoder with a negative log-likelihood
loss between training and reconstructed images. The decomposition is
nondeterministic, but follows two main schemes, which we may roughly categorize
as either "syntactic" or "semantic". In the first case, the map tends to
exploit the strong correlation between adjacent pixels, splitting the image into
two complementary high-frequency sub-images. In the second case, the map
typically focuses on the contours of objects, splitting the image into
interesting variations of its content, with more marked and distinctive
features. In this case, the Fr\'echet Inception Distance (FID) of $\hat{x_1}$
and $\hat{x_2}$ is usually lower (hence better) than that of $\hat{x}$, which
clearly suffers from being the average of the two. In a sense, an SVAE
forces the Variational Autoencoder to {\em make choices}, in contrast with its
intrinsic tendency to average between alternatives with the aim of minimizing the
reconstruction loss towards a specific sample. According to the FID metric, our
technique, tested on typical datasets such as MNIST, CIFAR-10 and CelebA, allows
us to outperform all previous purely variational architectures (not relying on
normalization flows).
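
To make the composition in the abstract concrete, below is a minimal PyTorch-style sketch of a split output head: the decoder's final feature map is projected to two candidate images and a per-pixel compositional map $\sigma$, which are combined by the weighted sum $\sigma \odot \hat{x_1} + (1-\sigma) \odot \hat{x_2}$. This is an illustrative sketch, not the authors' implementation; the module and layer names are hypothetical.

```python
# Minimal sketch of the SVAE output composition (illustrative, not the paper's code).
import torch
import torch.nn as nn

class SplitDecoderHead(nn.Module):
    """Projects decoder features to two candidate images and a compositional map sigma."""
    def __init__(self, feat_ch: int, img_ch: int = 3):
        super().__init__()
        self.x1_head = nn.Conv2d(feat_ch, img_ch, kernel_size=3, padding=1)
        self.x2_head = nn.Conv2d(feat_ch, img_ch, kernel_size=3, padding=1)
        self.sigma_head = nn.Conv2d(feat_ch, img_ch, kernel_size=3, padding=1)

    def forward(self, feats: torch.Tensor):
        x1_hat = torch.sigmoid(self.x1_head(feats))       # first generated image
        x2_hat = torch.sigmoid(self.x2_head(feats))       # second generated image
        sigma = torch.sigmoid(self.sigma_head(feats))     # learned compositional map in [0, 1]
        x_hat = sigma * x1_hat + (1.0 - sigma) * x2_hat   # weighted sum from the abstract
        return x_hat, x1_hat, x2_hat, sigma

# Training follows the usual VAE recipe on the composed output:
#   loss = nll(x_hat, x) + kl(mu, logvar)
# where nll is a negative log-likelihood reconstruction term (e.g. binary
# cross-entropy for images in [0, 1]) and kl is the standard Gaussian KL term.
```

Apart from assembling the reconstruction from the two sub-images, the encoder, the latent sampling, and the KL term are unchanged with respect to a standard VAE.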
Related papers
- RefineStyle: Dynamic Convolution Refinement for StyleGAN [15.230430037135017]
In StyleGAN, convolution kernels are shaped both by static parameters shared across images and by dynamic modulation factors specific to each image.
The $\mathcal{W}^+$ space is often used for image inversion and editing.
This paper proposes an efficient refining strategy for dynamic kernels.
arXiv Detail & Related papers (2024-10-08T15:01:30Z)
- High-fidelity Pseudo-labels for Boosting Weakly-Supervised Segmentation [17.804090651425955]
Image-level weakly-supervised segmentation (WSSS) reduces the usually vast data annotation cost by relying on surrogate segmentation masks during training.
Our work is based on two techniques for improving CAMs: importance sampling, which is a substitute for GAP, and the feature similarity loss.
We reformulate both techniques based on binomial posteriors of multiple independent binary problems.
This has two benefits: their performance is improved and they become more general, resulting in an add-on method that can boost virtually any WSSS method.
arXiv Detail & Related papers (2023-04-05T17:43:57Z)
- EGC: Image Generation and Classification via a Diffusion Energy-Based Model [59.591755258395594]
This work introduces an energy-based classifier and generator, namely EGC, which can achieve superior performance in both tasks using a single neural network.
EGC achieves competitive generation results compared with state-of-the-art approaches on ImageNet-1k, CelebA-HQ and LSUN Church.
This work represents the first successful attempt to simultaneously excel in both tasks using a single set of network parameters.
arXiv Detail & Related papers (2023-04-04T17:59:14Z)
- I$^2$SB: Image-to-Image Schr\"odinger Bridge [87.43524087956457]
Image-to-Image Schr\"odinger Bridge (I$^2$SB) is a new class of conditional diffusion models.
I$^2$SB directly learns the nonlinear diffusion processes between two given distributions.
We show that I$2$SB surpasses standard conditional diffusion models with more interpretable generative processes.
arXiv Detail & Related papers (2023-02-12T08:35:39Z)
- Rethinking the Paradigm of Content Constraints in Unpaired Image-to-Image Translation [9.900050049833986]
We propose EnCo, a simple but efficient way to maintain the content by constraining the representational similarity in the latent space of patch-level features.
For the similarity function, we use a simple MSE loss instead of contrastive loss, which is currently widely used in I2I tasks.
In addition, we rethink the role played by discriminators in sampling patches and propose a discriminative attention-guided (DAG) patch sampling strategy to replace random sampling.
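
As a rough sketch of the patch-level content constraint described above, the snippet below computes a plain MSE between features of the source and translated images sampled at the same spatial locations. Random sampling is used here for brevity, whereas the paper replaces it with the DAG strategy; the function name, layer choice and patch count are assumptions.

```python
import torch
import torch.nn.functional as F

def patch_content_loss(feat_src: torch.Tensor,
                       feat_gen: torch.Tensor,
                       num_patches: int = 256) -> torch.Tensor:
    """MSE between patch-level features of source and translated images.

    feat_src, feat_gen: (B, C, H, W) feature maps taken from the same encoder
    layer for the input image and its translation (hypothetical setup).
    """
    b, c, h, w = feat_src.shape
    # Sample the same spatial locations in both feature maps (random here;
    # the paper proposes discriminative attention-guided sampling instead).
    idx = torch.randint(0, h * w, (num_patches,), device=feat_src.device)
    src = feat_src.flatten(2)[:, :, idx]   # (B, C, num_patches)
    gen = feat_gen.flatten(2)[:, :, idx]
    # A simple MSE similarity constraint instead of a contrastive objective.
    return F.mse_loss(gen, src)
```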
arXiv Detail & Related papers (2022-11-20T04:39:57Z)
- $\texttt{GradICON}$: Approximate Diffeomorphisms via Gradient Inverse Consistency [16.72466200341455]
We use a neural network to predict a map between a source and a target image as well as the map when swapping the source and target images.
We achieve state-of-the-art registration performance on a variety of real-world medical image datasets.
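
A minimal sketch of a gradient inverse-consistency penalty in the spirit of the above: the two predicted maps are composed, and the Jacobian of the composition (here estimated by central finite differences on sampled coordinates) is pushed towards the identity. The callables phi_ab / phi_ba, the step size, and the sampling of coordinates are assumptions, not the paper's exact formulation.

```python
import torch

def grad_inverse_consistency_loss(phi_ab, phi_ba, coords: torch.Tensor, h: float = 1e-3) -> torch.Tensor:
    """Penalize deviation of the Jacobian of phi_ab(phi_ba(.)) from the identity.

    phi_ab, phi_ba: callables mapping (N, d) coordinates to (N, d) coordinates,
    e.g. wrappers around the predicted source->target and target->source maps.
    coords: (N, d) sampled coordinates at which the penalty is evaluated.
    """
    comp = lambda x: phi_ab(phi_ba(x))        # composed map, ideally the identity
    d = coords.shape[1]
    cols = []
    for i in range(d):
        e = torch.zeros_like(coords)
        e[:, i] = h
        # Central finite difference: i-th column of the Jacobian at each point.
        cols.append((comp(coords + e) - comp(coords - e)) / (2.0 * h))
    jac = torch.stack(cols, dim=-1)           # (N, d, d)
    eye = torch.eye(d, device=coords.device).expand_as(jac)
    return ((jac - eye) ** 2).mean()          # squared Frobenius deviation from identity
```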
arXiv Detail & Related papers (2022-06-13T04:03:49Z)
- Learning a Weight Map for Weakly-Supervised Localization [93.91375268580806]
We train a generative network $g$ that outputs, given the input image, a per-pixel weight map that indicates the location of the object within the image.
Our results indicate that the method outperforms existing localization methods by a sizable margin on the challenging fine-grained classification datasets.
arXiv Detail & Related papers (2021-11-28T12:45:23Z)
- Seed the Views: Hierarchical Semantic Alignment for Contrastive Representation Learning [116.91819311885166]
We propose a hierarchical semantic alignment strategy that expands the views generated by a single image to cross-samples and multi-level representations.
Our method, termed CsMl, has the ability to integrate multi-level visual representations across samples in a robust way.
arXiv Detail & Related papers (2020-12-04T17:26:24Z)
- Permuted AdaIN: Reducing the Bias Towards Global Statistics in Image Classification [97.81205777897043]
Recent work has shown that convolutional neural network classifiers overly rely on texture at the expense of shape cues.
We make a similar but different distinction between shape and local image cues, on the one hand, and global image statistics, on the other.
Our method, called Permuted Adaptive Instance Normalization (pAdaIN), reduces the representation of global statistics in the hidden layers of image classifiers.
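
A back-of-the-envelope sketch of the permuted instance-normalization step described above: with a small probability, each sample's per-channel instance statistics are replaced by those of another sample in the batch, so that downstream layers cannot rely on a sample's own global statistics. The probability value and the placement of the operation inside the classifier are assumptions here.

```python
import torch

def permuted_adain(x: torch.Tensor, p: float = 0.01, eps: float = 1e-5) -> torch.Tensor:
    """With probability p, re-normalize each sample's activations using the
    per-channel instance statistics (mean/std) of another sample in the batch.

    x: (B, C, H, W) hidden activations of an image classifier.
    """
    if torch.rand(()) > p:                       # apply only occasionally during training
        return x
    perm = torch.randperm(x.size(0), device=x.device)
    mean = x.mean(dim=(2, 3), keepdim=True)      # (B, C, 1, 1) per-sample, per-channel mean
    std = x.std(dim=(2, 3), keepdim=True) + eps  # per-sample, per-channel std
    x_norm = (x - mean) / std                    # strip each sample's own statistics
    return x_norm * std[perm] + mean[perm]       # adopt the permuted samples' statistics
```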
arXiv Detail & Related papers (2020-10-09T16:38:38Z)
- Fast Nonconvex $T_2^*$ Mapping Using ADMM [14.22930572798757]
Magnetic resonance (MR) $T_2^*$ mapping is widely used to study hemorrhage, calcification and iron deposition in various clinical applications, as it provides a direct and precise mapping of the desired contrast in tissue.
The long acquisition time required by conventional 3D high-resolution $T_2^*$ mapping methods causes discomfort to patients and introduces motion artifacts into reconstructed images, which limits its wider applicability.
In this paper we address this issue by performing $T_2^*$ mapping from undersampled data using compressive sensing.
arXiv Detail & Related papers (2020-08-04T20:08:43Z)
- Image Fine-grained Inpainting [89.17316318927621]
We present a one-stage model that utilizes dense combinations of dilated convolutions to obtain larger and more effective receptive fields.
To better train this efficient generator, in addition to the frequently-used VGG feature-matching loss, we design a novel self-guided regression loss.
We also employ a discriminator with local and global branches to ensure local-global contents consistency.
arXiv Detail & Related papers (2020-02-07T03:45:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.