Style-Guided Inference of Transformer for High-resolution Image
Synthesis
- URL: http://arxiv.org/abs/2210.05533v1
- Date: Tue, 11 Oct 2022 15:21:20 GMT
- Title: Style-Guided Inference of Transformer for High-resolution Image
Synthesis
- Authors: Jonghwa Yim, Minjae Kim
- Abstract summary: The Transformer is eminently suitable for auto-regressive image synthesis.
In this article, we propose to take a desired output, a style image, as an additional condition without re-training the transformer.
- Score: 4.974890682815778
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The Transformer is eminently suitable for auto-regressive image
synthesis, which recursively predicts discrete values from past values to
compose a full image. In particular, combined with a vector-quantised latent
representation, the state-of-the-art auto-regressive transformer produces
realistic high-resolution images. However, sampling the latent code from a
discrete probability distribution makes the output unpredictable, so many
diverse samples must be generated to obtain a desired output. To avoid
repeatedly generating large numbers of samples, in this article we propose to
take a desired output, a style image, as an additional condition without
re-training the transformer. To this end, our method transfers the style into
a probability constraint that re-balances the prior, thereby specifying the
target distribution instead of the original prior. Samples generated from the
re-balanced prior thus have styles similar to the reference style. In
practice, we can choose either an image or a category of images as the
additional condition. In our qualitative assessment, we show that the styles
of the majority of outputs are similar to the input style.
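To make the re-balancing step concrete, here is a minimal PyTorch sketch, not the authors' implementation: the constraint is taken to be a smoothed histogram of the style image's VQ codes, and `transformer`, `style_codes`, and `bos_id` are assumed inputs.

```python
import torch
import torch.nn.functional as F

# Minimal sketch (not the authors' code) of sampling from a re-balanced
# prior: the transformer's next-token distribution over VQ codebook indices
# is multiplied by a constraint derived from the style image's codes.

def style_constraint(style_codes, vocab_size, smoothing=1e-3):
    """Smoothed empirical distribution of the style image's codebook indices."""
    hist = torch.bincount(style_codes.flatten(), minlength=vocab_size).float()
    hist = hist + smoothing                    # keep unseen codes reachable
    return hist / hist.sum()

@torch.no_grad()
def sample_rebalanced(transformer, style_codes, bos_id, seq_len, vocab_size,
                      temperature=1.0):
    constraint = style_constraint(style_codes, vocab_size)    # (V,)
    seq = torch.full((1, 1), bos_id, dtype=torch.long)
    for _ in range(seq_len):
        logits = transformer(seq)[:, -1, :] / temperature     # original prior
        prior = F.softmax(logits, dim=-1)
        rebalanced = prior * constraint                       # re-balanced prior
        rebalanced = rebalanced / rebalanced.sum(-1, keepdim=True)
        nxt = torch.multinomial(rebalanced, num_samples=1)
        seq = torch.cat([seq, nxt], dim=1)
    return seq[:, 1:]  # VQ indices, to be decoded by the VQ decoder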
Related papers
- Fast constrained sampling in pre-trained diffusion models [77.21486516041391]
Diffusion models have dominated the field of large generative image models.
We propose an algorithm for fast constrained sampling in large pre-trained diffusion models.
arXiv Detail & Related papers (2024-10-24T14:52:38Z)
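The summary above gives no algorithmic detail, so the following is only a generic gradient-guided constrained-sampling step in the same spirit, not this paper's method; `denoiser`, `constraint_loss`, and the `alpha_bar` schedule are assumptions.

```python
import torch

# Generic gradient-guided constrained-sampling step for a pre-trained
# diffusion model, for illustration only; not this paper's algorithm.
# `denoiser`, `constraint_loss`, and `alpha_bar` are assumed to exist.

def guided_step(denoiser, x_t, t, alpha_bar, constraint_loss, scale=1.0):
    x_t = x_t.detach().requires_grad_(True)
    eps = denoiser(x_t, t)                    # model's noise prediction
    # Clean-image estimate implied by the current noisy sample x_t.
    x0_hat = (x_t - (1 - alpha_bar[t]).sqrt() * eps) / alpha_bar[t].sqrt()
    loss = constraint_loss(x0_hat)            # e.g. error on observed pixels
    grad = torch.autograd.grad(loss, x_t)[0]
    # Nudge x_t against the constraint gradient before the reverse update.
    return (x_t - scale * grad).detach(), eps.detach()
```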
- Generator Born from Classifier [66.56001246096002]
We aim to reconstruct an image generator without relying on any data samples.
We propose a novel learning paradigm in which the generator is trained to ensure that the convergence conditions of the network parameters are satisfied.
arXiv Detail & Related papers (2023-12-05T03:41:17Z)
- Hierarchical Vector Quantized Transformer for Multi-class Unsupervised Anomaly Detection [24.11900895337062]
Unsupervised image anomaly detection (UAD) aims to learn robust and discriminative representations of normal samples.
This paper focuses on building a unified framework for multiple classes.
arXiv Detail & Related papers (2023-10-22T08:20:33Z)
- Bridging the Gap between Synthetic and Authentic Images for Multimodal Machine Translation [51.37092275604371]
Multimodal machine translation (MMT) simultaneously takes the source sentence and a relevant image as input for translation.
Recent studies suggest utilizing powerful text-to-image generation models to provide image inputs.
However, synthetic images generated by these models often follow different distributions compared to authentic images.
arXiv Detail & Related papers (2023-10-20T09:06:30Z)
- Image Deblurring by Exploring In-depth Properties of Transformer [86.7039249037193]
We leverage deep features extracted from a pretrained vision transformer (ViT) to encourage recovered images to be sharp without sacrificing the performance measured by quantitative metrics.
By comparing the transformer features of the recovered image and the target one, the pretrained transformer provides high-resolution, blur-sensitive semantic information.
One approach regards the features as vectors and computes the discrepancy between representations extracted from the recovered and target images in Euclidean space.
arXiv Detail & Related papers (2023-03-24T14:14:25Z)
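A rough sketch of that feature-discrepancy idea, assuming a frozen timm ViT backbone (the backbone and loss placement are illustrative choices, not the paper's exact setup):

```python
import torch
import timm

# Sketch of the feature discrepancy described above: extract patch-token
# features of the recovered and target images with a frozen pretrained ViT
# and measure their distance in Euclidean space as an auxiliary loss.
# The backbone choice and preprocessing are assumptions for illustration.

vit = timm.create_model("vit_base_patch16_224", pretrained=True, num_classes=0)
vit.eval()
for p in vit.parameters():
    p.requires_grad_(False)

def vit_feature_loss(recovered: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Inputs: (B, 3, 224, 224) images, normalized as the backbone expects."""
    f_rec = vit.forward_features(recovered)   # (B, tokens, dim)
    f_tgt = vit.forward_features(target)
    return torch.mean((f_rec - f_tgt) ** 2)   # Euclidean (MSE) discrepancy
```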
- Improved Masked Image Generation with Token-Critic [16.749458173904934]
We introduce Token-Critic, an auxiliary model to guide the sampling of a non-autoregressive generative transformer.
Guided by Token-Critic, a state-of-the-art generative transformer significantly improves its performance and outperforms recent diffusion models and GANs in the trade-off between generated image quality and diversity.
arXiv Detail & Related papers (2022-09-09T17:57:21Z)
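Schematically, critic-guided decoding in the spirit of Token-Critic might look as follows; `generator`, `critic`, the re-masking schedule, and `MASK_ID` are illustrative assumptions:

```python
import torch

# Schematic critic-guided decoding: the generator fills in masked tokens,
# the critic scores each token's plausibility, and the least plausible
# tokens are re-masked and re-sampled. All names are illustrative.

MASK_ID = 0  # assumed id of the [MASK] token

@torch.no_grad()
def critic_guided_decode(generator, critic, seq_len, steps=8):
    tokens = torch.full((1, seq_len), MASK_ID, dtype=torch.long)
    for step in range(steps):
        logits = generator(tokens)                            # (1, L, V)
        sampled = torch.distributions.Categorical(logits=logits).sample()
        tokens = torch.where(tokens == MASK_ID, sampled, tokens)
        scores = critic(tokens).squeeze(-1)                   # (1, L) plausibility
        n_remask = int(seq_len * (1 - (step + 1) / steps))    # shrinking schedule
        if n_remask == 0:
            break
        worst = scores.topk(n_remask, largest=False).indices  # least plausible
        tokens[0, worst[0]] = MASK_ID
    return tokens
```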
- DynaST: Dynamic Sparse Transformer for Exemplar-Guided Image Generation [56.514462874501675]
We propose a dynamic sparse attention based Transformer model to achieve fine-level matching with favorable efficiency.
At the heart of our approach is a novel dynamic-attention unit that adapts the number of tokens each position should attend to.
Experiments on three applications, pose-guided person image generation, edge-based face synthesis, and undistorted image style transfer, demonstrate that DynaST achieves superior performance in local details.
arXiv Detail & Related papers (2022-07-13T11:12:03Z)
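For intuition, a fixed top-k sparse attention is sketched below; DynaST's actual contribution is choosing the number of attended tokens dynamically per position, which this simplification does not reproduce:

```python
import torch
import torch.nn.functional as F

# Fixed top-k sparse attention: each query attends only to its k best keys.
# DynaST instead adapts k per position; this static version is for intuition.

def topk_sparse_attention(q, k, v, top_k=8):
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5  # (B, Lq, Lk)
    kth = scores.topk(top_k, dim=-1).values[..., -1:]      # k-th largest score
    scores = scores.masked_fill(scores < kth, float("-inf"))
    return F.softmax(scores, dim=-1) @ v                   # attend to top-k only
```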
- Unleashing Transformers: Parallel Token Prediction with Discrete Absorbing Diffusion for Fast High-Resolution Image Generation from Vector-Quantized Codes [15.881911863960774]
Recent Vector-Quantized image models have overcome the limitation of image resolution but are prohibitively slow and unidirectional as they generate tokens via element-wise autoregressive sampling from the prior.
We propose a novel discrete diffusion probabilistic model prior which enables parallel prediction of Vector-Quantized tokens by using an unconstrained Transformer architecture as the backbone.
arXiv Detail & Related papers (2021-11-24T18:55:14Z)
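A schematic of parallel prediction under an absorbing (masking) prior, with greedy unmasking for brevity where the paper samples; all names are illustrative, not the authors' code:

```python
import torch

# Schematic parallel decoding with an absorbing (masking) prior: start fully
# masked and reveal a batch of positions per step with a bidirectional
# transformer. Greedy argmax is used for brevity; names are illustrative.

MASK_ID = 0

@torch.no_grad()
def absorbing_diffusion_sample(transformer, seq_len, steps=10):
    tokens = torch.full((1, seq_len), MASK_ID, dtype=torch.long)
    for step in range(steps):
        still_masked = tokens == MASK_ID
        if not still_masked.any():
            break
        probs = transformer(tokens).softmax(dim=-1)     # (1, L, V)
        conf, pred = probs.max(dim=-1)                  # per-position confidence
        conf = conf.masked_fill(~still_masked, -1.0)    # ignore revealed slots
        n_reveal = max(1, int(still_masked.sum()) // (steps - step))
        reveal = conf.topk(n_reveal, dim=-1).indices    # most confident positions
        tokens[0, reveal[0]] = pred[0, reveal[0]]
    return tokens  # VQ indices for the decoder
```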
- High-Resolution Complex Scene Synthesis with Transformers [6.445605125467574]
Coarse-grained synthesis of complex scene images via deep generative models has recently gained popularity.
We present an approach to this task in which the generative model is based on pure likelihood training without additional objectives.
We show that the resulting system is able to synthesize high-quality images consistent with the given layouts.
arXiv Detail & Related papers (2021-05-13T17:56:07Z)
- Diverse Semantic Image Synthesis via Probability Distribution Modeling [103.88931623488088]
We propose a novel diverse semantic image synthesis framework.
Our method can achieve superior diversity and comparable quality compared to state-of-the-art methods.
arXiv Detail & Related papers (2021-03-11T18:59:25Z)
- Generating Images with Sparse Representations [21.27273495926409]
The high dimensionality of images presents architecture and sampling-efficiency challenges for likelihood-based generative models.
We present an alternative approach, inspired by common image compression methods like JPEG, and convert images to quantized discrete cosine transform (DCT) blocks.
We propose a Transformer-based autoregressive architecture, which is trained to sequentially predict the conditional distribution of the next element in such sequences.
arXiv Detail & Related papers (2021-03-05T17:56:03Z)
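A toy version of that JPEG-style preprocessing, with a single uniform quantization step standing in for whatever quantizer the paper actually uses:

```python
import numpy as np
from scipy.fft import dctn

# Toy JPEG-style preprocessing: split a grayscale image into 8x8 blocks,
# take a 2D DCT per block, and uniformly quantize the coefficients into
# discrete symbols a Transformer could predict autoregressively. The single
# quantization step q_step is a stand-in for the paper's actual quantizer.

def image_to_dct_tokens(img: np.ndarray, block: int = 8, q_step: int = 16):
    h, w = img.shape  # assumes H and W are divisible by `block`
    tokens = []
    for y in range(0, h, block):
        for x in range(0, w, block):
            coeffs = dctn(img[y:y + block, x:x + block], norm="ortho")
            tokens.append(np.round(coeffs / q_step).astype(np.int64).ravel())
    return np.concatenate(tokens)  # 1D sequence of discrete DCT symbols

seq = image_to_dct_tokens(np.random.rand(32, 32) * 255)
print(seq.shape)  # (1024,): 16 blocks x 64 coefficients each
```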