Related papers: Semantic Image Synthesis with Semantically Coupled VQ-Model

Semantic Image Synthesis with Semantically Coupled VQ-Model

URL: http://arxiv.org/abs/2209.02536v1
Date: Tue, 6 Sep 2022 14:37:01 GMT
Title: Semantic Image Synthesis with Semantically Coupled VQ-Model
Authors: Stephan Alaniz, Thomas Hummel, Zeynep Akata
Abstract summary: We conditionally synthesize the latent space from a vector quantized model (VQ-model) pre-trained to autoencode images. We show that our model improves semantic image synthesis using autoregressive models on popular semantic image datasets ADE20k, Cityscapes and COCO-Stuff.
Score: 42.19799555533789
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Semantic image synthesis enables control over unconditional image generation by allowing guidance on what is being generated. We conditionally synthesize the latent space from a vector quantized model (VQ-model) pre-trained to autoencode images. Instead of training an autoregressive Transformer on separately learned conditioning latents and image latents, we find that jointly learning the conditioning and image latents significantly improves the modeling capabilities of the Transformer model. While our jointly trained VQ-model achieves a similar reconstruction performance to a vanilla VQ-model for both semantic and image latents, tying the two modalities at the autoencoding stage proves to be an important ingredient to improve autoregressive modeling performance. We show that our model improves semantic image synthesis using autoregressive models on popular semantic image datasets ADE20k, Cityscapes and COCO-Stuff.

Related papers

Boosting Generative Image Modeling via Joint Image-Feature Synthesis [10.32324138962724]
We introduce a novel generative image modeling framework that seamlessly bridges the gap by leveraging a diffusion model to jointly model low-level image latents. Our latent-semantic diffusion approach learns to generate coherent image-feature pairs from pure noise. By eliminating the need for complex distillation objectives, our unified design simplifies training and unlocks a powerful new inference strategy: Representation Guidance.
arXiv Detail & Related papers (2025-04-22T17:41:42Z)
D2C: Unlocking the Potential of Continuous Autoregressive Image Generation with Discrete Tokens [80.75893450536577]
We propose D2C, a novel two-stage method to enhance model generation capacity. In the first stage, the discrete-valued tokens representing coarse-grained image features are sampled by employing a small discrete-valued generator. In the second stage, the continuous-valued tokens representing fine-grained image features are learned conditioned on the discrete token sequence.
arXiv Detail & Related papers (2025-03-21T13:58:49Z)
Frequency Autoregressive Image Generation with Continuous Tokens [31.833852108014312]
We introduce the frequency progressive autoregressive (textbfFAR) paradigm and instantiate FAR with the continuous tokenizer. We demonstrate the efficacy of FAR through comprehensive experiments on the ImageNet dataset.
arXiv Detail & Related papers (2025-03-07T10:34:04Z)
Stabilize the Latent Space for Image Autoregressive Modeling: A Unified Perspective [52.778766190479374]
Latent-based image generative models have achieved notable success in image generation tasks. Despite sharing the same latent space, autoregressive models significantly lag behind LDMs and MIMs in image generation. We propose a simple but effective discrete image tokenizer to stabilize the latent space for image generative modeling.
arXiv Detail & Related papers (2024-10-16T12:13:17Z)
Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis [62.06970466554273]
We present Meissonic, which non-autoregressive masked image modeling (MIM) text-to-image elevates to a level comparable with state-of-the-art diffusion models like SDXL. We leverage high-quality training data, integrate micro-conditions informed by human preference scores, and employ feature compression layers to further enhance image fidelity and resolution. Our model not only matches but often exceeds the performance of existing models like SDXL in generating high-quality, high-resolution images.
arXiv Detail & Related papers (2024-10-10T17:59:17Z)
Is Synthetic Image Useful for Transfer Learning? An Investigation into Data Generation, Volume, and Utilization [62.157627519792946]
We introduce a novel framework called bridged transfer, which initially employs synthetic images for fine-tuning a pre-trained model to improve its transferability. We propose dataset style inversion strategy to improve the stylistic alignment between synthetic and real images. Our proposed methods are evaluated across 10 different datasets and 5 distinct models, demonstrating consistent improvements.
arXiv Detail & Related papers (2024-03-28T22:25:05Z)
Unlocking Pre-trained Image Backbones for Semantic Image Synthesis [29.688029979801577]
We propose a new class of GAN discriminators for semantic image synthesis that generates highly realistic images. Our model, which we dub DP-SIMS, achieves state-of-the-art results in terms of image quality and consistency with the input label maps on ADE-20K, COCO-Stuff, and Cityscapes.
arXiv Detail & Related papers (2023-12-20T09:39:19Z)
Exploring Stochastic Autoregressive Image Modeling for Visual Representation [24.582376834198403]
We propose a novel autoregressive image modeling (named SAIM) by the two simple designs. By introducing prediction and the parallel encoder-decoder, SAIM significantly improve the performance of autoregressive image modeling. Our method achieves the best accuracy (83.9%) on the vanilla ViT-Base model among methods using only ImageNet-1K data.
arXiv Detail & Related papers (2022-12-03T13:04:29Z)
Semantic Image Synthesis via Diffusion Models [174.24523061460704]
Denoising Diffusion Probabilistic Models (DDPMs) have achieved remarkable success in various image generation tasks. Recent work on semantic image synthesis mainly follows the de facto GAN-based approaches. We propose a novel framework based on DDPM for semantic image synthesis.
arXiv Detail & Related papers (2022-06-30T18:31:51Z)
DiVAE: Photorealistic Images Synthesis with Denoising Diffusion Decoder [73.1010640692609]
We propose a VQ-VAE architecture model with a diffusion decoder (DiVAE) to work as the reconstructing component in image synthesis. Our model achieves state-of-the-art results and generates more photorealistic images specifically.
arXiv Detail & Related papers (2022-06-01T10:39:12Z)
High-Resolution Image Synthesis with Latent Diffusion Models [14.786952412297808]
Training diffusion models on autoencoders allows for the first time to reach a near-optimal point between complexity reduction and detail preservation. Our latent diffusion models (LDMs) achieve a new state of the art for image inpainting and highly competitive performance on various tasks.
arXiv Detail & Related papers (2021-12-20T18:55:25Z)
IMAGINE: Image Synthesis by Image-Guided Model Inversion [79.4691654458141]
We introduce an inversion based method, denoted as IMAge-Guided model INvErsion (IMAGINE), to generate high-quality and diverse images. We leverage the knowledge of image semantics from a pre-trained classifier to achieve plausible generations. IMAGINE enables the synthesis procedure to simultaneously 1) enforce semantic specificity constraints during the synthesis, 2) produce realistic images without generator training, and 3) give users intuitive control over the generation process.
arXiv Detail & Related papers (2021-04-13T02:00:24Z)

This list is automatically generated from the titles and abstracts of the papers in this site.