PAGER: Progressive Attribute-Guided Extendable Robust Image Generation
- URL: http://arxiv.org/abs/2206.00162v1
- Date: Wed, 1 Jun 2022 00:35:42 GMT
- Title: PAGER: Progressive Attribute-Guided Extendable Robust Image Generation
- Authors: Zohreh Azizi and C.-C. Jay Kuo
- Abstract summary: This work presents a generative modeling approach based on successive subspace learning (SSL).
Unlike most generative models in the literature, our method does not utilize neural networks to analyze the underlying source distribution and synthesize images.
The resulting method, called the progressive attribute-guided extendable robust image generative (PAGER) model, has advantages in mathematical transparency, progressive content generation, lower training time, robust performance with fewer training samples, and extendibility to conditional image generation.
- Score: 38.484332924924914
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This work presents a generative modeling approach based on successive
subspace learning (SSL). Unlike most generative models in the literature, our
method does not utilize neural networks to analyze the underlying source
distribution and synthesize images. The resulting method, called the
progressive attribute-guided extendable robust image generative (PAGER) model,
has advantages in mathematical transparency, progressive content generation,
lower training time, robust performance with fewer training samples, and
extendibility to conditional image generation. PAGER consists of three modules:
core generator, resolution enhancer, and quality booster. The core generator
learns the distribution of low-resolution images and performs unconditional
image generation. The resolution enhancer increases image resolution via
conditional generation. Finally, the quality booster adds finer details to
generated images. Extensive experiments on MNIST, Fashion-MNIST, and CelebA
datasets are conducted to demonstrate generative performance of PAGER.
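The three-stage pipeline described in the abstract can be illustrated with a minimal numerical sketch. This is not the paper's SSL implementation: the function bodies below are stand-ins (random sampling, nearest-neighbour upsampling, additive detail noise) that only mirror the shape of the core-generator → resolution-enhancer → quality-booster flow; all names and parameters are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def core_generator(n_samples, low_res=8):
    # Stage 1 (sketch): unconditional generation of low-resolution images.
    # PAGER models this distribution with successive subspace learning;
    # a random draw stands in for actual sampling here.
    return rng.standard_normal((n_samples, low_res, low_res))

def resolution_enhancer(images, factor=2):
    # Stage 2 (sketch): conditional generation that raises resolution.
    # Nearest-neighbour upsampling stands in for the learned enhancer.
    return images.repeat(factor, axis=1).repeat(factor, axis=2)

def quality_booster(images, detail_scale=0.05):
    # Stage 3 (sketch): adds finer details to the upsampled images.
    return images + detail_scale * rng.standard_normal(images.shape)

# Progressive generation: 8x8 -> 16x16 -> 32x32 with added detail.
x = core_generator(4)
x = resolution_enhancer(x)                    # (4, 16, 16)
x = quality_booster(resolution_enhancer(x))   # (4, 32, 32)
print(x.shape)
```

The point of the sketch is the composition: each stage consumes the previous stage's output, so resolution and detail are added progressively rather than generated in one shot.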
Related papers
- Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis [62.06970466554273]
We present Meissonic, which elevates non-autoregressive masked image modeling (MIM) text-to-image generation to a level comparable with state-of-the-art diffusion models like SDXL.
We leverage high-quality training data, integrate micro-conditions informed by human preference scores, and employ feature compression layers to further enhance image fidelity and resolution.
Our model not only matches but often exceeds the performance of existing models like SDXL in generating high-quality, high-resolution images.
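Non-autoregressive MIM decoding, as the summary above describes, fills in all image tokens over a few parallel passes rather than one token at a time. The sketch below is a generic illustration of that schedule, not Meissonic's actual model: the `predict` function is a random stand-in for the masked transformer, and all names and constants are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, MASK = 1024, -1

def predict(tokens):
    # Stand-in for the masked transformer: returns a token guess and a
    # confidence score per position (random here, learned in practice).
    guesses = rng.integers(0, VOCAB, tokens.shape)
    conf = rng.random(tokens.shape)
    return guesses, conf

def mim_decode(grid=16, steps=8):
    tokens = np.full((grid * grid,), MASK)
    for step in range(steps):
        guesses, conf = predict(tokens)
        masked = tokens == MASK
        conf = np.where(masked, conf, -np.inf)  # only fill masked slots
        # unmask an equal share of the remaining masked positions each step
        k = int(np.ceil(masked.sum() / (steps - step)))
        if k == 0:
            break
        keep = np.argsort(conf)[-k:]
        tokens[keep] = guesses[keep]
    return tokens.reshape(grid, grid)

img_tokens = mim_decode()
```

After `steps` passes every position holds a concrete token, which is why NAR decoding needs only a handful of forward passes instead of one per token.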
arXiv Detail & Related papers (2024-10-10T17:59:17Z) - RL for Consistency Models: Faster Reward Guided Text-to-Image Generation [15.238373471473645]
We propose a framework for fine-tuning consistency models via Reinforcement Learning (RL).
Our framework, called Reinforcement Learning for Consistency Model (RLCM), frames the iterative inference process of a consistency model as an RL procedure.
Compared to RL-finetuned diffusion models, RLCM trains significantly faster, improves generation quality as measured by the reward objectives, and speeds up inference by producing high-quality images in as few as two inference steps.
arXiv Detail & Related papers (2024-03-25T15:40:22Z) - Active Generation for Image Classification [45.93535669217115]
We propose to improve the efficiency of image generation by focusing on the specific needs and characteristics of the model being trained.
With a central tenet of active learning, our method, named ActGen, takes a training-aware approach to image generation.
arXiv Detail & Related papers (2024-03-11T08:45:31Z) - Steered Diffusion: A Generalized Framework for Plug-and-Play Conditional
Image Synthesis [62.07413805483241]
Steered Diffusion is a framework for zero-shot conditional image generation using a diffusion model trained for unconditional generation.
We present experiments using steered diffusion on several tasks including inpainting, colorization, text-guided semantic editing, and image super-resolution.
arXiv Detail & Related papers (2023-09-30T02:03:22Z) - RenAIssance: A Survey into AI Text-to-Image Generation in the Era of
Large Model [93.8067369210696]
Text-to-image generation (TTI) refers to models that process text input and generate high-fidelity images based on text descriptions.
Diffusion models are one prominent type of generative model, producing images through the systematic introduction of noise over repeated steps.
In the era of large models, scaling up model size and the integration with large language models have further improved the performance of TTI models.
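The "systematic introduction of noise over repeated steps" mentioned above has a standard closed form in the diffusion literature: x_t can be sampled directly from x_0 given a noise schedule. The sketch below shows that forward (noising) process only; the schedule values are a common choice, not taken from the survey.

```python
import numpy as np

rng = np.random.default_rng(0)

def forward_diffuse(x0, t, betas):
    # Sample x_t ~ q(x_t | x_0) in closed form:
    #   x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps
    # where alpha_bar_t is the cumulative product of (1 - beta_s).
    alpha_bar = np.cumprod(1.0 - betas)[t]
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps

betas = np.linspace(1e-4, 0.02, 1000)     # widely used linear schedule
x0 = rng.standard_normal((8, 8))
x_early = forward_diffuse(x0, 10, betas)   # still close to x0
x_late = forward_diffuse(x0, 999, betas)   # nearly pure noise
```

Generation then runs this process in reverse: a trained network predicts and removes the noise step by step, starting from a pure-noise sample.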
arXiv Detail & Related papers (2023-09-02T03:27:20Z) - StraIT: Non-autoregressive Generation with Stratified Image Transformer [63.158996766036736]
Stratified Image Transformer (StraIT) is a purely non-autoregressive (NAR) generative model.
Our experiments demonstrate that StraIT significantly improves NAR generation and outperforms existing diffusion models (DMs) and autoregressive (AR) methods.
arXiv Detail & Related papers (2023-03-01T18:59:33Z) - BIGRoC: Boosting Image Generation via a Robust Classifier [27.66648389933265]
We propose a general model-agnostic technique for improving the image quality and the distribution fidelity of generated images.
Our method, termed BIGRoC, is based on a post-processing procedure via the guidance of a given robust classifier.
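Classifier-guided post-processing of the kind BIGRoC's summary describes can be reduced to a simple loop: nudge a generated image along the gradient of a classifier's score for its predicted class. The sketch below shows only that loop with a toy gradient; it is not BIGRoC's robust classifier, and all names and step sizes are hypothetical.

```python
import numpy as np

def refine(image, classifier_grad, step=0.1, iters=5):
    # Post-processing sketch: ascend the classifier's score so the
    # generated image better matches its predicted class.
    for _ in range(iters):
        image = image + step * classifier_grad(image)
    return image

# Toy "classifier" whose score grows with pixel values, so its
# gradient is all ones; 5 steps of size 0.1 shift each pixel by 0.5.
refined = refine(np.zeros((4, 4)), lambda img: np.ones_like(img))
```

Because the procedure only needs gradients from a separately trained classifier, it is model-agnostic: it can be bolted onto any generator's output without retraining the generator.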
arXiv Detail & Related papers (2021-08-08T18:05:44Z) - Improved Image Generation via Sparse Modeling [27.66648389933265]
We show that generators can be viewed as manifestations of the Convolutional Sparse Coding (CSC) and its Multi-Layered version (ML-CSC) synthesis processes.
We leverage this observation by explicitly enforcing a sparsifying regularization on appropriately chosen activation layers in the generator.
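"Enforcing a sparsifying regularization on chosen activation layers", as the summary above puts it, typically means adding an L1 penalty on those activations to the training loss. The sketch below shows that penalty term in isolation; the weight `lam` and the function name are hypothetical, and the paper's actual CSC-based formulation is not reproduced here.

```python
import numpy as np

def sparsity_penalty(activations, lam=1e-3):
    # L1 penalty on a chosen activation layer; adding lam * ||a||_1 to
    # the generator loss pushes the layer toward sparse, CSC-like codes.
    return lam * np.abs(activations).sum()

acts = np.array([[0.0, 2.0, 0.0],
                 [-1.0, 0.0, 0.0]])
loss_term = sparsity_penalty(acts)   # lam * (2 + 1) = 0.003
```

During training this scalar is simply added to the generator's usual objective, so gradients flow back through the penalized layer and drive most of its activations toward zero.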
arXiv Detail & Related papers (2021-04-01T13:52:40Z) - High-Fidelity Synthesis with Disentangled Representation [60.19657080953252]
We propose an Information-Distillation Generative Adversarial Network (ID-GAN) for disentanglement learning and high-fidelity synthesis.
Our method learns a disentangled representation using VAE-based models, and distills the learned representation, together with an additional nuisance variable, to a separate GAN-based generator for high-fidelity synthesis.
Despite its simplicity, we show that the proposed method is highly effective, achieving image generation quality comparable to state-of-the-art methods while using the disentangled representation.
arXiv Detail & Related papers (2020-01-13T14:39:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.