Improving Few-shot Image Generation by Structural Discrimination and
Textural Modulation
- URL: http://arxiv.org/abs/2308.16110v1
- Date: Wed, 30 Aug 2023 16:10:21 GMT
- Title: Improving Few-shot Image Generation by Structural Discrimination and
Textural Modulation
- Authors: Mengping Yang, Zhe Wang, Wenyi Feng, Qian Zhang, Ting Xiao
- Abstract summary: Few-shot image generation aims to produce plausible and diverse images for one category given a few images from this category.
Existing approaches either globally interpolate different images or fuse local representations with pre-defined coefficients.
This paper proposes a novel mechanism to inject external semantic signals into internal local representations.
- Score: 10.389698647141296
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Few-shot image generation, which aims to produce plausible and diverse images
for one category given a few images from this category, has drawn extensive
attention. Existing approaches either globally interpolate different images or
fuse local representations with pre-defined coefficients. However, such an
intuitive combination of images/features only exploits the most relevant
information for generation, leading to poor diversity and coarse-grained
semantic fusion. To remedy this, this paper proposes a novel textural
modulation (TexMod) mechanism to inject external semantic signals into internal
local representations. Parameterized by the feedback from the discriminator,
our TexMod enables more fine-grained semantic injection while maintaining the
synthesis fidelity. Moreover, a global structural discriminator (StructD) is
developed to explicitly guide the model to generate images with reasonable
layout and outline. Furthermore, the frequency awareness of the model is
reinforced by encouraging the model to distinguish frequency signals. Together
with these techniques, we build a novel and effective model for few-shot image
generation. The effectiveness of our model is demonstrated by extensive
experiments on three popular datasets under various settings. Besides achieving
state-of-the-art synthesis performance on these datasets, our proposed
techniques can be seamlessly integrated into existing models for a further
performance boost.
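
The abstract describes TexMod only at a high level; the sketch below is one plausible reading of "injecting external semantic signals into internal local representations", in which an external semantic code predicts per-location scale and shift maps that modulate the generator's feature map. All names (e.g. TexturalModulation) are illustrative assumptions, and the discriminator-feedback parameterization mentioned in the abstract is omitted.

    import torch
    import torch.nn as nn

    class TexturalModulation(nn.Module):
        """Hypothetical TexMod-style layer: an external semantic code modulates
        internal local features via predicted per-location scale and shift.
        Names and structure are illustrative, not taken from the paper."""

        def __init__(self, feat_channels: int, code_dim: int, hidden: int = 128):
            super().__init__()
            # Project the external semantic code into a spatial modulation map.
            self.to_hidden = nn.Conv2d(code_dim, hidden, kernel_size=3, padding=1)
            self.to_gamma = nn.Conv2d(hidden, feat_channels, kernel_size=3, padding=1)
            self.to_beta = nn.Conv2d(hidden, feat_channels, kernel_size=3, padding=1)
            self.norm = nn.InstanceNorm2d(feat_channels, affine=False)

        def forward(self, feat: torch.Tensor, code: torch.Tensor) -> torch.Tensor:
            # feat: internal local representation, shape (B, C, H, W)
            # code: external semantic signal, shape (B, D); broadcast spatially
            b, d = code.shape
            code_map = code.view(b, d, 1, 1).expand(b, d, feat.size(2), feat.size(3))
            h = torch.relu(self.to_hidden(code_map))
            gamma, beta = self.to_gamma(h), self.to_beta(h)
            # Location-wise injection of the external semantics.
            return self.norm(feat) * (1 + gamma) + beta

    # Minimal usage example with random tensors.
    layer = TexturalModulation(feat_channels=64, code_dim=32)
    feat = torch.randn(2, 64, 16, 16)   # generator's internal features
    code = torch.randn(2, 32)           # semantic code from a reference image
    out = layer(feat, code)
    print(out.shape)  # torch.Size([2, 64, 16, 16])
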
Related papers
- Are CLIP features all you need for Universal Synthetic Image Origin Attribution? [13.96698277726253]
We propose a framework that incorporates features from large pre-trained foundation models to perform Open-Set origin attribution of synthetic images.
We show that our method leads to remarkable attribution performance, even in the low-data regime.
arXiv Detail & Related papers (2024-08-17T09:54:21Z)
- Unifying Visual and Semantic Feature Spaces with Diffusion Models for Enhanced Cross-Modal Alignment [20.902935570581207]
We introduce a Multimodal Alignment and Reconstruction Network (MARNet) to enhance the model's resistance to visual noise.
MARNet includes a cross-modal diffusion reconstruction module for smoothly and stably blending information across different domains.
Experiments conducted on two benchmark datasets, Vireo-Food172 and Ingredient-101, demonstrate that MARNet effectively improves the quality of image information extracted by the model.
arXiv Detail & Related papers (2024-07-26T16:30:18Z)
- Is Synthetic Image Useful for Transfer Learning? An Investigation into Data Generation, Volume, and Utilization [62.157627519792946]
We introduce a novel framework called bridged transfer, which initially employs synthetic images for fine-tuning a pre-trained model to improve its transferability.
We propose dataset style inversion strategy to improve the stylistic alignment between synthetic and real images.
Our proposed methods are evaluated across 10 different datasets and 5 distinct models, demonstrating consistent improvements.
arXiv Detail & Related papers (2024-03-28T22:25:05Z)
- Unlocking Pre-trained Image Backbones for Semantic Image Synthesis [29.688029979801577]
We propose a new class of GAN discriminators for semantic image synthesis that generates highly realistic images.
Our model, which we dub DP-SIMS, achieves state-of-the-art results in terms of image quality and consistency with the input label maps on ADE-20K, COCO-Stuff, and Cityscapes.
arXiv Detail & Related papers (2023-12-20T09:39:19Z)
- Multi-View Unsupervised Image Generation with Cross Attention Guidance [23.07929124170851]
This paper introduces a novel pipeline for unsupervised training of a pose-conditioned diffusion model on single-category datasets.
We identify object poses by clustering the dataset through comparing visibility and locations of specific object parts.
Our model, MIRAGE, surpasses prior work in novel view synthesis on real images.
arXiv Detail & Related papers (2023-12-07T14:55:13Z)
- Kandinsky: an Improved Text-to-Image Synthesis with Image Prior and Latent Diffusion [50.59261592343479]
We present Kandinsky, a novel exploration of latent diffusion architecture.
The proposed model is trained separately to map text embeddings to image embeddings of CLIP.
We also deployed a user-friendly demo system that supports diverse generative modes such as text-to-image generation, image fusion, text and image fusion, image variations generation, and text-guided inpainting/outpainting.
arXiv Detail & Related papers (2023-10-05T12:29:41Z) - Steered Diffusion: A Generalized Framework for Plug-and-Play Conditional
Image Synthesis [62.07413805483241]
Steered Diffusion is a framework for zero-shot conditional image generation using a diffusion model trained for unconditional generation.
We present experiments using steered diffusion on several tasks including inpainting, colorization, text-guided semantic editing, and image super-resolution.
arXiv Detail & Related papers (2023-09-30T02:03:22Z) - UniDiff: Advancing Vision-Language Models with Generative and
Discriminative Learning [86.91893533388628]
This paper presents UniDiff, a unified multi-modal model that integrates image-text contrastive learning (ITC), text-conditioned image synthesis learning (IS), and reciprocal semantic consistency modeling (RSC).
UniDiff demonstrates versatility in both multi-modal understanding and generative tasks.
arXiv Detail & Related papers (2023-06-01T15:39:38Z)
- Person Image Synthesis via Denoising Diffusion Model [116.34633988927429]
We show how denoising diffusion models can be applied for high-fidelity person image synthesis.
Our results on two large-scale benchmarks and a user study demonstrate the photorealism of our proposed approach under challenging scenarios.
arXiv Detail & Related papers (2022-11-22T18:59:50Z)
- Semantic Image Synthesis via Diffusion Models [159.4285444680301]
Denoising Diffusion Probabilistic Models (DDPMs) have achieved remarkable success in various image generation tasks.
Recent work on semantic image synthesis mainly follows the de facto Generative Adversarial Nets (GANs).
arXiv Detail & Related papers (2022-06-30T18:31:51Z)