Attribute Group Editing for Reliable Few-shot Image Generation
- URL: http://arxiv.org/abs/2203.08422v1
- Date: Wed, 16 Mar 2022 06:54:09 GMT
- Title: Attribute Group Editing for Reliable Few-shot Image Generation
- Authors: Guanqi Ding, Xinzhe Han, Shuhui Wang, Shuzhe Wu, Xin Jin, Dandan Tu, and Qingming Huang
- Abstract summary: We propose a new editing-based method, i.e., Attribute Group Editing (AGE), for few-shot image generation.
AGE examines the internal representation learned in GANs and identifies semantically meaningful directions.
- Score: 85.52840521454411
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Few-shot image generation is a challenging task even using the
state-of-the-art Generative Adversarial Networks (GANs). Due to the unstable
GAN training process and the limited training data, the generated images are
often of low quality and low diversity. In this work, we propose a new
editing-based method, i.e., Attribute Group Editing (AGE), for few-shot image
generation. The basic assumption is that any image is a collection of
attributes and the editing direction for a specific attribute is shared across
all categories. AGE examines the internal representation learned in GANs and
identifies semantically meaningful directions. Specifically, the class
embedding, i.e., the mean vector of the latent codes from a specific category,
is used to represent the category-relevant attributes, and the
category-irrelevant attributes are learned globally by Sparse Dictionary
Learning on the difference between the sample embedding and the class
embedding. Given a GAN well trained on seen categories, diverse images of
unseen categories can be synthesized through editing category-irrelevant
attributes while keeping category-relevant attributes unchanged. Without
re-training the GAN, AGE is capable not only of producing more realistic and
diverse images for downstream visual applications with limited data, but also
of achieving controllable image editing with interpretable category-irrelevant
directions.
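To make the mechanism concrete, here is a minimal Python sketch of the idea, assuming latent codes have already been collected for each seen category (e.g. via GAN inversion); the dictionary size and editing strength are illustrative choices, not the authors' settings.

```python
# Minimal sketch of the AGE editing idea, not the authors' code.
# `latents_by_class` holds GAN latent codes, one (n_samples, latent_dim)
# array per seen category.
import numpy as np
from sklearn.decomposition import DictionaryLearning

def class_embedding(latents):
    # Mean latent code of one category: the category-relevant part.
    return latents.mean(axis=0)

def learn_directions(latents_by_class, n_directions=32):
    # Category-irrelevant directions: a sparse dictionary fit on the
    # residuals between each sample and its class embedding.
    residuals = np.concatenate(
        [z - class_embedding(z) for z in latents_by_class]
    )
    dico = DictionaryLearning(
        n_components=n_directions, transform_algorithm="lasso_lars"
    )
    dico.fit(residuals)
    return dico.components_  # each row is one shared editing direction

def edit(z_unseen, directions, i, strength=1.5):
    # Perturb an unseen-category latent along direction i; the
    # category-relevant content (the class embedding) stays untouched.
    return z_unseen + strength * directions[i]
```

Feeding the edited latent back through the trained generator would then yield a new, diverse image of the unseen category without any re-training.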
Related papers
- TAGE: Trustworthy Attribute Group Editing for Stable Few-shot Image Generation [10.569380190029317]
TAGE is an innovative image generation network comprising three integral modules.
The CPM module delves into the semantic dimensions of category-agnostic attributes, encapsulating them within a discrete codebook.
The PSM module generates semantic cues that are seamlessly integrated into the Transformer architecture of the CPM.
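The summary suggests the CPM stores category-agnostic attributes as entries in a discrete codebook; the sketch below shows a generic vector-quantization-style lookup that such a codebook could use. The class name, code count, and feature dimension are assumptions for illustration, not TAGE's actual interface.

```python
# Generic VQ-style codebook sketch in the spirit of the CPM module;
# names and shapes are assumptions, not TAGE's actual interface.
import torch

class AttributeCodebook(torch.nn.Module):
    def __init__(self, n_codes=512, dim=256):
        super().__init__()
        self.codes = torch.nn.Parameter(torch.randn(n_codes, dim))

    def forward(self, x):
        # x: (batch, dim) category-agnostic attribute features.
        dists = torch.cdist(x, self.codes)   # (batch, n_codes)
        idx = dists.argmin(dim=1)            # nearest code per sample
        return self.codes[idx], idx          # quantized features + indices
```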
arXiv Detail & Related papers (2024-10-23T13:26:19Z)
- Text Descriptions are Compressive and Invariant Representations for Visual Learning [63.3464863723631]
We show that an alternative approach, in line with humans' understanding of multiple visual features per class, can provide compelling performance in the robust few-shot learning setting.
In particular, we introduce a novel method, SLR-AVD (Sparse Logistic Regression using Augmented Visual Descriptors).
This method first automatically generates multiple visual descriptions of each class via a large language model (LLM), then uses a VLM to translate these descriptions into a set of visual feature embeddings of each image, and finally uses sparse logistic regression to select a relevant subset of these features for classification.
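A rough sketch of that three-step recipe follows, with a few hand-written descriptions standing in for the LLM-generated ones and a hypothetical CLIP-like `vlm` object (with `encode_text`/`encode_image` methods) standing in for the VLM:

```python
# Sketch of the SLR-AVD recipe; `vlm` is a hypothetical CLIP-like model,
# and the descriptions below stand in for LLM-generated ones.
import numpy as np
from sklearn.linear_model import LogisticRegression

descriptions = [
    "a photo of a cat", "an animal with whiskers and pointed ears",
    "a photo of a dog", "an animal with floppy ears and a snout",
]

def description_features(images, vlm):
    # Score each image against every description; the similarity
    # vector is the augmented visual-descriptor feature set.
    text_embeds = vlm.encode_text(descriptions)    # (n_desc, d), assumed API
    img_embeds = vlm.encode_image(images)          # (n_imgs, d), assumed API
    return img_embeds @ text_embeds.T              # (n_imgs, n_desc)

# L1-regularized logistic regression keeps only a sparse, relevant
# subset of the description features for classification.
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
# clf.fit(description_features(train_imgs, vlm), train_labels)
```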
arXiv Detail & Related papers (2023-07-10T03:06:45Z)
- Stable Attribute Group Editing for Reliable Few-shot Image Generation [88.59350889410794]
We present an "editing-based" framework, Attribute Group Editing (AGE), for reliable few-shot image generation.
We find that class inconsistency is a common problem in GAN-generated images for downstream classification.
We propose to boost the downstream classification performance of SAGE (Stable Attribute Group Editing, the stabilized variant of AGE) by enhancing the pixel and frequency components.
arXiv Detail & Related papers (2023-02-01T01:51:47Z)
- Leveraging Off-the-shelf Diffusion Model for Multi-attribute Fashion Image Manipulation [27.587905673112473]
Fashion attribute editing is a task that aims to convert the semantic attributes of a given fashion image while preserving the irrelevant regions.
Previous works typically employ conditional GANs, where the generator explicitly learns the target attributes and directly executes the conversion.
We explore classifier-guided diffusion, which leverages an off-the-shelf diffusion model pretrained on general visual semantics such as ImageNet.
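For orientation, generic classifier guidance shifts the diffusion model's noise estimate along the gradient of a noisy-image classifier; the sketch below shows that standard step, not the paper's exact procedure, with `unet` and `classifier` as assumed stand-ins for pretrained models.

```python
# Generic classifier-guidance step; `unet` and `classifier` are
# assumed stand-ins for pretrained models, not the paper's code.
import torch

def guided_eps(x_t, t, y, unet, classifier, scale=3.0):
    eps = unet(x_t, t)  # noise prediction from the frozen diffusion model
    with torch.enable_grad():
        x_in = x_t.detach().requires_grad_(True)
        logits = classifier(x_in, t)  # classifier trained on noisy images
        log_p = logits.log_softmax(dim=-1)[torch.arange(len(y)), y]
        grad = torch.autograd.grad(log_p.sum(), x_in)[0]
    # Steer the noise estimate so denoising moves toward attribute/class y.
    return eps - scale * grad
```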
arXiv Detail & Related papers (2022-10-12T02:21:18Z)
- Attribute Prototype Network for Any-Shot Learning [113.50220968583353]
We argue that an image representation with integrated attribute localization ability would be beneficial for any-shot, i.e. zero-shot and few-shot, image classification tasks.
We propose a novel representation learning framework that jointly learns global and local features using only class-level attributes.
arXiv Detail & Related papers (2022-04-04T02:25:40Z)
- Explaining in Style: Training a GAN to explain a classifier in StyleSpace [75.75927763429745]
We present StylEx, a method for training a generative model to explain semantic attributes of an image.
Our results show that StylEx finds attributes that align well with semantic ones, generate meaningful image-specific explanations, and are human-interpretable.
arXiv Detail & Related papers (2021-04-27T17:57:19Z)
- Multi-class Generative Adversarial Nets for Semi-supervised Image Classification [0.17404865362620794]
We show how visually similar images cause the GAN to generalize across classes, leading to poor classification of those images.
We propose a modification to the traditional training of GANs that allows for improved multi-class classification among similar classes of images in a semi-supervised learning framework.
arXiv Detail & Related papers (2021-02-13T15:26:17Z)
- SMILE: Semantically-guided Multi-attribute Image and Layout Editing [154.69452301122175]
Attribute image manipulation has been a very active topic since the introduction of Generative Adversarial Networks (GANs).
We present a multimodal representation that handles all attributes, whether guided by random noise or by images, while using only the underlying domain information of the target domain.
Our method is capable of adding, removing or changing either fine-grained or coarse attributes by using an image as a reference or by exploring the style distribution space.
arXiv Detail & Related papers (2020-10-05T20:15:21Z)
- Realizing Pixel-Level Semantic Learning in Complex Driving Scenes based on Only One Annotated Pixel per Class [17.481116352112682]
We propose a new semantic segmentation task for complex driving scenes under a weakly supervised condition.
A three-step process is built for pseudo-label generation, which progressively implements an optimal feature representation for each category.
Experiments on Cityscapes dataset demonstrate that the proposed method provides a feasible way to solve weakly supervised semantic segmentation task.
arXiv Detail & Related papers (2020-03-10T12:57:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information and is not responsible for any consequences arising from its use.