Fine-grained Image-to-Image Transformation towards Visual Recognition
- URL: http://arxiv.org/abs/2001.03856v2
- Date: Sat, 13 Jun 2020 02:18:53 GMT
- Title: Fine-grained Image-to-Image Transformation towards Visual Recognition
- Authors: Wei Xiong, Yutong He, Yixuan Zhang, Wenhan Luo, Lin Ma, Jiebo Luo
- Abstract summary: We aim at transforming an image with a fine-grained category to synthesize new images that preserve the identity of the input image.
We adopt a model based on generative adversarial networks to disentangle the identity-related and identity-unrelated factors of an image.
Experiments on the CompCars and Multi-PIE datasets demonstrate that our model preserves the identity of the generated images much better than the state-of-the-art image-to-image transformation models.
- Score: 102.51124181873101
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing image-to-image transformation approaches primarily focus on
synthesizing visually pleasing data. Generating images with correct identity
labels is challenging yet much less explored. It is even more challenging to
deal with image transformation tasks with large deformation in poses,
viewpoints, or scales while preserving the identity, such as face rotation and
object viewpoint morphing. In this paper, we aim at transforming an image with
a fine-grained category to synthesize new images that preserve the identity of
the input image, which can thereby benefit the subsequent fine-grained image
recognition and few-shot learning tasks. The generated images, transformed with
large geometric deformation, do not necessarily need to be of high visual
quality but are required to maintain as much identity information as possible.
To this end, we adopt a model based on generative adversarial networks to
disentangle the identity-related and identity-unrelated factors of an image. In order to
preserve the fine-grained contextual details of the input image during the
deformable transformation, a constrained nonalignment connection method is
proposed to construct learnable highways between intermediate convolution
blocks in the generator. Moreover, an adaptive identity modulation mechanism is
proposed to transfer the identity information into the output image
effectively. Extensive experiments on the CompCars and Multi-PIE datasets
demonstrate that our model preserves the identity of the generated images much
better than the state-of-the-art image-to-image transformation models, and as a
result significantly boosts the visual recognition performance in fine-grained
few-shot learning.
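As a rough illustration of the adaptive identity modulation idea, the PyTorch sketch below conditions an intermediate generator feature map on an identity embedding through per-channel scale and shift parameters (an AdaIN/FiLM-style formulation). The class name, normalization choice, and dimensions are illustrative assumptions, not the paper's exact design.

```python
# Hypothetical sketch of an AdaIN/FiLM-style "adaptive identity modulation":
# an identity embedding predicts per-channel scale and shift parameters that
# modulate an intermediate generator feature map. Names and shapes are
# illustrative assumptions, not the paper's implementation.
import torch
import torch.nn as nn

class AdaptiveIdentityModulation(nn.Module):
    def __init__(self, feat_channels: int, id_dim: int):
        super().__init__()
        self.norm = nn.InstanceNorm2d(feat_channels, affine=False)
        # identity embedding -> per-channel (gamma, beta)
        self.to_scale_shift = nn.Linear(id_dim, feat_channels * 2)

    def forward(self, feat: torch.Tensor, id_emb: torch.Tensor) -> torch.Tensor:
        # feat:   (B, C, H, W) intermediate generator features
        # id_emb: (B, id_dim)  identity-related code from the encoder
        gamma, beta = self.to_scale_shift(id_emb).chunk(2, dim=1)
        gamma = gamma.unsqueeze(-1).unsqueeze(-1)  # (B, C, 1, 1)
        beta = beta.unsqueeze(-1).unsqueeze(-1)
        return (1 + gamma) * self.norm(feat) + beta

# Toy usage: modulate a 64-channel feature map with a 128-d identity code.
mod = AdaptiveIdentityModulation(feat_channels=64, id_dim=128)
out = mod(torch.randn(2, 64, 32, 32), torch.randn(2, 128))
print(out.shape)  # torch.Size([2, 64, 32, 32])
```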
Related papers
- Cross-Image Attention for Zero-Shot Appearance Transfer [68.43651329067393]
We introduce a cross-image attention mechanism that implicitly establishes semantic correspondences across images.
We harness three mechanisms that either manipulate the noisy latent codes or the model's internal representations throughout the denoising process.
Experiments show that our method is effective across a wide range of object categories and is robust to variations in shape, size, and viewpoint.
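A minimal sketch of the cross-image attention idea, assuming token features have already been extracted for both images: queries come from the structure (target) image and keys/values come from the appearance image, so appearance is transferred along semantic correspondences. Names and shapes are illustrative.

```python
# Illustrative sketch of cross-image attention: each position of the structure
# image attends to the most similar positions of the appearance image, and the
# attended appearance features are returned. Shapes are assumptions.
import torch

def cross_image_attention(q_struct: torch.Tensor,
                          kv_appear: torch.Tensor) -> torch.Tensor:
    # q_struct:  (B, N, D) token features of the structure image
    # kv_appear: (B, M, D) token features of the appearance image
    d = q_struct.shape[-1]
    attn = torch.softmax(q_struct @ kv_appear.transpose(1, 2) / d ** 0.5, dim=-1)
    return attn @ kv_appear  # appearance is transferred onto the structure layout

out = cross_image_attention(torch.randn(1, 256, 64), torch.randn(1, 256, 64))
print(out.shape)  # torch.Size([1, 256, 64])
```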
arXiv Detail & Related papers (2023-11-06T18:33:24Z) - Learning Transferable Object-Centric Diffeomorphic Transformations for Data Augmentation in Medical Image Segmentation [4.710950544945832]
We propose a novel object-centric data augmentation model for medical image segmentation.
It is able to learn the shape variations for the objects of interest and augment the object in place without modifying the rest of the image.
We demonstrate its effectiveness in improving kidney tumour segmentation when leveraging shape variations learned both from within the same dataset and transferred from external datasets.
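The sketch below illustrates the "augment the object in place" idea with a generic masked warp: a displacement field is applied via grid sampling, and the warped result is blended back only inside the object mask. It does not reproduce the paper's learned diffeomorphic model; function names and shapes are assumptions.

```python
# Hedged sketch of object-centric deformation for augmentation: a smooth
# displacement field changes pixels only inside the object mask, leaving the
# rest of the image untouched.
import torch
import torch.nn.functional as F

def warp_object(image, mask, disp):
    # image: (B, C, H, W); mask: (B, 1, H, W) in {0, 1};
    # disp:  (B, H, W, 2) displacement offsets in normalized [-1, 1] coordinates
    B, _, H, W = image.shape
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, H),
                            torch.linspace(-1, 1, W), indexing="ij")
    base = torch.stack((xs, ys), dim=-1).expand(B, H, W, 2)  # identity sampling grid
    warped = F.grid_sample(image, base + disp, align_corners=True)
    return mask * warped + (1 - mask) * image  # only the object is modified
```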
arXiv Detail & Related papers (2023-07-25T16:54:48Z) - ParGAN: Learning Real Parametrizable Transformations [50.51405390150066]
We propose ParGAN, a generalization of the cycle-consistent GAN framework to learn image transformations.
The proposed generator takes as input both an image and a parametrization of the transformation.
We show how, with disjoint image domains and no annotated parametrization, our framework can create smooth interpolations as well as learn multiple transformations simultaneously.
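A minimal sketch of a generator that takes both an image and a transformation parametrization as input, in the spirit described above: the parameter vector is broadcast to spatial maps and concatenated with the image channels. The architecture and parameter semantics are illustrative assumptions.

```python
# Toy generator conditioned on a transformation parameter (e.g., an angle or
# intensity): the parameter is tiled spatially and concatenated with the image.
import torch
import torch.nn as nn

class ParametrizedGenerator(nn.Module):
    def __init__(self, img_ch=3, param_dim=1, hidden=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(img_ch + param_dim, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, img_ch, 3, padding=1), nn.Tanh(),
        )

    def forward(self, img, param):
        # img: (B, C, H, W); param: (B, param_dim) transformation parametrization
        B, _, H, W = img.shape
        param_map = param.view(B, -1, 1, 1).expand(B, param.shape[1], H, W)
        return self.net(torch.cat([img, param_map], dim=1))

g = ParametrizedGenerator()
fake = g(torch.randn(2, 3, 64, 64), torch.tensor([[0.25], [0.75]]))
```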
arXiv Detail & Related papers (2022-11-09T16:16:06Z) - Visual Prompt Tuning for Generative Transfer Learning [26.895321693202284]
We present a recipe for learning vision transformers by generative knowledge transfer.
We base our framework on state-of-the-art generative vision transformers that represent an image as a sequence of visual tokens fed to an autoregressive or non-autoregressive transformer.
To adapt to a new domain, we employ prompt tuning, which prepends learnable tokens, called a prompt, to the image token sequence.
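A short sketch of the prompt-tuning step described above, assuming the image has already been mapped to a sequence of token embeddings: a small set of learnable prompt tokens is prepended to the sequence, and typically only these tokens are updated while the transformer stays frozen. Names and shapes are assumptions.

```python
# Hedged sketch of visual prompt tuning for a token-based generative model:
# learnable prompt tokens are prepended to the image token sequence.
import torch
import torch.nn as nn

class PromptedTokens(nn.Module):
    def __init__(self, n_prompt: int, dim: int):
        super().__init__()
        self.prompt = nn.Parameter(torch.randn(1, n_prompt, dim) * 0.02)

    def forward(self, image_tokens: torch.Tensor) -> torch.Tensor:
        # image_tokens: (B, N, D) visual token embeddings
        B = image_tokens.shape[0]
        return torch.cat([self.prompt.expand(B, -1, -1), image_tokens], dim=1)
```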
arXiv Detail & Related papers (2022-10-03T14:56:05Z) - DynaST: Dynamic Sparse Transformer for Exemplar-Guided Image Generation [56.514462874501675]
We propose a dynamic sparse attention based Transformer model to achieve fine-level matching with favorable efficiency.
The heart of our approach is a novel dynamic-attention unit, dedicated to covering the variation in the optimal number of tokens that each position should attend to.
Experiments on three applications, pose-guided person image generation, edge-based face synthesis, and undistorted image style transfer, demonstrate that DynaST achieves superior performance in local details.
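As a simplified illustration of sparse attention for exemplar matching, the sketch below keeps only each query's top-k exemplar tokens; DynaST's dynamic-attention unit additionally predicts how many tokens each position should keep, which the fixed k here does not capture.

```python
# Fixed-k sparse attention as a simplified stand-in for a dynamic unit:
# each query attends only to its top-k most similar exemplar tokens.
import torch

def topk_sparse_attention(q, k, v, keep=8):
    # q: (B, N, D) query tokens; k, v: (B, M, D) exemplar tokens, with M >= keep
    scores = q @ k.transpose(1, 2) / q.shape[-1] ** 0.5       # (B, N, M)
    topv, topi = scores.topk(keep, dim=-1)
    masked = torch.full_like(scores, float("-inf")).scatter(-1, topi, topv)
    return torch.softmax(masked, dim=-1) @ v                  # (B, N, D)
```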
arXiv Detail & Related papers (2022-07-13T11:12:03Z) - Robust Training Using Natural Transformation [19.455666609149567]
We present NaTra, an adversarial training scheme to improve robustness of image classification algorithms.
We target attributes of the input images that are independent of the class identification, and manipulate those attributes to mimic real-world natural transformations.
We demonstrate the efficacy of our scheme by utilizing the disentangled latent representations derived from well-trained GANs.
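A hedged sketch of the augmentation idea: given a generator whose latent code is split into class-related and class-independent parts, only the class-independent part is perturbed, so the decoded image keeps its label. The generator interface and latent split are assumptions, not NaTra's exact pipeline.

```python
# Toy augmentation with a disentangled generator: perturb only the
# class-independent (nuisance) code so the label of the output is preserved.
import torch

def natural_augment(generator, z_class, z_nuisance, strength=0.5):
    # z_class:    (B, Dc) class/identity-related code, kept fixed
    # z_nuisance: (B, Dn) class-independent code (pose, lighting, ...), perturbed
    z_aug = z_nuisance + strength * torch.randn_like(z_nuisance)
    return generator(torch.cat([z_class, z_aug], dim=1))
```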
arXiv Detail & Related papers (2021-05-10T01:56:03Z) - A 3D GAN for Improved Large-pose Facial Recognition [3.791440300377753]
Facial recognition using deep convolutional neural networks relies on the availability of large datasets of face images.
Recent studies have shown that current methods of disentangling pose from identity are inadequate.
In this work we incorporate a 3D morphable model into the generator of a GAN in order to learn a nonlinear texture model from in-the-wild images.
This allows generation of new, synthetic identities, and manipulation of pose, illumination and expression without compromising the identity.
arXiv Detail & Related papers (2020-12-18T22:41:15Z) - Learning to Caricature via Semantic Shape Transform [95.25116681761142]
We propose an algorithm based on a semantic shape transform to produce shape exaggerations.
We show that the proposed framework is able to render visually pleasing shape exaggerations while maintaining their facial structures.
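As a toy illustration of shape exaggeration (not the paper's learned semantic transform), facial landmarks can be pushed further from a mean shape by linear extrapolation:

```python
# Toy caricature-style exaggeration: amplify each landmark's deviation from the
# average face. The learned, semantic shape transform of the paper is more
# involved; this is only an assumption-laden illustration.
import numpy as np

def exaggerate_shape(landmarks: np.ndarray, mean_shape: np.ndarray,
                     alpha: float = 1.8) -> np.ndarray:
    # landmarks, mean_shape: (num_points, 2) arrays of (x, y) coordinates
    # alpha > 1 amplifies the deviation from the mean shape
    return mean_shape + alpha * (landmarks - mean_shape)
```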
arXiv Detail & Related papers (2020-08-12T03:41:49Z) - Cross-View Image Synthesis with Deformable Convolution and Attention Mechanism [29.528402825356398]
We propose to use Generative Adversarial Networks (GANs) based on a deformable convolution and attention mechanism to solve the problem of cross-view image synthesis.
It is difficult to understand and transform a scene's appearance and semantic information from another view, so we use deformable convolution in the U-Net to improve the network's ability to extract features of objects at different scales.
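A minimal sketch of a deformable-convolution block of the kind mentioned above, using torchvision's DeformConv2d: a plain convolution predicts per-position sampling offsets, which the deformable convolution then uses when sampling the input. Channel sizes and the surrounding architecture are illustrative assumptions.

```python
# Hedged sketch of a deformable-convolution block: offsets are predicted by a
# regular conv, then passed to DeformConv2d.
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DeformBlock(nn.Module):
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        # 2 offsets (dx, dy) for each of the k*k kernel positions
        self.offset_pred = nn.Conv2d(in_ch, 2 * k * k, k, padding=k // 2)
        self.deform = DeformConv2d(in_ch, out_ch, k, padding=k // 2)

    def forward(self, x):
        return self.deform(x, self.offset_pred(x))

block = DeformBlock(16, 32)
y = block(torch.randn(1, 16, 64, 64))
print(y.shape)  # torch.Size([1, 32, 64, 64])
```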
arXiv Detail & Related papers (2020-07-20T03:08:36Z) - Semantic Photo Manipulation with a Generative Image Prior [86.01714863596347]
GANs are able to synthesize images conditioned on inputs such as user sketch, text, or semantic labels.
However, it is hard for GANs to precisely reproduce an input image.
In this paper, we address these issues by adapting the image prior learned by GANs to image statistics of an individual image.
Our method can accurately reconstruct the input image and synthesize new content, consistent with the appearance of the input image.
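A minimal sketch of adapting a pretrained generator's image prior to a single photo: starting from an inverted latent code, the generator's weights are briefly fine-tuned so its output reconstructs the input, after which edits remain consistent with that photo's appearance. The generator, latent, loss, and step count are placeholders rather than the paper's exact procedure.

```python
# Placeholder sketch of per-image generator adaptation: fine-tune the
# generator so G(z) reconstructs the target photo.
import torch
import torch.nn.functional as F

def adapt_generator_to_image(generator, z, target, steps=200, lr=1e-4):
    # generator: a pretrained nn.Module; z: an inverted latent code for `target`
    opt = torch.optim.Adam(generator.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = F.l1_loss(generator(z), target)  # reconstruct this specific photo
        loss.backward()
        opt.step()
    return generator
```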
arXiv Detail & Related papers (2020-05-15T18:22:05Z)