LatentKeypointGAN: Controlling Images via Latent Keypoints
- URL: http://arxiv.org/abs/2103.15812v5
- Date: Sun, 13 Oct 2024 19:57:19 GMT
- Title: LatentKeypointGAN: Controlling Images via Latent Keypoints
- Authors: Xingzhe He, Bastian Wandt, Helge Rhodin
- Abstract summary: We introduce LatentKeypointGAN, a two-stage GAN trained end-to-end on the classical GAN objective.
LatentKeypointGAN provides an interpretable latent space that can be used to re-arrange the generated images.
In addition, the explicit generation of keypoints and matching images enables a new, GAN-based method for unsupervised keypoint detection.
- Score: 23.670795505376336
- Abstract: Generative adversarial networks (GANs) have attained photo-realistic quality in image generation. However, how to best control the image content remains an open challenge. We introduce LatentKeypointGAN, a two-stage GAN which is trained end-to-end on the classical GAN objective with internal conditioning on a set of spatial keypoints. These keypoints have associated appearance embeddings; the keypoints control the position, and the embeddings the style, of the generated objects and their parts. A major difficulty that we address with suitable network architectures and training schemes is disentangling the image into spatial and appearance factors without domain knowledge and supervision signals. We demonstrate that LatentKeypointGAN provides an interpretable latent space that can be used to re-arrange the generated images by re-positioning and exchanging keypoint embeddings, such as generating portraits by combining the eyes, nose, and mouth from different images. In addition, the explicit generation of keypoints and matching images enables a new, GAN-based method for unsupervised keypoint detection.
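The abstract does not spell out how keypoints and appearance embeddings are combined inside the generator. As a rough illustration only, the sketch below assumes each keypoint is rendered as a Gaussian heatmap and its embedding is spread over the image in proportion to that heatmap. The function names (`gaussian_heatmap`, `style_map`, `peak`), the `sigma` value, and the tiny 5x5 grid are hypothetical choices for this example, not details taken from the abstract.

```python
import math

def gaussian_heatmap(kp, size, sigma=0.1):
    """Render one keypoint (x, y in [-1, 1]) as a size x size Gaussian heatmap."""
    x0, y0 = kp
    coords = [-1.0 + 2.0 * i / (size - 1) for i in range(size)]
    return [[math.exp(-((x - x0) ** 2 + (y - y0) ** 2) / (2 * sigma ** 2))
             for x in coords] for y in coords]

def style_map(keypoints, embeddings, size, sigma=0.1):
    """Sum heatmap-weighted embeddings into a size x size grid of vectors."""
    dim = len(embeddings[0])
    out = [[[0.0] * dim for _ in range(size)] for _ in range(size)]
    for kp, emb in zip(keypoints, embeddings):
        heat = gaussian_heatmap(kp, size, sigma)
        for r in range(size):
            for c in range(size):
                for d in range(dim):
                    out[r][c][d] += heat[r][c] * emb[d]
    return out

def peak(grid, d):
    """Return (row, col) of the largest value in embedding channel d."""
    size = len(grid)
    cells = ((r, c) for r in range(size) for c in range(size))
    return max(cells, key=lambda rc: grid[rc[0]][rc[1]][d])

# Two hypothetical parts with one-hot appearance embeddings.
keypoints = [(-0.5, -0.5), (0.5, 0.5)]
embeddings = [[1.0, 0.0], [0.0, 1.0]]

base = style_map(keypoints, embeddings, size=5)
# Re-positioning: move the first keypoint, keep its embedding.
moved = style_map([(0.5, -0.5), (0.5, 0.5)], embeddings, size=5)
# Exchanging: keep positions, swap the embeddings between keypoints.
swapped = style_map(keypoints, embeddings[::-1], size=5)
```

In this toy setup, moving a keypoint shifts where its embedding is strongest, while swapping embeddings changes which appearance lands at a fixed location. These are the two editing operations the abstract describes, shown here in a deliberately simplified form.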
Related papers
- Design and Identification of Keypoint Patches in Unstructured Environments [7.940068522906917]
Keypoint identification in an image allows direct mapping from raw images to 2D coordinates.
We propose four simple yet distinct designs that account for variations in scale, rotation, and camera projection.
We customize the Superpoint network to ensure robust detection under various types of image degradation.
arXiv Detail & Related papers (2024-10-01T09:05:50Z)
- In-Domain GAN Inversion for Faithful Reconstruction and Editability [132.68255553099834]
We propose in-domain GAN inversion, which consists of a domain-guided encoder and a domain-regularized optimizer, to regularize the inverted code in the native latent space of the pre-trained GAN model.
We make comprehensive analyses on the effects of the encoder structure, the starting inversion point, as well as the inversion parameter space, and observe the trade-off between the reconstruction quality and the editing property.
arXiv Detail & Related papers (2023-09-25T08:42:06Z)
- Learning Feature Matching via Matchable Keypoint-Assisted Graph Neural Network [52.29330138835208]
Accurately matching local features between a pair of images is a challenging computer vision task.
Previous studies typically use attention-based graph neural networks (GNNs) with fully-connected graphs over keypoints within/across images.
We propose MaKeGNN, a sparse attention-based GNN architecture which bypasses non-repeatable keypoints and leverages matchable ones to guide message passing.
arXiv Detail & Related papers (2023-07-04T02:50:44Z)
- Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold [79.94300820221996]
DragGAN is a new way of controlling generative adversarial networks (GANs).
DragGAN allows anyone to deform an image with precise control over where pixels go, thus manipulating the pose, shape, expression, and layout of diverse categories such as animals, cars, humans, landscapes, etc.
Both qualitative and quantitative comparisons demonstrate the advantage of DragGAN over prior approaches in the tasks of image manipulation and point tracking.
arXiv Detail & Related papers (2023-05-18T13:41:25Z)
- LatentKeypointGAN: Controlling Images via Latent Keypoints -- Extended Abstract [16.5436159805682]
We introduce LatentKeypointGAN, a two-stage GAN conditioned on a set of keypoints and associated appearance embeddings.
LatentKeypointGAN provides an interpretable latent space that can be used to re-arrange the generated images.
arXiv Detail & Related papers (2022-05-06T19:00:07Z)
- Probabilistic Spatial Distribution Prior Based Attentional Keypoints Matching Network [19.708243062836104]
Keypoint matching is a pivotal component of many image-related applications, such as image stitching and visual simultaneous localization and mapping.
In this paper, we demonstrate that the motion estimation from IMU integration can be used to exploit the spatial distribution prior of keypoints between images.
We present a projection loss for the proposed keypoints matching network, which gives a smooth edge between matching and un-matching keypoints.
arXiv Detail & Related papers (2021-11-17T09:52:03Z)
- Weakly Supervised Keypoint Discovery [27.750244813890262]
We propose a method for keypoint discovery from a 2D image using image-level supervision.
Motivated by the weakly-supervised learning approach, our method exploits image-level supervision to identify discriminative parts.
Our approach achieves state-of-the-art performance for keypoint estimation in limited-supervision scenarios.
arXiv Detail & Related papers (2021-09-28T01:26:53Z)
- End-to-End Learning of Keypoint Representations for Continuous Control from Images [84.8536730437934]
We show that it is possible to learn efficient keypoint representations end-to-end, without the need for unsupervised pre-training, decoders, or additional losses.
Our proposed architecture consists of a differentiable keypoint extractor that feeds the coordinates directly to a soft actor-critic agent.
arXiv Detail & Related papers (2021-06-15T09:17:06Z)
- Ensembling with Deep Generative Views [72.70801582346344]
Generative models can synthesize "views" of artificial images that mimic real-world variations, such as changes in color or pose.
Here, we investigate whether such views can be applied to real images to benefit downstream analysis tasks such as image classification.
We use StyleGAN2 as the source of generative augmentations and investigate this setup on classification tasks involving facial attributes, cat faces, and cars.
arXiv Detail & Related papers (2021-04-29T17:58:35Z)
- Disentangled Image Generation Through Structured Noise Injection [48.956122902434444]
We show that disentanglement in the first layer of the generator network leads to disentanglement in the generated image.
We achieve spatial disentanglement, scale-space disentanglement, and disentanglement of the foreground object from the background style.
This empirically leads to better disentanglement scores than state-of-the-art methods on the FFHQ dataset.
arXiv Detail & Related papers (2020-04-26T15:15:19Z)