LinkGAN: Linking GAN Latents to Pixels for Controllable Image Synthesis
- URL: http://arxiv.org/abs/2301.04604v2
- Date: Mon, 25 Sep 2023 08:03:30 GMT
- Title: LinkGAN: Linking GAN Latents to Pixels for Controllable Image Synthesis
- Authors: Jiapeng Zhu, Ceyuan Yang, Yujun Shen, Zifan Shi, Bo Dai, Deli Zhao,
Qifeng Chen
- Abstract summary: This work presents an easy-to-use regularizer for GAN training.
It helps explicitly link some axes of the latent space to a set of pixels in the synthesized image.
- Score: 104.26279487968839
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This work presents an easy-to-use regularizer for GAN training, which helps
explicitly link some axes of the latent space to a set of pixels in the
synthesized image. Establishing such a connection facilitates a more convenient
local control of GAN generation, where users can alter the image content only
within a spatial area simply by partially resampling the latent code.
Experimental results confirm four appealing properties of our regularizer,
which we call LinkGAN. (1) The latent-pixel linkage is applicable to either a
fixed region (i.e., the same for all instances) or a particular semantic
category (i.e., varying across instances), like the sky. (2) Two or more
regions can be independently linked to different latent axes, which further
supports joint control. (3) Our regularizer can improve the spatial
controllability of both 2D and 3D-aware GAN models while barely sacrificing
synthesis performance. (4) The models trained with our regularizer are
compatible with GAN inversion techniques and maintain editability on real
images.
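For intuition, below is a minimal sketch of the local-control interface the abstract describes: resample only the latent axes linked to a region, and leave the rest of the code, and hence the rest of the image, untouched. `G` and `linked_axes` are hypothetical placeholders, not the authors' released API.

```python
# Minimal sketch of local control via partial latent resampling, assuming a
# hypothetical LinkGAN-style generator G whose latent axes `linked_axes`
# have been tied to one spatial region by the regularizer.
import torch

def resample_region(G, z, linked_axes, seed=None):
    """Redraw only the latent axes linked to a region; all other axes,
    and hence the rest of the image, stay fixed."""
    g = torch.Generator().manual_seed(seed) if seed is not None else None
    z_new = z.clone()
    z_new[:, linked_axes] = torch.randn((z.shape[0], len(linked_axes)), generator=g)
    return G(z_new)  # only the linked spatial area should change
```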
Related papers
- In-Domain GAN Inversion for Faithful Reconstruction and Editability [132.68255553099834]
We propose in-domain GAN inversion, which consists of a domain-guided encoder and a domain-regularized optimizer to regularize the inverted code in the native latent space of the pre-trained GAN model.
We make comprehensive analyses on the effects of the encoder structure, the starting inversion point, as well as the inversion parameter space, and observe the trade-off between the reconstruction quality and the editing property.
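As a rough illustration, the sketch below pairs the two components the summary names: the domain-guided encoder supplies the starting inversion point, and optimization then refines it under a regularizer that keeps the code near the encoder's output. `G`, `E`, and the weights are illustrative assumptions, and the anchor penalty is a simplified stand-in for the paper's domain-regularized optimizer.

```python
# Hedged sketch of in-domain inversion: encoder E gives the start point,
# then optimization trades reconstruction quality against staying close to
# the encoder's (in-domain) code. All names and weights are illustrative.
import torch
import torch.nn.functional as F

def invert(G, E, image, steps=200, lr=0.01, lam=1.0):
    z_anchor = E(image).detach()       # starting inversion point
    z = z_anchor.clone().requires_grad_(True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        loss = F.mse_loss(G(z), image)               # reconstruction term
        loss = loss + lam * F.mse_loss(z, z_anchor)  # keep the code in-domain
        opt.zero_grad(); loss.backward(); opt.step()
    return z.detach()
```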
arXiv Detail & Related papers (2023-09-25T08:42:06Z)
- Entity-Level Text-Guided Image Manipulation [70.81648416508867]
We study the novel task of text-guided image manipulation at the entity level in the real world (eL-TGIM).
We propose an elegant framework, dubbed SeMani, for the Semantic Manipulation of real-world images.
In the semantic alignment phase, SeMani incorporates a semantic alignment module to locate the entity-relevant region to be manipulated.
In the image manipulation phase, SeMani adopts a generative model to synthesize new images conditioned on the entity-irrelevant regions and target text descriptions.
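Schematically, the two phases read as the sketch below, where `locate_entity` and `generate` are hypothetical stand-ins for SeMani's semantic-alignment module and its conditional generative model.

```python
# Schematic sketch of SeMani's two phases: locate the entity-relevant
# region, then synthesize new content inside it conditioned on the
# untouched context and the target text. Modules are placeholders.
import torch

def semani_edit(image, text, locate_entity, generate):
    mask = locate_entity(image, text)       # phase 1: entity-relevant region
    context = image * (1 - mask)            # entity-irrelevant pixels to keep
    edited = generate(context, mask, text)  # phase 2: text-conditioned synthesis
    return edited * mask + context          # background stays untouched
```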
arXiv Detail & Related papers (2023-02-22T13:56:23Z)
- Semantic 3D-aware Portrait Synthesis and Manipulation Based on Compositional Neural Radiance Field [55.431697263581626]
We propose a Compositional Neural Radiance Field (CNeRF) for semantic 3D-aware portrait synthesis and manipulation.
CNeRF divides the image into semantic regions, learns an independent neural radiance field for each region, and finally fuses them to render the complete image.
Compared to state-of-the-art 3D-aware GAN methods, our approach enables fine-grained semantic region manipulation, while maintaining high-quality 3D-consistent synthesis.
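For a rough picture of the compositional rendering, the sketch below fuses per-region radiance fields by summing densities and density-weighting colours, a generic construction in the spirit of the summary rather than CNeRF's exact fusion module.

```python
# Illustrative fusion of per-region radiance fields: densities add up and
# colours are blended by each region's share of the total density. The
# per-region fields are assumed callables, not CNeRF's actual modules.
import torch

def render_composed(fields, pts, view_dir):
    """fields: list of callables mapping (pts, view_dir) -> (rgb, density),
    one per semantic region; rgb is (N, 3), density is (N,)."""
    rgbs, densities = zip(*[f(pts, view_dir) for f in fields])
    density = torch.stack(densities).sum(dim=0)            # additive fusion
    w = torch.stack(densities) / density.clamp_min(1e-8)   # per-region share
    rgb = (w.unsqueeze(-1) * torch.stack(rgbs)).sum(dim=0)
    return rgb, density  # hand off to a standard volume renderer
```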
arXiv Detail & Related papers (2023-02-03T07:17:46Z)
- 3D GAN Inversion with Pose Optimization [26.140281977885376]
We introduce a generalizable 3D GAN inversion method that infers camera viewpoint and latent code simultaneously to enable multi-view consistent semantic image editing.
We conduct extensive experiments on image reconstruction and editing both quantitatively and qualitatively, and further compare our results with 2D GAN-based editing.
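The core idea, joint inference of viewpoint and latent code, can be sketched as a single optimization over both, as below; `G` is assumed to be a pose-conditioned 3D-aware generator, and the pose parameterization is illustrative.

```python
# Hedged sketch of joint pose-and-latent inversion for a 3D-aware GAN.
# G, z_init, and the pose parameterization are illustrative assumptions.
import torch
import torch.nn.functional as F

def invert_3d(G, image, z_init, pose_init, steps=300, lr=0.01):
    z = z_init.clone().requires_grad_(True)
    pose = pose_init.clone().requires_grad_(True)  # e.g., yaw/pitch angles
    opt = torch.optim.Adam([z, pose], lr=lr)
    for _ in range(steps):
        loss = F.mse_loss(G(z, pose), image)  # fit viewpoint and content jointly
        opt.zero_grad(); loss.backward(); opt.step()
    return z.detach(), pose.detach()
```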
arXiv Detail & Related papers (2022-10-13T19:06:58Z)
- RSINet: Inpainting Remotely Sensed Images Using Triple GAN Framework [13.613245876782367]
We propose a novel inpainting method that individually focuses on each aspect of an image, such as edges, colour, and texture.
Each individual GAN also incorporates an attention mechanism that explicitly extracts the spectral and spatial features.
We evaluate our model, along with previous state-of-the-art models, on two well-known remote sensing datasets, Open Cities AI and Earth on Canvas.
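Read literally, the triple-GAN design suggests one branch per aspect whose predictions are fused inside the masked hole, roughly as sketched below; every module here is a hypothetical placeholder, not RSINet's actual architecture.

```python
# Rough sketch of a triple-branch inpainter: one generator per aspect
# (edges, colour, texture) with the outputs fused inside the hole.
# All modules are illustrative placeholders.
import torch

def inpaint(masked_image, mask, g_edge, g_colour, g_texture, fuse):
    e = g_edge(masked_image, mask)      # structural/edge prediction
    c = g_colour(masked_image, mask)    # colour prediction
    t = g_texture(masked_image, mask)   # texture prediction
    filled = fuse(e, c, t)              # combine the three aspects
    return masked_image * (1 - mask) + filled * mask
```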
arXiv Detail & Related papers (2022-02-12T05:19:37Z)
- SemanticStyleGAN: Learning Compositional Generative Priors for Controllable Image Synthesis and Editing [35.02841064647306]
StyleGANs provide promising prior models for downstream tasks on image synthesis and editing.
We present SemanticStyleGAN, where a generator is trained to model local semantic parts separately and synthesizes images in a compositional way.
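One way to picture "modeling local semantic parts separately and synthesizing compositionally" is the sketch below, where each part generator emits a feature map plus a mask logit and a shared renderer consumes the blended features; the module names are hypothetical, not SemanticStyleGAN's actual API.

```python
# Illustrative compositional synthesis: per-part features are blended by a
# softmax over predicted part masks, then rendered by a shared decoder.
# part_generators, w, and render are hypothetical placeholders.
import torch

def compose(part_generators, w, render):
    feats, logits = zip(*[g(w) for g in part_generators])  # per-part outputs
    masks = torch.softmax(torch.stack(logits), dim=0)      # soft part layout
    fused = (masks * torch.stack(feats)).sum(dim=0)        # blended features
    return render(fused), masks
```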
arXiv Detail & Related papers (2021-12-04T04:17:11Z)
- Cycle-Consistent Inverse GAN for Text-to-Image Synthesis [101.97397967958722]
We propose a novel unified framework of Cycle-consistent Inverse GAN for both text-to-image generation and text-guided image manipulation tasks.
We learn a GAN inversion model to convert the images back to the GAN latent space and obtain the inverted latent codes for each image.
In the text-guided optimization module, we generate images with the desired semantic attributes by optimizing the inverted latent codes.
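The text-guided optimization module can be sketched as below: starting from the inverted code, the latent is pushed to maximize agreement between the generated image and the target text. `text_score` is a hypothetical text-image similarity function (e.g., CLIP-style), standing in for the paper's semantic objective.

```python
# Hedged sketch of text-guided latent optimization from an inverted code.
# text_score is an assumed text-image similarity function, not the
# paper's exact loss; the cycle-consistency terms are omitted.
import torch

def text_guided_edit(G, z_inverted, text, text_score, steps=100, lr=0.02):
    z = z_inverted.clone().requires_grad_(True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        loss = -text_score(G(z), text)  # maximize text-image agreement
        opt.zero_grad(); loss.backward(); opt.step()
    return G(z).detach()
```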
arXiv Detail & Related papers (2021-08-03T08:38:16Z)
- Low-Rank Subspaces in GANs [101.48350547067628]
This work introduces low-rank subspaces that enable more precise control of GAN generation.
LowRankGAN is able to find a low-dimensional representation of the attribute manifold.
Experiments on state-of-the-art GAN models (including StyleGAN2 and BigGAN) trained on various datasets demonstrate the effectiveness of our LowRankGAN.
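A generic way to obtain such low-dimensional directions, in the spirit of the summary though not LowRankGAN's exact algorithm, is to take the Jacobian of one image region with respect to the latent code and read the principal directions off its SVD:

```python
# Simplified sketch: low-rank latent directions for one image region via
# an SVD of the region's Jacobian w.r.t. the latent code. This is a
# generic construction, not LowRankGAN's exact (null-space) procedure.
import torch

def region_directions(G, z, region_mask, rank=5):
    def region_pixels(latent):
        img = G(latent)                         # (1, C, H, W)
        return img[..., region_mask].flatten()  # pixels inside the region
    J = torch.autograd.functional.jacobian(region_pixels, z)
    J = J.reshape(-1, z.numel())                # (num_pixels, latent_dim)
    _, _, Vh = torch.linalg.svd(J, full_matrices=False)
    return Vh[:rank]                            # top latent directions
```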
arXiv Detail & Related papers (2021-06-08T16:16:32Z)