Learning Modulated Transformation in GANs
- URL: http://arxiv.org/abs/2308.15472v1
- Date: Tue, 29 Aug 2023 17:51:22 GMT
- Title: Learning Modulated Transformation in GANs
- Authors: Ceyuan Yang, Qihang Zhang, Yinghao Xu, Jiapeng Zhu, Yujun Shen, Bo Dai
- Abstract summary: We equip the generator in generative adversarial networks (GANs) with a plug-and-play module, termed the modulated transformation module (MTM).
MTM predicts spatial offsets under the control of latent codes, based on which the convolution operation can be applied at variable locations.
Notably, for human generation on the challenging TaiChi dataset, we improve the FID of StyleGAN3 from 21.36 to 13.60, demonstrating the efficacy of learning modulated geometry transformation.
- Score: 69.95217723100413
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The success of style-based generators largely benefits from style modulation,
which helps account for cross-instance variation within the data. However,
instance-wise stochasticity is typically introduced via regular convolution,
where kernels interact with features at fixed locations, limiting the
capacity for modeling geometric variation. To alleviate this problem, we equip
the generator in generative adversarial networks (GANs) with a plug-and-play
module, termed the modulated transformation module (MTM). This module predicts
spatial offsets under the control of latent codes, based on which the
convolution operation can be applied at variable locations for different
instances, hence offering the model an additional degree of freedom to handle
geometric deformation. Extensive experiments suggest that our approach
generalizes faithfully to various generative tasks, including image generation,
3D-aware image synthesis, and video generation, and is compatible with
state-of-the-art frameworks without any hyper-parameter tuning. Notably, for
human generation on the challenging TaiChi dataset, we improve the FID of
StyleGAN3 from 21.36 to 13.60, demonstrating the efficacy of learning
modulated geometry transformation.
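To make the mechanism concrete, below is a minimal, hypothetical sketch of an MTM-like layer in PyTorch. The class name, the affine/offset-head structure, and the use of torchvision's deform_conv2d are illustrative assumptions, not the authors' implementation; the sketch only shows the core idea of latent-code-conditioned offsets feeding a deformable convolution.

```python
# Hypothetical MTM-like layer: a latent code w modulates features, a small
# head predicts per-tap (x, y) sampling offsets, and the convolution is then
# evaluated at those shifted locations via deformable convolution.
import torch
import torch.nn as nn
from torchvision.ops import deform_conv2d

class ModulatedTransformation(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, latent_dim: int, k: int = 3):
        super().__init__()
        self.k = k
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch, k, k) * 0.02)
        self.affine = nn.Linear(latent_dim, in_ch)       # latent -> per-channel style
        self.offset_head = nn.Conv2d(in_ch, 2 * k * k, 3, padding=1)
        nn.init.zeros_(self.offset_head.weight)          # start as a regular conv
        nn.init.zeros_(self.offset_head.bias)

    def forward(self, x: torch.Tensor, w: torch.Tensor) -> torch.Tensor:
        style = self.affine(w)[:, :, None, None]         # (N, C_in, 1, 1)
        offsets = self.offset_head(x * style)            # (N, 2*k*k, H, W)
        return deform_conv2d(x, offsets, self.weight, padding=self.k // 2)

# Example: a 64-channel feature map with a 512-d latent code.
# mtm = ModulatedTransformation(64, 128, 512)
# y = mtm(torch.randn(2, 64, 32, 32), torch.randn(2, 512))  # (2, 128, 32, 32)
```

Zero-initializing the offset head makes the layer behave as an ordinary convolution at the start of training, so any geometric deformation is learned gradually rather than injected as noise.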
Related papers
- Boosting Cross-Domain Point Classification via Distilling Relational Priors from 2D Transformers [59.0181939916084]
Traditional 3D networks mainly focus on local geometric details and ignore the topological structure between local geometries.
We propose a novel Relational Priors Distillation (RPD) method to extract priors from well-trained transformers on massive images.
Experiments on the PointDA-10 and Sim-to-Real datasets verify that the proposed method consistently achieves state-of-the-art UDA performance for point cloud classification.
arXiv Detail & Related papers (2024-07-26T06:29:09Z)
- GeoMFormer: A General Architecture for Geometric Molecular Representation Learning [84.02083170392764]
We introduce GeoMFormer, a novel Transformer-based molecular model for learning both invariant and equivariant representations.
We show that GeoMFormer achieves strong performance on invariant and equivariant tasks of different types and scales.
arXiv Detail & Related papers (2024-06-24T17:58:13Z)
- GGAvatar: Geometric Adjustment of Gaussian Head Avatar [6.58321368492053]
GGAvatar is a novel 3D avatar representation designed to robustly model dynamic head avatars with complex identities.
GGAvatar produces high-fidelity renderings, outperforming state-of-the-art methods in visual quality and quantitative metrics.
arXiv Detail & Related papers (2024-05-20T12:54:57Z)
- Multiple View Geometry Transformers for 3D Human Pose Estimation [35.26756920323391]
We aim to improve the 3D reasoning ability of Transformers in multi-view 3D human pose estimation.
We propose a novel hybrid model, MVGFormer, which organizes a series of geometric and appearance modules in an iterative manner.
arXiv Detail & Related papers (2023-11-18T06:32:40Z)
- Dynamic Kernel-Based Adaptive Spatial Aggregation for Learned Image Compression [63.56922682378755]
We focus on extending spatial aggregation capability and propose dynamic kernel-based transform coding.
The proposed adaptive aggregation generates kernel offsets to capture valid information within a content-conditioned range, aiding the transform.
Experimental results demonstrate that our method achieves superior rate-distortion performance on three benchmarks compared to state-of-the-art learning-based methods.
arXiv Detail & Related papers (2023-08-17T01:34:51Z)
- 3DQD: Generalized Deep 3D Shape Prior via Part-Discretized Diffusion Process [32.3773514247982]
We develop a generalized 3D shape generation prior model tailored for multiple 3D tasks.
These designs jointly equip our proposed 3D shape prior model with high-fidelity, diverse features as well as the capability of cross-modality alignment.
arXiv Detail & Related papers (2023-03-18T12:50:29Z)
- 3D Generative Model Latent Disentanglement via Local Eigenprojection [13.713373496487012]
We introduce a novel loss function grounded in spectral geometry for different neural-network-based generative models of 3D head and body meshes.
Experimental results show that our local eigenprojection disentangled (LED) models offer improved disentanglement with respect to the state of the art.
arXiv Detail & Related papers (2023-02-24T18:19:49Z)
- GSMFlow: Generation Shifts Mitigating Flow for Generalized Zero-Shot Learning [55.79997930181418]
Generalized Zero-Shot Learning aims to recognize images from both seen and unseen classes by transferring semantic knowledge from seen to unseen classes.
A promising solution is to leverage generative models to hallucinate realistic unseen samples based on knowledge learned from the seen classes.
We propose a novel flow-based generative framework that consists of multiple conditional affine coupling layers for learning unseen data generation.
arXiv Detail & Related papers (2022-07-05T04:04:37Z)
- Geometry-Contrastive Transformer for Generalized 3D Pose Transfer [95.56457218144983]
The intuition of this work is to perceive the geometric inconsistency between given meshes with the powerful self-attention mechanism.
We propose a novel geometry-contrastive Transformer that efficiently perceives global geometric inconsistencies in 3D structure.
We present a latent isometric regularization module together with a novel semi-synthesized dataset for the cross-dataset 3D pose transfer task.
arXiv Detail & Related papers (2021-12-14T13:14:24Z)
- Instant recovery of shape from spectrum via latent space connections [33.83258865005668]
We introduce the first learning-based method for recovering shapes from Laplacian spectra.
Given an auto-encoder, our model takes the form of a cycle-consistent module that maps latent vectors to sequences of eigenvalues.
Our data-driven approach removes the need for the ad-hoc regularizers required by prior methods, while providing more accurate results at a fraction of the computational cost.
arXiv Detail & Related papers (2020-03-14T00:48:34Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.