3D Shape Variational Autoencoder Latent Disentanglement via Mini-Batch Feature Swapping for Bodies and Faces
- URL: http://arxiv.org/abs/2111.12448v2
- Date: Thu, 25 Nov 2021 15:20:32 GMT
- Authors: Simone Foti, Bongjin Koo, Danail Stoyanov, Matthew J. Clarkson
- Abstract summary: We propose a self-supervised approach to train a 3D shape variational autoencoder which encourages a disentangled latent representation of identity features.
Experimental results conducted on 3D meshes show that state-of-the-art methods for latent disentanglement are not able to disentangle identity features of faces and bodies.
- Score: 12.114711258010367
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Learning a disentangled, interpretable, and structured latent representation
in 3D generative models of faces and bodies is still an open problem. The
problem is particularly acute when control over identity features is required.
In this paper, we propose an intuitive yet effective self-supervised approach
to train a 3D shape variational autoencoder (VAE) which encourages a
disentangled latent representation of identity features. Curating the
mini-batch generation by swapping arbitrary features across different shapes
allows us to define a loss function leveraging known differences and similarities
in the latent representations. Experimental results conducted on 3D meshes show
that state-of-the-art methods for latent disentanglement are not able to
disentangle identity features of faces and bodies. Our proposed method properly
decouples the generation of such features while maintaining good representation
and reconstruction capabilities.
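The mini-batch curation described in the abstract can be illustrated with a short sketch. This is not the authors' implementation: it assumes meshes in dense vertex correspondence stored as NumPy arrays, a feature given as a set of vertex indices, and a latent vector in which a fixed block of dimensions is assigned to that feature. The names `swap_feature_batch` and `latent_consistency` are hypothetical, and the variance-based penalty is a simplified stand-in for the loss defined in the paper.

```python
import numpy as np


def swap_feature_batch(shapes, feature_idx):
    """Build a k x k grid of shapes from k input shapes of shape (k, V, 3).

    Entry (i, j) is shape j with the vertices in feature_idx copied from
    shape i. Every row i therefore shares one version of the feature and
    every column j shares everything else.
    (Assumption: all meshes share the same vertex ordering.)
    """
    k = shapes.shape[0]
    # batch[i, j] starts as a copy of shapes[j]
    batch = np.broadcast_to(shapes, (k,) + shapes.shape).copy()
    # overwrite the feature vertices with those of shape i (broadcast over j)
    batch[:, :, feature_idx, :] = shapes[:, None, feature_idx, :]
    return batch


def latent_consistency(z, part):
    """Simplified penalty on latents z of shape (k, k, D) for a swapped grid.

    part is a slice selecting the latent dimensions assumed to encode the
    swapped feature. Rows share the feature, so their feature block should
    agree; columns share the rest, so the remaining block should agree.
    """
    mask = np.zeros(z.shape[-1], dtype=bool)
    mask[part] = True
    same_feature = z[:, :, mask].var(axis=1).mean()
    same_rest = z[:, :, ~mask].var(axis=0).mean()
    return same_feature + same_rest
```

In a training loop, the grid returned by `swap_feature_batch` would be flattened to `k * k` shapes, encoded, and the resulting latents reshaped to `(k, k, D)` before the penalty is added to the usual VAE objective. The penalty is zero only when the feature block depends on the row alone and the remaining block on the column alone.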
Related papers
- GaussianAnything: Interactive Point Cloud Latent Diffusion for 3D Generation [75.39457097832113]
This paper introduces a novel 3D generation framework, offering scalable, high-quality 3D generation with an interactive Point Cloud-structured Latent space.
Our framework employs a Variational Autoencoder with multi-view posed RGB-D(epth)-N(ormal) renderings as input, using a unique latent space design that preserves 3D shape information.
The proposed method, GaussianAnything, supports multi-modal conditional 3D generation, allowing for point cloud, caption, and single/multi-view image inputs.
arXiv Detail & Related papers (2024-11-12T18:59:32Z)
- Deformable 3D Shape Diffusion Model [21.42513407755273]
We introduce a novel deformable 3D shape diffusion model that facilitates comprehensive 3D shape manipulation.
We demonstrate state-of-the-art performance in point cloud generation and competitive results in mesh deformation.
Our method presents a unique pathway for advancing 3D shape manipulation and unlocking new opportunities in the realm of virtual reality.
arXiv Detail & Related papers (2024-07-31T08:24:42Z)
- 3D Face Modeling via Weakly-supervised Disentanglement Network joint Identity-consistency Prior [62.80458034704989]
Generative 3D face models featuring disentangled controlling factors hold immense potential for diverse applications in computer vision and computer graphics.
Previous 3D face modeling methods face a challenge as they demand specific labels to effectively disentangle these factors.
This paper introduces a Weakly-Supervised Disentanglement Framework, denoted as WSDF, to facilitate the training of controllable 3D face models without an overly stringent labeling requirement.
arXiv Detail & Related papers (2024-04-25T11:50:47Z)
- DrFER: Learning Disentangled Representations for 3D Facial Expression Recognition [28.318304721838096]
We introduce the innovative DrFER method, which brings the concept of disentangled representation learning to the field of 3D FER.
DrFER employs a dual-branch framework to effectively disentangle expression information from identity information.
This adaptation enhances the capability of the framework in recognizing facial expressions, even in cases involving varying head poses.
arXiv Detail & Related papers (2024-03-13T08:00:07Z)
- OmniAvatar: Geometry-Guided Controllable 3D Head Synthesis [81.70922087960271]
We present OmniAvatar, a novel geometry-guided 3D head synthesis model trained from in-the-wild unstructured images.
Our model can synthesize more preferable identity-preserved 3D heads with compelling dynamic details compared to the state-of-the-art methods.
arXiv Detail & Related papers (2023-03-27T18:36:53Z)
- Controllable 3D Generative Adversarial Face Model via Disentangling Shape and Appearance [63.13801759915835]
3D face modeling has been an active area of research in computer vision and computer graphics.
This paper proposes a new 3D face generative model that can decouple identity and expression.
arXiv Detail & Related papers (2022-08-30T13:40:48Z)
- Learning Canonical 3D Object Representation for Fine-Grained Recognition [77.33501114409036]
We propose a novel framework for fine-grained object recognition that learns to recover object variation in 3D space from a single image.
We represent an object as a composition of 3D shape and its appearance, while eliminating the effect of camera viewpoint.
By incorporating 3D shape and appearance jointly in a deep representation, our method learns the discriminative representation of the object.
arXiv Detail & Related papers (2021-08-10T12:19:34Z)
- Reconstructing Recognizable 3D Face Shapes based on 3D Morphable Models [20.381926248856452]
We propose a novel shape identity-aware regularization (SIR) loss for shape parameters, aiming at increasing discriminability in both the shape parameter and shape geometry domains.
We compare our method with existing methods in terms of the reconstruction error, visual distinguishability, and face recognition accuracy of the shape parameters.
arXiv Detail & Related papers (2021-04-08T05:11:48Z)
- gradSim: Differentiable simulation for system identification and visuomotor control [66.37288629125996]
We present gradSim, a framework that overcomes the dependence on 3D supervision by leveraging differentiable multiphysics simulation and differentiable rendering.
Our unified graph enables learning in challenging visuomotor control tasks, without relying on state-based (3D) supervision.
arXiv Detail & Related papers (2021-04-06T16:32:01Z)