Semantify: Simplifying the Control of 3D Morphable Models using CLIP
- URL: http://arxiv.org/abs/2308.07415v1
- Date: Mon, 14 Aug 2023 19:07:26 GMT
- Title: Semantify: Simplifying the Control of 3D Morphable Models using CLIP
- Authors: Omer Gralnik, Guy Gafni, Ariel Shamir
- Abstract summary: Semantify: a self-supervised method that utilizes the semantic power of the CLIP language-vision foundation model.
We present results on numerous 3DMMs: body shape models, face shape and expression models, as well as animal shapes.
- Score: 16.74483439465574
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: We present Semantify: a self-supervised method that utilizes the semantic
power of the CLIP language-vision foundation model to simplify the control of 3D
morphable models. Given a parametric model, training data is created by
randomly sampling the model's parameters, creating various shapes and rendering
them. The similarity between the output images and a set of word descriptors is
calculated in CLIP's latent space. Our key idea is first to choose a small set
of semantically meaningful and disentangled descriptors that characterize the
3DMM, and then learn a non-linear mapping from scores across this set to the
parametric coefficients of the given 3DMM. The non-linear mapping is defined by
training a neural network without a human-in-the-loop. We present results on
numerous 3DMMs: body shape models, face shape and expression models, as well as
animal shapes. We demonstrate how our method defines a simple slider interface
for intuitive modeling, and show how the mapping can be used to instantly fit a
3D parametric body shape to in-the-wild images.
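The pipeline the abstract describes (randomly sample 3DMM parameters, render, score the renders against word descriptors in CLIP's latent space, then learn a mapping from descriptor scores to parametric coefficients) can be sketched structurally in plain NumPy. Everything below is a hypothetical illustration, not the authors' code: rendering and CLIP encoding are replaced by a fixed random projection, the descriptor embeddings are random placeholders, and the paper's non-linear neural mapping is stood in for by a linear least-squares fit to keep the sketch dependency-free.

```python
import numpy as np

rng = np.random.default_rng(0)

N_SAMPLES = 500      # randomly sampled 3DMM shapes (training data)
N_PARAMS = 10        # 3DMM parametric coefficients (e.g. body-shape betas)
N_DESCRIPTORS = 6    # small set of semantic word descriptors ("tall", "muscular", ...)
EMB_DIM = 512        # CLIP embedding dimensionality (ViT-B/32 uses 512)

# Placeholder for CLIP text embeddings of the chosen descriptor set.
text_emb = rng.normal(size=(N_DESCRIPTORS, EMB_DIM))
text_emb /= np.linalg.norm(text_emb, axis=1, keepdims=True)

# Stand-in for "render the 3DMM, then encode the image with CLIP":
# a fixed nonlinear projection of the parameters into embedding space.
W = rng.normal(size=(N_PARAMS, EMB_DIM))

def render_and_encode(params: np.ndarray) -> np.ndarray:
    emb = np.tanh(params @ W)
    return emb / np.linalg.norm(emb)

def descriptor_scores(params: np.ndarray) -> np.ndarray:
    """Cosine similarity between the rendered shape and each word
    descriptor in CLIP's latent space (the training signal)."""
    return render_and_encode(params) @ text_emb.T

# 1. Create training data by randomly sampling the model's parameters.
params = rng.normal(size=(N_SAMPLES, N_PARAMS))
scores = np.stack([descriptor_scores(p) for p in params])

# 2. Learn a mapping from descriptor scores to 3DMM coefficients.
#    The paper trains a neural network without a human-in-the-loop;
#    a least-squares fit is used here purely for illustration.
A, *_ = np.linalg.lstsq(scores, params, rcond=None)

# 3. At edit time, slider positions act as descriptor scores and the
#    mapping instantly produces 3DMM coefficients for the mesh.
slider_scores = scores[0]
predicted = slider_scores @ A
print(predicted.shape)  # (10,)
```

The key design point this sketch preserves is the direction of the mapping: sliders live in the low-dimensional, semantically meaningful descriptor space, and the learned function translates them into the 3DMM's less interpretable coefficient space.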
Related papers
- Pixel2Mesh++: 3D Mesh Generation and Refinement from Multi-View Images [82.32776379815712]
We study the problem of shape generation in 3D mesh representation from a small number of color images with or without camera poses.
We further improve shape quality by leveraging cross-view information with a graph convolutional network.
Our model is robust to the quality of the initial mesh and the error of camera pose, and can be combined with a differentiable function for test-time optimization.
arXiv Detail & Related papers (2022-04-21T03:42:31Z)
- Disentangled3D: Learning a 3D Generative Model with Disentangled Geometry and Appearance from Monocular Images [94.49117671450531]
State-of-the-art 3D generative models are GANs which use neural 3D volumetric representations for synthesis.
In this paper, we design a 3D GAN which can learn a disentangled model of objects, just from monocular observations.
arXiv Detail & Related papers (2022-03-29T22:03:18Z)
- Text to Mesh Without 3D Supervision Using Limit Subdivision [13.358081015190255]
We present a technique for zero-shot generation of a 3D model using only a target text prompt.
We rely on a pre-trained CLIP model that compares the input text prompt with differentiably rendered images of our 3D model.
arXiv Detail & Related papers (2022-03-24T20:36:28Z)
- Template NeRF: Towards Modeling Dense Shape Correspondences from Category-Specific Object Images [4.662583832063716]
We present neural radiance fields (NeRF) with templates, dubbed template-NeRF, for modeling appearance and geometry.
We generate dense shape correspondences simultaneously among objects of the same category from only multi-view posed images.
The learned dense correspondences can be readily used for various image-based tasks such as keypoint detection, part segmentation, and texture transfer.
arXiv Detail & Related papers (2021-11-08T02:16:48Z)
- Learning Free-Form Deformation for 3D Face Reconstruction from In-The-Wild Images [19.799466588741836]
We propose a learning-based method that reconstructs a 3D face mesh through Free-Form Deformation (FFD) for the first time.
Experiments on multiple datasets demonstrate how our method successfully estimates the 3D face geometry and facial expressions from 2D face images.
arXiv Detail & Related papers (2021-05-31T10:19:20Z)
- Learning Feature Aggregation for Deep 3D Morphable Models [57.1266963015401]
We propose an attention based module to learn mapping matrices for better feature aggregation across hierarchical levels.
Our experiments show that through the end-to-end training of the mapping matrices, we achieve state-of-the-art results on a variety of 3D shape datasets.
arXiv Detail & Related papers (2021-05-05T16:41:00Z)
- Building 3D Morphable Models from a Single Scan [3.472931603805115]
We propose a method for constructing generative models of 3D objects from a single 3D mesh.
Our method produces a 3D morphable model that represents shape and albedo in terms of Gaussian processes.
We show that our approach can be used to perform face recognition using only a single 3D scan.
arXiv Detail & Related papers (2020-11-24T23:08:14Z)
- Canonical 3D Deformer Maps: Unifying parametric and non-parametric methods for dense weakly-supervised category reconstruction [79.98689027127855]
We propose a new representation of the 3D shape of common object categories that can be learned from a collection of 2D images of independent objects.
Our method builds in a novel way on concepts from parametric deformation models, non-parametric 3D reconstruction, and canonical embeddings.
It achieves state-of-the-art results in dense 3D reconstruction on public in-the-wild datasets of faces, cars, and birds.
arXiv Detail & Related papers (2020-08-28T15:44:05Z)
- Combining Implicit Function Learning and Parametric Models for 3D Human Reconstruction [123.62341095156611]
Implicit functions represented as deep learning approximations are powerful for reconstructing 3D surfaces.
Such features are essential in building flexible models for both computer graphics and computer vision.
We present methodology that combines detail-rich implicit functions and parametric representations.
arXiv Detail & Related papers (2020-07-22T13:46:14Z)
- Learning Local Neighboring Structure for Robust 3D Shape Representation [143.15904669246697]
Representation learning for 3D meshes is important in many computer vision and graphics applications.
We propose a local structure-aware anisotropic convolutional operation (LSA-Conv).
Our model produces significant improvement in 3D shape reconstruction compared to state-of-the-art methods.
arXiv Detail & Related papers (2020-04-21T13:40:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.