AWOL: Analysis WithOut synthesis using Language
- URL: http://arxiv.org/abs/2404.03042v1
- Date: Wed, 3 Apr 2024 20:04:44 GMT
- Title: AWOL: Analysis WithOut synthesis using Language
- Authors: Silvia Zuffi, Michael J. Black
- Abstract summary: We leverage language to control existing 3D shape models to produce novel shapes.
We show that we can use text to generate new animals not present during training.
This work also constitutes the first language-driven method for generating 3D trees.
- Score: 57.31874938870305
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Many classical parametric 3D shape models exist, but creating novel shapes with such models requires expert knowledge of their parameters. For example, imagine creating a specific type of tree using procedural graphics or a new kind of animal from a statistical shape model. Our key idea is to leverage language to control such existing models to produce novel shapes. This involves learning a mapping between the latent space of a vision-language model and the parameter space of the 3D model, which we do using a small set of shape and text pairs. Our hypothesis is that mapping from language to parameters allows us to generate parameters for objects that were never seen during training. If the mapping between language and parameters is sufficiently smooth, then interpolation or generalization in language should translate appropriately into novel 3D shapes. We test our approach with two very different types of parametric shape models (quadrupeds and arboreal trees). We use a learned statistical shape model of quadrupeds and show that we can use text to generate new animals not present during training. In particular, we demonstrate state-of-the-art shape estimation of 3D dogs. This work also constitutes the first language-driven method for generating 3D trees. Finally, embedding images in the CLIP latent space enables us to generate animals and trees directly from images.
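The core idea is a learned mapping from the CLIP text/image embedding space to the parameter vector of an existing parametric 3D model, fitted from a small set of shape-text pairs; because text and images share the CLIP space, the same mapping also supports image-driven generation, and a smooth mapping lets interpolation in language translate into novel shapes. The sketch below is a minimal illustration of that idea under stated assumptions, not the authors' implementation: it assumes CLIP embeddings have already been computed with an off-the-shelf CLIP model, and it regresses shape-model parameters with a small MLP; the network architecture, sizes, and variable names are hypothetical.

```python
# Minimal sketch (not the AWOL code): regress parametric 3D-model
# parameters from precomputed CLIP embeddings using a small MLP.
# Assumptions: `clip_embs` holds CLIP text embeddings (N x 512) and
# `shape_params` the matching 3D-model parameters (N x P), e.g.
# quadruped shape coefficients or procedural tree settings.
import torch
import torch.nn as nn

def make_mapper(clip_dim: int = 512, param_dim: int = 64) -> nn.Module:
    # Hypothetical architecture; the abstract only specifies "a mapping"
    # learned from a small set of shape and text pairs.
    return nn.Sequential(
        nn.Linear(clip_dim, 256), nn.ReLU(),
        nn.Linear(256, 256), nn.ReLU(),
        nn.Linear(256, param_dim),
    )

def train_mapper(clip_embs: torch.Tensor, shape_params: torch.Tensor,
                 epochs: int = 2000, lr: float = 1e-3) -> nn.Module:
    mapper = make_mapper(clip_embs.shape[1], shape_params.shape[1])
    opt = torch.optim.Adam(mapper.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.mse_loss(mapper(clip_embs), shape_params)
        loss.backward()
        opt.step()
    return mapper

# At test time, an unseen description (or a CLIP image embedding, since
# text and images share the CLIP space) is mapped to new parameters and
# decoded by the existing parametric 3D model (hypothetical calls):
#   params = mapper(clip_embed("a lean, long-legged hound"))
#   mesh = shape_model.decode(params)
```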
Related papers
- SCULPT: Shape-Conditioned Unpaired Learning of Pose-dependent Clothed and Textured Human Meshes [62.82552328188602]
We present SCULPT, a novel 3D generative model for clothed and textured 3D meshes of humans.
We devise a deep neural network that learns to represent the geometry and appearance distribution of clothed human bodies.
arXiv Detail & Related papers (2023-08-21T11:23:25Z)
- Semantify: Simplifying the Control of 3D Morphable Models using CLIP [16.74483439465574]
Semantify is a self-supervised method that utilizes the semantic power of the CLIP language-vision foundation model.
We present results on numerous 3DMMs: body shape models, face shape and expression models, as well as animal shapes.
arXiv Detail & Related papers (2023-08-14T19:07:26Z)
- Disentangled3D: Learning a 3D Generative Model with Disentangled Geometry and Appearance from Monocular Images [94.49117671450531]
State-of-the-art 3D generative models are GANs which use neural 3D volumetric representations for synthesis.
In this paper, we design a 3D GAN which can learn a disentangled model of objects, just from monocular observations.
arXiv Detail & Related papers (2022-03-29T22:03:18Z)
- Building 3D Generative Models from Minimal Data [3.472931603805115]
We show that our approach can be used to perform face recognition using only a single 3D template (one scan total, not one per person).
We extend our model to a preliminary unsupervised learning framework that enables the learning of the distribution of 3D faces using one 3D template and a small number of 2D images.
arXiv Detail & Related papers (2022-03-04T20:10:50Z)
- DOVE: Learning Deformable 3D Objects by Watching Videos [89.43105063468077]
We present DOVE, which learns to predict 3D canonical shape, deformation, viewpoint and texture from a single 2D image of a bird.
Our method reconstructs temporally consistent 3D shape and deformation, which allows us to animate and re-render the bird from arbitrary viewpoints.
arXiv Detail & Related papers (2021-07-22T17:58:10Z)
- Building 3D Morphable Models from a Single Scan [3.472931603805115]
We propose a method for constructing generative models of 3D objects from a single 3D mesh.
Our method produces a 3D morphable model that represents shape and albedo in terms of Gaussian processes.
We show that our approach can be used to perform face recognition using only a single 3D scan.
arXiv Detail & Related papers (2020-11-24T23:08:14Z)
- ShapeAssembly: Learning to Generate Programs for 3D Shape Structure Synthesis [38.27280837835169]
We propose ShapeAssembly, a domain-specific "assembly-language" for 3D shape structures.
We show how to extract ShapeAssembly programs from existing shape structures in the PartNet dataset.
We evaluate our approach by comparing shapes output by our generated programs to those from other recent shape structure models.
arXiv Detail & Related papers (2020-09-17T02:26:45Z)
- Canonical 3D Deformer Maps: Unifying parametric and non-parametric methods for dense weakly-supervised category reconstruction [79.98689027127855]
We propose a new representation of the 3D shape of common object categories that can be learned from a collection of 2D images of independent objects.
Our method builds in a novel way on concepts from parametric deformation models, non-parametric 3D reconstruction, and canonical embeddings.
It achieves state-of-the-art results in dense 3D reconstruction on public in-the-wild datasets of faces, cars, and birds.
arXiv Detail & Related papers (2020-08-28T15:44:05Z)
- Unsupervised Shape and Pose Disentanglement for 3D Meshes [49.431680543840706]
We present a simple yet effective approach to learn disentangled shape and pose representations in an unsupervised setting.
We use a combination of self-consistency and cross-consistency constraints to learn pose and shape space from registered meshes.
We demonstrate the usefulness of learned representations through a number of tasks including pose transfer and shape retrieval.
arXiv Detail & Related papers (2020-07-22T11:00:27Z)