TextDeformer: Geometry Manipulation using Text Guidance
- URL: http://arxiv.org/abs/2304.13348v1
- Date: Wed, 26 Apr 2023 07:38:41 GMT
- Title: TextDeformer: Geometry Manipulation using Text Guidance
- Authors: William Gao, Noam Aigerman, Thibault Groueix, Vladimir G. Kim, Rana Hanocka
- Abstract summary: We present a technique for producing a deformation of an input triangle mesh guided solely by a text prompt.
Our framework relies on differentiable rendering to connect geometry to powerful pre-trained image encoders, such as CLIP and DINO.
Since gradients from these encoders are noisy and inconsistent, we opt to represent our mesh deformation through Jacobians, which update deformations in a global, smooth manner.
- Score: 37.02412892926677
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a technique for automatically producing a deformation of an input
triangle mesh, guided solely by a text prompt. Our framework is capable of
deformations that produce both large, low-frequency shape changes, and small
high-frequency details. Our framework relies on differentiable rendering to
connect geometry to powerful pre-trained image encoders, such as CLIP and DINO.
Notably, updating mesh geometry by taking gradient steps through differentiable
rendering is notoriously challenging, commonly resulting in deformed meshes
with significant artifacts. These difficulties are amplified by noisy and
inconsistent gradients from CLIP. To overcome this limitation, we opt to
represent our mesh deformation through Jacobians, which update deformations in
a global, smooth manner (rather than through local, sub-optimal steps). Our key
observation is that Jacobians are a representation that favors smoother, large
deformations, leading to a global relation between vertices and pixels, and
avoiding localized noisy gradients. Additionally, to ensure the resulting shape
is coherent from all 3D viewpoints, we encourage the deep features computed on
the 2D encoding of the rendering to be consistent for a given vertex from all
viewpoints. We demonstrate that our method is capable of smoothly deforming a
wide variety of source meshes under many target text prompts, achieving both
large modifications, e.g., to the body proportions of animals, and the
addition of fine semantic details, such as shoe laces on an army boot and the
fine details of a face.
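
The Jacobian representation described above can be made concrete: instead of
optimizing vertex positions directly, one optimizes a 3x3 linear map per face
and recovers the vertices through a differentiable Poisson-style least-squares
solve, so every vertex update is globally coupled. The sketch below is a
minimal PyTorch illustration under stated assumptions: the toy quad mesh, the
dense normal-equations solve, and the stretch objective are stand-ins (the
paper instead scores differentiable renders with CLIP/DINO features), not the
authors' implementation.

```python
import torch

# Toy mesh: a unit quad split into two triangles (stand-in for a real mesh).
V0 = torch.tensor([[0., 0., 0.],
                   [1., 0., 0.],
                   [1., 1., 0.],
                   [0., 1., 0.]])
F = torch.tensor([[0, 1, 2], [0, 2, 3]])
nV, nF = V0.shape[0], F.shape[0]

# Learnable per-face Jacobians, initialized to the identity (no deformation).
J = torch.eye(3).repeat(nF, 1, 1).requires_grad_(True)

# Fixed edge-difference operator D: for each face, two rows selecting
# (v1 - v0) and (v2 - v0). Dense for clarity; real meshes need a sparse solve.
rows = []
for f in F:
    for k in (1, 2):
        r = torch.zeros(nV)
        r[f[k]], r[f[0]] = 1.0, -1.0
        rows.append(r)
D = torch.stack(rows)             # (2*nF, nV)
E0 = (D @ V0).view(nF, 2, 3)      # rest-pose edge vectors, per face

def poisson_solve(J):
    """Recover vertices whose edges best match the Jacobian-mapped rest edges."""
    target = torch.einsum('fij,fkj->fki', J, E0).reshape(2 * nF, 3)
    # Normal equations with a tiny ridge that pins the translational null space.
    A = D.T @ D + 1e-6 * torch.eye(nV)
    return torch.linalg.solve(A, D.T @ target)

# Hypothetical stand-in objective: stretch the mesh to width 2 along x.
# TextDeformer would instead render the mesh differentiably and compare the
# images against the text prompt with a pre-trained encoder.
opt = torch.optim.Adam([J], lr=1e-2)
for step in range(200):
    V = poisson_solve(J)
    loss = (V[:, 0].max() - V[:, 0].min() - 2.0) ** 2
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Because the loss reaches the Jacobians only through the global solve, a pixel
gradient in one region nudges the whole deformation smoothly instead of
spiking individual vertices, which is the behavior the abstract attributes to
this representation.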
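
The multi-view consistency term can be sketched in the same spirit: sample
each view's deep feature map at every vertex's projected image location and
penalize how much that feature varies across views. The helper below is a
hypothetical illustration; the name, tensor shapes, and the omission of
visibility masking are assumptions, not the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def view_consistency_loss(feats: torch.Tensor, uv: torch.Tensor) -> torch.Tensor:
    """feats: (n_views, C, H, W) encoder features of each rendered view;
    uv: (n_views, n_verts, 2) projected vertex positions in [-1, 1] coords.
    Returns the mean per-vertex feature variance across views. A full
    implementation would also mask vertices that are occluded in a view."""
    n_views, n_verts, _ = uv.shape
    grid = uv.view(n_views, n_verts, 1, 2)                      # sampling grid
    sampled = F.grid_sample(feats, grid, align_corners=False)   # (n_views, C, n_verts, 1)
    sampled = sampled.squeeze(-1).permute(0, 2, 1)              # (n_views, n_verts, C)
    return sampled.var(dim=0).mean()
```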
Related papers
- ShapeFusion: A 3D diffusion model for localized shape editing [37.82690898932135] (2024-03-28)
We propose an effective diffusion masking training strategy that, by design, facilitates localized manipulation of any shape region.
Our method leads to more interpretable shape manipulations than current state-of-the-art methods that rely on latent-code manipulation.
- T-Pixel2Mesh: Combining Global and Local Transformer for 3D Mesh Generation from a Single Image [84.08705684778666] (2024-03-20)
We propose a novel Transformer-boosted architecture, named T-Pixel2Mesh, inspired by the coarse-to-fine approach of P2M.
Specifically, we use a global Transformer to control the holistic shape and a local Transformer to refine the local geometry details.
Our experiments on ShapeNet demonstrate state-of-the-art performance, while results on real-world data show its generalization capability.
- HeadEvolver: Text to Head Avatars via Expressive and Attribute-Preserving Mesh Deformation [17.590555698266346] (2024-03-14)
We present HeadEvolver, a novel framework to generate stylized head avatars from text guidance.
HeadEvolver uses locally learnable mesh deformation from a template head mesh, producing high-quality digital assets for detail-preserving editing and animation.
- Robust 3D Tracking with Quality-Aware Shape Completion [67.9748164949519] (2023-12-17)
We propose a synthetic target representation for robust 3D tracking: dense, complete point clouds, obtained via shape completion, that precisely depict the target shape.
Specifically, we design a voxelized 3D tracking framework with shape completion, in which a quality-aware completion mechanism alleviates the adverse effect of noisy historical predictions.
- Deformation-Guided Unsupervised Non-Rigid Shape Matching [7.327850781641328] (2023-11-27)
We present an unsupervised, data-driven approach for non-rigid shape matching.
Our approach is particularly robust when matching shapes digitized with 3D scanners.
- DragD3D: Realistic Mesh Editing with Rigidity Control Driven by 2D Diffusion Priors [10.355568895429588] (2023-10-06)
Direct mesh editing and deformation are key components of the geometric modeling and animation pipeline, yet classical regularizers are not aware of the global context and semantics of the object.
We show that our deformations can be controlled to yield realistic shape changes that are aware of the global context.
- 3Deformer: A Common Framework for Image-Guided Mesh Deformation [27.732389685912214] (2023-07-19)
Given a source 3D mesh with semantic materials and a user-specified semantic image, 3Deformer can accurately edit the source mesh.
Our 3Deformer produces impressive results and reaches the state-of-the-art level.
- Self-Supervised Geometry-Aware Encoder for Style-Based 3D GAN Inversion [115.82306502822412] (2022-12-14)
StyleGAN has achieved great progress in 2D face reconstruction and semantic editing via image inversion and latent editing, but a corresponding generic 3D GAN inversion framework is still missing, limiting 3D face reconstruction and semantic editing applications.
We study the challenging problem of 3D GAN inversion, where a latent code is predicted from a single face image to faithfully recover its 3D shape and detailed textures.
- Pop-Out Motion: 3D-Aware Image Deformation via Learning the Shape Laplacian [58.704089101826774] (2022-03-29)
We present a 3D-aware image deformation method with minimal restrictions on shape category and deformation type.
We take a supervised learning-based approach to predict the shape Laplacian of the underlying volume of a 3D reconstruction represented as a point cloud.
In our experiments, we deform 2D character and clothed-human images.
- Learning Skeletal Articulations with Neural Blend Shapes [57.879030623284216] (2021-05-06)
We develop a neural technique for articulating 3D characters using enveloping with a pre-defined skeletal structure.
Our framework learns to rig and skin characters with the same articulation structure, and we propose neural blend shapes that improve deformation quality in joint regions.
This list is automatically generated from the titles and abstracts of the papers in this site.