CLIP-Actor: Text-Driven Recommendation and Stylization for Animating
Human Meshes
- URL: http://arxiv.org/abs/2206.04382v1
- Date: Thu, 9 Jun 2022 09:50:39 GMT
- Title: CLIP-Actor: Text-Driven Recommendation and Stylization for Animating
Human Meshes
- Authors: Kim Youwang, Kim Ji-Yeon, Tae-Hyun Oh
- Abstract summary: We propose CLIP-Actor, a text-driven motion recommendation and neural mesh stylization system for human mesh animation.
It animates a 3D human mesh to conform to a text prompt by recommending a motion sequence and learning mesh style attributes.
We demonstrate that CLIP-Actor produces a plausible, human-recognizable stylized 3D human mesh in motion, with detailed geometry and texture, from a natural language prompt.
- Score: 17.22112222736234
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose CLIP-Actor, a text-driven motion recommendation and neural mesh
stylization system for human mesh animation. CLIP-Actor animates a 3D human
mesh to conform to a text prompt by recommending a motion sequence and learning
mesh style attributes. Prior work fails to generate plausible results when the
artist-designed mesh content does not conform to the text from the beginning.
Instead, we build a text-driven human motion recommendation system by
leveraging a large-scale human motion dataset with language labels. Given a
natural language prompt, CLIP-Actor first suggests a human motion that conforms
to the prompt in a coarse-to-fine manner. Then, we propose a
synthesize-through-optimization method that detailizes and texturizes a
recommended mesh sequence in a disentangled way from the pose of each frame. It
allows the style attribute to conform to the prompt in a temporally-consistent
and pose-agnostic manner. The decoupled neural optimization also enables
spatio-temporal view augmentation from multi-frame human motion. We further
propose the mask-weighted embedding attention, which stabilizes the
optimization process by rejecting distracting renders containing scarce
foreground pixels. We demonstrate that CLIP-Actor produces a plausible,
human-recognizable stylized 3D human mesh in motion, with detailed geometry and
texture, from a natural language prompt.
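The abstract does not spell out the mask-weighted embedding attention, but its stated purpose (rejecting distracting renders that contain scarce foreground pixels) suggests weighting each rendered view's CLIP embedding by its foreground-mask coverage before aggregation. The sketch below is a hedged illustration of that reading, not the authors' implementation; the tensor names, the softmax weighting, and the temperature value are assumptions.

```python
import torch

def mask_weighted_embedding(view_embs: torch.Tensor,
                            fg_masks: torch.Tensor,
                            temperature: float = 0.1) -> torch.Tensor:
    """Aggregate per-view CLIP image embeddings, down-weighting renders with
    scarce foreground pixels (illustrative reading, not the paper's code).

    view_embs: (V, D) CLIP embeddings of V spatio-temporally augmented renders.
    fg_masks:  (V, H, W) binary foreground masks of the same renders.
    """
    coverage = fg_masks.float().mean(dim=(1, 2))             # foreground ratio per view
    weights = torch.softmax(coverage / temperature, dim=0)   # attention over views
    agg = (weights.unsqueeze(-1) * view_embs).sum(dim=0)     # weighted sum of embeddings
    return agg / agg.norm()                                  # unit norm for cosine similarity


def clip_style_loss(agg_emb: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
    """Negative cosine similarity between the aggregated render embedding and
    the CLIP text embedding of the prompt (to be minimized)."""
    text_emb = text_emb / text_emb.norm()
    return -(agg_emb * text_emb).sum()
```

Under this reading, a render in which the subject occupies only a few pixels receives a near-zero weight, which matches the abstract's claim that such distracting renders are rejected during optimization.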
Related papers
- Dynamic Typography: Bringing Text to Life via Video Diffusion Prior [73.72522617586593]
We present an automated text animation scheme, termed "Dynamic Typography", which deforms letters to convey semantic meaning and infuses them with vibrant movements based on user prompts.
Our technique harnesses vector graphics representations and an end-to-end optimization-based framework.
arXiv Detail & Related papers (2024-04-17T17:59:55Z)
- 3DStyleGLIP: Part-Tailored Text-Guided 3D Neural Stylization [1.3654846342364306]
3D stylization, the application of specific styles to three-dimensional objects, offers substantial commercial potential.
Recent advancements in artificial intelligence and text-driven manipulation methods have made the stylization process increasingly intuitive and automated.
We introduce 3DStyleGLIP, a novel framework specifically designed for text-driven, part-tailored 3D stylization.
arXiv Detail & Related papers (2024-04-03T10:44:06Z)
- Disentangled Clothed Avatar Generation from Text Descriptions [41.01453534915251]
We introduce a novel text-to-avatar generation method that separately generates the human body and the clothes.
Our approach achieves higher texture and geometry quality and better semantic alignment with text prompts.
arXiv Detail & Related papers (2023-12-08T18:43:12Z)
- ExpCLIP: Bridging Text and Facial Expressions via Semantic Alignment [5.516575655881858]
We introduce a technique that enables the control of arbitrary styles by leveraging natural language as emotion prompts.
Our method accomplishes expressive facial animation generation and offers enhanced flexibility in effectively conveying the desired style.
arXiv Detail & Related papers (2023-08-28T09:35:13Z)
- TADA! Text to Animatable Digital Avatars [57.52707683788961]
TADA takes textual descriptions and produces expressive 3D avatars with high-quality geometry and lifelike textures.
We derive an optimizable high-resolution body model from SMPL-X with 3D displacements and a texture map.
We render normals and RGB images of the generated character and exploit their latent embeddings in the Score Distillation Sampling (SDS) training process.
arXiv Detail & Related papers (2023-08-21T17:59:10Z)
- Unsupervised Learning of Style-Aware Facial Animation from Real Acting Performances [3.95944314850151]
We present a novel approach for text/speech-driven animation of a photo-realistic head model based on blend-shape geometry, dynamic textures, and neural rendering.
Our animation method is based on a conditional CNN that transforms text or speech into a sequence of animation parameters.
For realistic real-time rendering, we train a U-Net that refines pixelization-based renderings by computing improved colors and a foreground matte.
arXiv Detail & Related papers (2023-06-16T17:58:04Z)
- ATT3D: Amortized Text-to-3D Object Synthesis [78.96673650638365]
We amortize optimization over text prompts by training on many prompts simultaneously with a unified model, instead of separately.
Our framework, Amortized Text-to-3D (ATT3D), enables knowledge sharing between prompts to generalize to unseen setups and to interpolate smoothly between prompts for novel assets and simple animations.
arXiv Detail & Related papers (2023-06-06T17:59:10Z)
- Being Comes from Not-being: Open-vocabulary Text-to-Motion Generation with Wordless Training [178.09150600453205]
In this paper, we investigate offline open-vocabulary text-to-motion generation in a zero-shot learning manner.
Inspired by prompt learning in NLP, we pretrain a motion generator that learns to reconstruct the full motion from a masked motion.
Our method reformulates the input text into a masked motion that serves as the prompt for the motion generator to "reconstruct" the motion.
arXiv Detail & Related papers (2022-10-28T06:20:55Z)
- Language-Guided Face Animation by Recurrent StyleGAN-based Generator [87.56260982475564]
We study a novel task, language-guided face animation, that aims to animate a static face image with the help of languages.
We propose a recurrent motion generator to extract a series of semantic and motion information from the language and feed it along with visual information to a pre-trained StyleGAN to generate high-quality frames.
arXiv Detail & Related papers (2022-08-11T02:57:30Z)
- TEMOS: Generating diverse human motions from textual descriptions [53.85978336198444]
We address the problem of generating diverse 3D human motions from textual descriptions.
We propose TEMOS, a text-conditioned generative model leveraging variational autoencoder (VAE) training with human motion data.
We show that the TEMOS framework can produce both skeleton-based animations, as in prior work, and more expressive SMPL body motions.
arXiv Detail & Related papers (2022-04-25T14:53:06Z)
- Text2Mesh: Text-Driven Neural Stylization for Meshes [18.435567297462416]
Our framework, Text2Mesh, stylizes a 3D mesh by predicting color and local geometric details which conform to a target text prompt.
We consider a disentangled representation of a 3D object using a fixed mesh input (content) coupled with a learned neural network, which we term neural style field network.
In order to modify style, we obtain a similarity score between a text prompt (describing style) and a stylized mesh by harnessing the representational power of CLIP; a minimal sketch of this scoring appears after this list.
arXiv Detail & Related papers (2021-12-06T18:23:29Z)
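Both Text2Mesh and CLIP-Actor's stylization stage rely on scoring rendered views of the stylized mesh against the text prompt with CLIP. The following is a minimal sketch of that scoring under stated assumptions: it uses OpenAI's `clip` package (`pip install git+https://github.com/openai/CLIP.git`), and the rendered views are assumed to come from an external differentiable renderer that already applies CLIP's 224x224 resizing and normalization.

```python
import torch
import clip  # https://github.com/openai/CLIP

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device, jit=False)
model = model.float()  # fp32 avoids dtype mismatches with rendered tensors


def style_score(rendered_views: torch.Tensor, prompt: str) -> torch.Tensor:
    """Mean cosine similarity between rendered views and a style prompt.

    rendered_views: (V, 3, 224, 224) CLIP-normalized renders. If they come
    from a differentiable renderer, this score can be maximized by gradient
    ascent on the style parameters (e.g. a neural style field).
    """
    text_emb = model.encode_text(clip.tokenize([prompt]).to(device))
    img_emb = model.encode_image(rendered_views.to(device))
    text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
    img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
    return (img_emb @ text_emb.T).mean()
```

The differentiable renderer and the optimized style parameters are not shown; the point is only that a frozen CLIP model turns "does this stylized mesh look like the prompt?" into a differentiable objective.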
This list is automatically generated from the titles and abstracts of the papers in this site.