Synthesis of Compositional Animations from Textual Descriptions
- URL: http://arxiv.org/abs/2103.14675v1
- Date: Fri, 26 Mar 2021 18:23:29 GMT
- Title: Synthesis of Compositional Animations from Textual Descriptions
- Authors: Anindita Ghosh, Noshaba Cheema, Cennet Oguz, Christian Theobalt,
Philipp Slusallek
- Abstract summary: "How unstructured and complex can we make a sentence and still generate plausible movements from it?"
"How can we animate 3D-characters from a movie script or move robots by simply telling them what we would like them to do?"
- Score: 54.85920052559239
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: "How can we animate 3D-characters from a movie script or move robots by
simply telling them what we would like them to do?" "How unstructured and
complex can we make a sentence and still generate plausible movements from it?"
These are questions that need to be answered in the long-run, as the field is
still in its infancy. Inspired by these problems, we present a new technique
for generating compositional actions, which handles complex input sentences.
Our output is a 3D pose sequence depicting the actions in the input sentence.
We propose a hierarchical two-stream sequential model to explore a finer
joint-level mapping between natural language sentences and 3D pose sequences
corresponding to the given motion. We learn two manifold representations of the
motion -- one each for the upper body and the lower body movements. Our model
can generate plausible pose sequences for short sentences describing single
actions as well as long compositional sentences describing multiple sequential
and superimposed actions. We evaluate our proposed model on the publicly
available KIT Motion-Language Dataset containing 3D pose data with
human-annotated sentences. Experimental results show that our model advances
the state-of-the-art on text-based motion synthesis in objective evaluations by
a margin of 50%. Qualitative evaluations based on a user study indicate that
our synthesized motions are perceived to be the closest to the ground-truth
motion captures for both short and compositional sentences.
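To make the two-stream idea concrete, the sketch below shows one plausible reading of the architecture described in the abstract: a sentence encoder conditions two separate autoregressive decoders, one producing upper-body joint positions and one producing lower-body joint positions, whose outputs are concatenated into full-body poses. This is a minimal illustrative sketch, not the authors' implementation; the vocabulary size, joint counts, hidden sizes, and the use of plain GRUs are assumptions, and the paper's hierarchical joint-level mapping and learned manifold representations are omitted.

# Minimal sketch (not the authors' code) of a two-stream text-to-pose model.
# All dimensions below are illustrative assumptions.
import torch
import torch.nn as nn

class TwoStreamText2Pose(nn.Module):
    def __init__(self, vocab_size=5000, embed_dim=128, hidden_dim=256,
                 upper_dim=63, lower_dim=30):  # e.g. 21 and 10 joints x 3 coords
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.text_enc = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        # one decoder stream per body half, conditioned on the sentence encoding
        self.upper_dec = nn.GRU(upper_dim, hidden_dim, batch_first=True)
        self.lower_dec = nn.GRU(lower_dim, hidden_dim, batch_first=True)
        self.upper_out = nn.Linear(hidden_dim, upper_dim)
        self.lower_out = nn.Linear(hidden_dim, lower_dim)

    def forward(self, tokens, num_frames):
        _, h = self.text_enc(self.embed(tokens))   # sentence encoding as initial hidden state
        batch = tokens.size(0)
        upper = torch.zeros(batch, 1, self.upper_out.out_features)
        lower = torch.zeros(batch, 1, self.lower_out.out_features)
        h_up, h_lo = h.clone(), h.clone()
        poses = []
        for _ in range(num_frames):                # autoregressive roll-out over frames
            o_up, h_up = self.upper_dec(upper, h_up)
            o_lo, h_lo = self.lower_dec(lower, h_lo)
            upper = self.upper_out(o_up)
            lower = self.lower_out(o_lo)
            poses.append(torch.cat([upper, lower], dim=-1))
        return torch.cat(poses, dim=1)             # (batch, frames, full-body pose dim)

# usage: poses = TwoStreamText2Pose()(torch.randint(0, 5000, (2, 12)), num_frames=30)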
Related papers
- Motion Generation from Fine-grained Textual Descriptions [29.033358642532722]
We build a large-scale language-motion dataset specializing in fine-grained textual descriptions, FineHumanML3D.
We design a new text2motion model, FineMotionDiffuse, making full use of fine-grained textual information.
Our evaluation shows that FineMotionDiffuse trained on FineHumanML3D improves FID by a large margin of 0.38, compared with competitive baselines.
arXiv Detail & Related papers (2024-03-20T11:38:30Z)
- Story-to-Motion: Synthesizing Infinite and Controllable Character Animation from Long Text [14.473103773197838]
A new task, Story-to-Motion, arises when characters are required to perform specific motions based on a long text description.
Previous works in character control and text-to-motion have addressed related aspects, yet a comprehensive solution remains elusive.
We propose a novel system that generates controllable, infinitely long motions and trajectories aligned with the input text.
arXiv Detail & Related papers (2023-11-13T16:22:38Z)
- SINC: Spatial Composition of 3D Human Motions for Simultaneous Action Generation [58.25766404147109]
Our goal is to synthesize 3D human motions given textual inputs describing simultaneous actions.
We refer to generating such simultaneous movements as performing 'spatial compositions'.
arXiv Detail & Related papers (2023-04-20T16:01:55Z)
- IMoS: Intent-Driven Full-Body Motion Synthesis for Human-Object Interactions [69.95820880360345]
We present the first framework to synthesize the full-body motion of virtual human characters with 3D objects placed within their reach.
Our system takes as input textual instructions specifying the objects and the associated intentions of the virtual characters.
We show that our synthesized full-body motions appear more realistic to the participants in more than 80% of scenarios.
arXiv Detail & Related papers (2022-12-14T23:59:24Z)
- Generating Holistic 3D Human Motion from Speech [97.11392166257791]
We build a high-quality dataset of 3D holistic body meshes with synchronous speech.
We then define a novel speech-to-motion generation framework in which the face, body, and hands are modeled separately.
arXiv Detail & Related papers (2022-12-08T17:25:19Z)
- TEACH: Temporal Action Composition for 3D Humans [50.97135662063117]
Given a series of natural language descriptions, our task is to generate 3D human motions that correspond semantically to the text.
In particular, our goal is to enable the synthesis of a series of actions, which we refer to as temporal action composition.
arXiv Detail & Related papers (2022-09-09T00:33:40Z)
- Learning Speech-driven 3D Conversational Gestures from Video [106.15628979352738]
We propose the first approach to automatically and jointly synthesize both the synchronous 3D conversational body and hand gestures.
Our algorithm uses a CNN architecture that leverages the inherent correlation between facial expression and hand gestures.
We also contribute a new way to create a large corpus of more than 33 hours of annotated body, hand, and face data from in-the-wild videos of talking people.
arXiv Detail & Related papers (2021-02-13T01:05:39Z)