TEACH: Temporal Action Composition for 3D Humans
- URL: http://arxiv.org/abs/2209.04066v2
- Date: Mon, 12 Sep 2022 16:34:20 GMT
- Title: TEACH: Temporal Action Composition for 3D Humans
- Authors: Nikos Athanasiou, Mathis Petrovich, Michael J. Black, Gül Varol
- Abstract summary: Given a series of natural language descriptions, our task is to generate 3D human motions that correspond semantically to the text.
In particular, our goal is to enable the synthesis of a series of actions, which we refer to as temporal action composition.
- Score: 50.97135662063117
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Given a series of natural language descriptions, our task is to generate 3D
human motions that correspond semantically to the text, and follow the temporal
order of the instructions. In particular, our goal is to enable the synthesis
of a series of actions, which we refer to as temporal action composition. The
current state of the art in text-conditioned motion synthesis only takes a
single action or a single sentence as input. This is partially due to lack of
suitable training data containing action sequences, but also due to the
computational complexity of their non-autoregressive model formulation, which
does not scale well to long sequences. In this work, we address both issues.
First, we exploit the recent BABEL motion-text collection, which has a wide
range of labeled actions, many of which occur in a sequence with transitions
between them. Next, we design a Transformer-based approach that operates
non-autoregressively within an action, but autoregressively within the sequence
of actions. This hierarchical formulation proves effective in our experiments
when compared with multiple baselines. Our approach, called TEACH for "TEmporal
Action Compositions for Human motions", produces realistic human motions for a
wide variety of actions and temporal compositions from language descriptions.
To encourage work on this new task, we make our code available for research
purposes at our website: teach.is.tue.mpg.de
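The abstract's hierarchical formulation, non-autoregressive decoding within each action but autoregressive chaining across the sequence of actions, can be sketched as a simple generation loop. This is a minimal illustration, not the paper's actual API: the function names, the toy decoder, and the context-frame count are all hypothetical.

```python
# Hypothetical sketch of TEACH-style hierarchical generation: each action's
# motion is decoded in one non-autoregressive pass, conditioned on the last
# few frames of the previous action (autoregressive across actions).
# All names here are illustrative, not the paper's real interface.

def generate_sequence(action_texts, decode_motion, context_frames=5):
    """Generate one motion per text prompt, chaining actions in order.

    decode_motion(text, past_frames) -> list of frames for that action,
    produced in a single pass given the previous action's final frames
    (an empty list for the first action).
    """
    full_motion = []
    past = []  # trailing frames of the previous action; empty at the start
    for text in action_texts:
        frames = decode_motion(text, past)   # one-shot decode within action
        full_motion.extend(frames)
        past = frames[-context_frames:]      # carry context to the next action
    return full_motion

# Toy stand-in decoder so the chaining logic is runnable: each "frame" is a
# (text, index) tuple, one per word in the prompt.
def toy_decoder(text, past_frames):
    return [(text, i) for i in range(len(text.split()))]

motion = generate_sequence(["walk forward", "sit down slowly"], toy_decoder)
# "walk forward" yields 2 frames, "sit down slowly" yields 3, so 5 in total.
```

The key design point the abstract highlights is that this loop's cost grows linearly with the number of actions, while each action's decode stays a fixed-size non-autoregressive pass, which is why it scales to long sequences better than decoding the whole sequence at once.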
Related papers
- Multi-Track Timeline Control for Text-Driven 3D Human Motion Generation [71.08922726494842]
We introduce the problem of timeline control for text-driven motion synthesis.
Instead of a single prompt, users can specify a multi-track timeline of multiple prompts organized in temporal intervals that may overlap.
We propose a new test-time denoising method to generate composite animations from a multi-track timeline.
arXiv Detail & Related papers (2024-01-16T18:39:15Z)
- Act As You Wish: Fine-Grained Control of Motion Diffusion Model with Hierarchical Semantic Graphs [31.244039305932287]
We propose hierarchical semantic graphs for fine-grained control over motion generation.
We disentangle motion descriptions into hierarchical semantic graphs including three levels of motions, actions, and specifics.
Our method can continuously refine the generated motion, which may have a far-reaching impact on the community.
arXiv Detail & Related papers (2023-11-02T06:20:23Z)
- Synthesizing Long-Term Human Motions with Diffusion Models via Coherent Sampling [74.62570964142063]
Text-to-motion generation has gained increasing attention, but most existing methods are limited to generating short-term motions.
We propose a novel approach that utilizes a past-conditioned diffusion model with two optional coherent sampling methods.
Our proposed method is capable of generating compositional and coherent long-term 3D human motions controlled by a user-instructed long text stream.
arXiv Detail & Related papers (2023-08-03T16:18:32Z)
- SINC: Spatial Composition of 3D Human Motions for Simultaneous Action Generation [58.25766404147109]
Our goal is to synthesize 3D human motions given textual inputs describing simultaneous actions.
We refer to generating such simultaneous movements as performing 'spatial compositions'.
arXiv Detail & Related papers (2023-04-20T16:01:55Z)
- Weakly-supervised Action Transition Learning for Stochastic Human Motion Prediction [81.94175022575966]
We introduce the task of action-driven human motion prediction.
It aims to predict multiple plausible future motions given a sequence of action labels and a short motion history.
arXiv Detail & Related papers (2022-05-31T08:38:07Z)
- Synthesis of Compositional Animations from Textual Descriptions [54.85920052559239]
"How unstructured and complex can we make a sentence and still generate plausible movements from it?"
"How can we animate 3D-characters from a movie script or move robots by simply telling them what we would like them to do?"
arXiv Detail & Related papers (2021-03-26T18:23:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.