FLAME: Free-form Language-based Motion Synthesis & Editing
- URL: http://arxiv.org/abs/2209.00349v1
- Date: Thu, 1 Sep 2022 10:34:57 GMT
- Title: FLAME: Free-form Language-based Motion Synthesis & Editing
- Authors: Jihoon Kim, Jiseob Kim, Sungjoon Choi
- Abstract summary: We propose a diffusion-based motion synthesis and editing model named FLAME.
FLAME can generate high-fidelity motions well aligned with the given text.
It can edit the parts of the motion, both frame-wise and joint-wise, without any fine-tuning.
- Score: 17.70085940884357
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Text-based motion generation models are drawing a surge of interest for their
potential for automating the motion-making process in the game, animation, or
robot industries. In this paper, we propose a diffusion-based motion synthesis
and editing model named FLAME. Inspired by the recent successes in diffusion
models, we integrate diffusion-based generative models into the motion domain.
FLAME can generate high-fidelity motions well aligned with the given text.
Also, it can edit the parts of the motion, both frame-wise and joint-wise,
without any fine-tuning. FLAME involves a new transformer-based architecture we
devise to better handle motion data, which is found to be crucial to manage
variable-length motions and well attend to free-form text. In experiments, we
show that FLAME achieves state-of-the-art generation performances on three
text-motion datasets: HumanML3D, BABEL, and KIT. We also demonstrate that
editing capability of FLAME can be extended to other tasks such as motion
prediction or motion in-betweening, which have been previously covered by
dedicated models.
Related papers
- MoRAG -- Multi-Fusion Retrieval Augmented Generation for Human Motion [8.94802080815133]
MoRAG is a novel multi-part fusion based retrieval-augmented generation strategy for text-based human motion generation.
We create diverse samples through the spatial composition of the retrieved motions.
Our framework can serve as a plug-and-play module, improving the performance of motion diffusion models.
arXiv Detail & Related papers (2024-09-18T17:03:30Z) - MotionFix: Text-Driven 3D Human Motion Editing [52.11745508960547]
Key challenges include the scarcity of training data and the need to design a model that accurately edits the source motion.
We propose a methodology to semi-automatically collect a dataset of triplets comprising (i) a source motion, (ii) a target motion, and (iii) an edit text.
Access to this data allows us to train a conditional diffusion model, TMED, that takes both the source motion and the edit text as input.
arXiv Detail & Related papers (2024-08-01T16:58:50Z) - AMD:Anatomical Motion Diffusion with Interpretable Motion Decomposition
and Fusion [11.689663297469945]
We propose the Adaptable Motion Diffusion model.
It exploits a Large Language Model (LLM) to parse the input text into a sequence of concise and interpretable anatomical scripts.
We then devise a two-branch fusion scheme that balances the influence of the input text and the anatomical scripts on the inverse diffusion process.
arXiv Detail & Related papers (2023-12-20T04:49:45Z) - Motion Flow Matching for Human Motion Synthesis and Editing [75.13665467944314]
We propose emphMotion Flow Matching, a novel generative model for human motion generation featuring efficient sampling and effectiveness in motion editing applications.
Our method reduces the sampling complexity from thousand steps in previous diffusion models to just ten steps, while achieving comparable performance in text-to-motion and action-to-motion generation benchmarks.
arXiv Detail & Related papers (2023-12-14T12:57:35Z) - MotionCrafter: One-Shot Motion Customization of Diffusion Models [66.44642854791807]
We introduce MotionCrafter, a one-shot instance-guided motion customization method.
MotionCrafter employs a parallel spatial-temporal architecture that injects the reference motion into the temporal component of the base model.
During training, a frozen base model provides appearance normalization, effectively separating appearance from motion.
arXiv Detail & Related papers (2023-12-08T16:31:04Z) - TapMo: Shape-aware Motion Generation of Skeleton-free Characters [64.83230289993145]
We present TapMo, a Text-driven Animation Pipeline for Motion in a broad spectrum of skeleton-free 3D characters.
TapMo comprises two main components - Mesh Handle Predictor and Shape-aware Diffusion Module.
arXiv Detail & Related papers (2023-10-19T12:14:32Z) - MoFusion: A Framework for Denoising-Diffusion-based Motion Synthesis [73.52948992990191]
MoFusion is a new denoising-diffusion-based framework for high-quality conditional human motion synthesis.
We present ways to introduce well-known kinematic losses for motion plausibility within the motion diffusion framework.
We demonstrate the effectiveness of MoFusion compared to the state of the art on established benchmarks in the literature.
arXiv Detail & Related papers (2022-12-08T18:59:48Z) - MotionDiffuse: Text-Driven Human Motion Generation with Diffusion Model [35.32967411186489]
MotionDiffuse is a diffusion model-based text-driven motion generation framework.
It excels at modeling complicated data distribution and generating vivid motion sequences.
It responds to fine-grained instructions on body parts, and arbitrary-length motion synthesis with time-varied text prompts.
arXiv Detail & Related papers (2022-08-31T17:58:54Z) - TEMOS: Generating diverse human motions from textual descriptions [53.85978336198444]
We address the problem of generating diverse 3D human motions from textual descriptions.
We propose TEMOS, a text-conditioned generative model leveraging variational autoencoder (VAE) training with human motion data.
We show that TEMOS framework can produce both skeleton-based animations as in prior work, as well more expressive SMPL body motions.
arXiv Detail & Related papers (2022-04-25T14:53:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.