OmniMotionGPT: Animal Motion Generation with Limited Data
- URL: http://arxiv.org/abs/2311.18303v1
- Date: Thu, 30 Nov 2023 07:14:00 GMT
- Title: OmniMotionGPT: Animal Motion Generation with Limited Data
- Authors: Zhangsihao Yang, Mingyuan Zhou, Mengyi Shan, Bingbing Wen, Ziwei Xuan,
Mitch Hill, Junjie Bai, Guo-Jun Qi, Yalin Wang
- Abstract summary: We introduce AnimalML3D, the first text-animal motion dataset with 1240 animation sequences spanning 36 different animal identities.
We are able to generate animal motions with high diversity and fidelity, quantitatively and qualitatively outperforming the results of training human motion generation baselines on animal data.
- Score: 70.35662376853163
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Our paper aims to generate diverse and realistic animal motion sequences from
textual descriptions, without a large-scale animal text-motion dataset. While
the task of text-driven human motion synthesis is already extensively studied
and benchmarked, it remains challenging to transfer this success to other
skeleton structures with limited data. In this work, we design a model
architecture that imitates the Generative Pretraining Transformer (GPT),
transferring prior knowledge learned from human data to the animal domain. We
jointly train motion autoencoders for both animal and human motions while
simultaneously optimizing similarity scores among the human motion encoding,
the animal motion encoding, and the CLIP text embedding. Presenting the first
solution to this
problem, we are able to generate animal motions with high diversity and
fidelity, quantitatively and qualitatively outperforming the results of
training human motion generation baselines on animal data. Additionally, we
introduce AnimalML3D, the first text-animal motion dataset with 1240 animation
sequences spanning 36 different animal identities. We hope this dataset will
help mitigate the data scarcity problem in text-driven animal motion
generation, providing a new playground for the research community.
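The joint training objective described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's actual architecture: the encoder/decoder sizes, the latent dimension, the placeholder text embedding, and the equal weighting of the loss terms are all assumptions for the sake of a runnable example (PyTorch).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MotionAutoencoder(nn.Module):
    """Toy motion autoencoder: maps a flattened motion frame to a latent
    code and reconstructs it. Stand-in for the paper's human/animal
    motion autoencoders (architecture details are assumed)."""
    def __init__(self, in_dim, latent_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, 128), nn.ReLU(), nn.Linear(128, latent_dim))
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, in_dim))

    def forward(self, x):
        z = self.encoder(x)
        return z, self.decoder(z)

def joint_loss(z_human, z_animal, z_text, rec_h, x_h, rec_a, x_a):
    """Reconstruction loss for both domains, plus a cosine-similarity
    term aligning human latent, animal latent, and text embedding
    (maximized, hence subtracted)."""
    recon = F.mse_loss(rec_h, x_h) + F.mse_loss(rec_a, x_a)
    sim = (F.cosine_similarity(z_human, z_text, dim=-1)
           + F.cosine_similarity(z_animal, z_text, dim=-1)
           + F.cosine_similarity(z_human, z_animal, dim=-1))
    return recon - sim.mean()

torch.manual_seed(0)
human_ae = MotionAutoencoder(in_dim=66)    # e.g. 22 joints x 3 coords (assumed)
animal_ae = MotionAutoencoder(in_dim=105)  # e.g. 35 joints x 3 coords (assumed)
x_h, x_a = torch.randn(8, 66), torch.randn(8, 105)
# Placeholder for a CLIP text embedding projected to the shared latent size.
z_text = torch.randn(8, 64)
z_h, rec_h = human_ae(x_h)
z_a, rec_a = animal_ae(x_a)
loss = joint_loss(z_h, z_a, z_text, rec_h, x_h, rec_a, x_a)
```

In practice the text embedding would come from a frozen CLIP text encoder projected into the shared latent space, and the similarity terms would likely carry tuned weights; both details are elided here.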
Related papers
- Exploring Vision Transformers for 3D Human Motion-Language Models with Motion Patches [12.221087476416056]
We introduce "motion patches", a new representation of motion sequences, and propose using Vision Transformers (ViT) as motion encoders via transfer learning.
These motion patches, created by dividing and sorting skeleton joints based on motion sequences, are robust to varying skeleton structures.
We find that transfer learning with pre-trained weights of ViT obtained through training with 2D image data can boost the performance of motion analysis.
arXiv Detail & Related papers (2024-05-08T02:42:27Z) - Contact-aware Human Motion Generation from Textual Descriptions [57.871692507044344]
We create a novel dataset named RICH-CAT, representing "Contact-Aware Texts".
We propose a novel approach named CATMO for text-driven interactive human motion synthesis.
Our experiments demonstrate the superior performance of our approach compared to existing text-to-motion methods.
arXiv Detail & Related papers (2024-03-23T04:08:39Z) - Make-An-Animation: Large-Scale Text-conditional 3D Human Motion Generation [47.272177594990104]
We introduce Make-An-Animation, a text-conditioned human motion generation model.
It learns more diverse poses and prompts from large-scale image-text datasets.
It reaches state-of-the-art performance on text-to-motion generation.
arXiv Detail & Related papers (2023-05-16T17:58:43Z) - TEMOS: Generating diverse human motions from textual descriptions [53.85978336198444]
We address the problem of generating diverse 3D human motions from textual descriptions.
We propose TEMOS, a text-conditioned generative model leveraging variational autoencoder (VAE) training with human motion data.
We show that the TEMOS framework can produce both skeleton-based animations as in prior work, as well as more expressive SMPL body motions.
arXiv Detail & Related papers (2022-04-25T14:53:06Z) - QS-Craft: Learning to Quantize, Scrabble and Craft for Conditional Human Motion Animation [66.97112599818507]
This paper studies the task of conditional Human Motion Animation (cHMA).
Given a source image and a driving video, the model should animate a new frame sequence for the source image.
The key novelties come from the newly introduced three key steps: quantize, scrabble and craft.
arXiv Detail & Related papers (2022-03-22T11:34:40Z) - Render In-between: Motion Guided Video Synthesis for Action Interpolation [53.43607872972194]
We propose a motion-guided frame-upsampling framework that is capable of producing realistic human motion and appearance.
A novel motion model is trained to infer the non-linear skeletal motion between frames by leveraging a large-scale motion-capture dataset.
Our pipeline only requires low-frame-rate videos and unpaired human motion data but does not require high-frame-rate videos for training.
arXiv Detail & Related papers (2021-11-01T15:32:51Z) - SyDog: A Synthetic Dog Dataset for Improved 2D Pose Estimation [3.411873646414169]
SyDog is a synthetic dataset of dogs containing ground truth pose and bounding box coordinates.
We demonstrate that pose estimation models trained on SyDog achieve better performance than models trained purely on real data.
arXiv Detail & Related papers (2021-07-31T14:34:40Z)