Quo Vadis, Motion Generation? From Large Language Models to Large Motion Models
- URL: http://arxiv.org/abs/2410.03311v1
- Date: Fri, 4 Oct 2024 10:48:54 GMT
- Title: Quo Vadis, Motion Generation? From Large Language Models to Large Motion Models
- Authors: Ye Wang, Sipeng Zheng, Bin Cao, Qianshan Wei, Qin Jin, Zongqing Lu
- Abstract summary: We present MotionBase, the first million-level motion generation benchmark.
By leveraging this vast dataset, our large motion model demonstrates strong performance across a broad range of motions.
We introduce a novel 2D lookup-free approach for motion tokenization, which preserves motion information and expands codebook capacity.
- Score: 70.78051873517285
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Inspired by the recent success of LLMs, the field of human motion understanding has increasingly shifted towards the development of large motion models. Despite some progress, current state-of-the-art works remain far from achieving truly generalist models, largely due to the lack of large-scale, high-quality motion data. To address this, we present MotionBase, the first million-level motion generation benchmark, offering 15 times the data volume of the previous largest dataset, and featuring multimodal data with hierarchically detailed text descriptions. By leveraging this vast dataset, our large motion model demonstrates strong performance across a broad range of motions, including unseen ones. Through systematic investigation, we underscore the importance of scaling both data and model size, with synthetic data and pseudo labels playing a crucial role in mitigating data acquisition costs. Moreover, our research reveals the limitations of existing evaluation metrics, particularly in handling out-of-domain text instructions -- an issue that has long been overlooked. In addition to these, we introduce a novel 2D lookup-free approach for motion tokenization, which preserves motion information and expands codebook capacity, further enhancing the representative ability of large motion models. The release of MotionBase and the insights gained from this study are expected to pave the way for the development of more powerful and versatile motion generation models.
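The abstract does not spell out how the 2D lookup-free tokenizer works, so the following is only an illustrative sketch of what lookup-free quantization generally looks like (in the spirit of finite scalar quantization), not the paper's actual method: each latent dimension is bounded and rounded onto a small fixed grid, so the effective codebook is the Cartesian product of the per-dimension levels and no nearest-neighbour codebook lookup is needed. The dimensionality and level counts below are hypothetical.

```python
import numpy as np

# Hypothetical per-dimension level counts; the implicit codebook size is their
# product (7*7*7*5*5 = 8575), with no explicit codebook or nearest-neighbour lookup.
LEVELS = np.array([7, 7, 7, 5, 5])


def quantize(z: np.ndarray) -> np.ndarray:
    """Bound each latent dimension to (-1, 1), then round it onto its level grid."""
    z = np.tanh(z)                          # keep values in (-1, 1)
    half = (LEVELS - 1) / 2.0
    return np.round(z * half) / half        # grid points spaced evenly in [-1, 1]


def token_index(zq: np.ndarray) -> int:
    """Map a quantized vector to a single integer token id (mixed-radix encoding)."""
    half = (LEVELS - 1) / 2.0
    digits = np.round(zq * half + half).astype(int)   # per-dim digit in [0, L-1]
    index, base = 0, 1
    for d, L in zip(digits, LEVELS):
        index += int(d) * base
        base *= int(L)
    return index


if __name__ == "__main__":
    z = np.random.randn(5)                  # one latent frame from a motion encoder
    zq = quantize(z)
    print("codebook size:", int(np.prod(LEVELS)))
    print("token id:", token_index(zq))
```

In a full tokenizer the motion encoder would be trained end-to-end with a straight-through gradient through the rounding step; the sketch only shows the quantization and the index mapping.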
Related papers
- IMUDiffusion: A Diffusion Model for Multivariate Time Series Synthetisation for Inertial Motion Capturing Systems [0.0]
We propose IMUDiffusion, a probabilistic diffusion model specifically designed for time series generation.
Our approach enables the generation of high-quality time series sequences which accurately capture the dynamics of human activities.
In some cases, we are able to improve the macro F1-score by almost 30%.
arXiv Detail & Related papers (2024-11-05T09:53:52Z) - Large Motion Model for Unified Multi-Modal Motion Generation [50.56268006354396]
Large Motion Model (LMM) is a motion-centric, multi-modal framework that unifies mainstream motion generation tasks into a generalist model.
LMM tackles the challenge of training a single model on heterogeneous motion data and tasks from three principled aspects.
arXiv Detail & Related papers (2024-04-01T17:55:11Z) - Data-efficient Large Vision Models through Sequential Autoregression [58.26179273091461]
We develop an efficient, autoregression-based vision model on a limited dataset.
We demonstrate how this model achieves proficiency in a spectrum of visual tasks spanning both high-level and low-level semantic understanding.
Our empirical evaluations underscore the model's agility in adapting to various tasks, heralding a significant reduction in the parameter footprint.
arXiv Detail & Related papers (2024-02-07T13:41:53Z) - Priority-Centric Human Motion Generation in Discrete Latent Space [59.401128190423535]
We introduce a Priority-Centric Motion Discrete Diffusion Model (M2DM) for text-to-motion generation.
M2DM incorporates a global self-attention mechanism and a regularization term to counteract code collapse.
We also present a motion discrete diffusion model that employs an innovative noise schedule, determined by the significance of each motion token.
arXiv Detail & Related papers (2023-08-28T10:40:16Z) - MotionTrack: Learning Motion Predictor for Multiple Object Tracking [68.68339102749358]
We introduce a novel motion-based tracker, MotionTrack, centered around a learnable motion predictor.
Our experimental results demonstrate that MotionTrack yields state-of-the-art performance on datasets such as Dancetrack and SportsMOT.
arXiv Detail & Related papers (2023-06-05T04:24:11Z) - Make-An-Animation: Large-Scale Text-conditional 3D Human Motion Generation [47.272177594990104]
We introduce Make-An-Animation, a text-conditioned human motion generation model.
It learns more diverse poses and prompts from large-scale image-text datasets.
It reaches state-of-the-art performance on text-to-motion generation.
arXiv Detail & Related papers (2023-05-16T17:58:43Z) - HuMoR: 3D Human Motion Model for Robust Pose Estimation [100.55369985297797]
HuMoR is a 3D Human Motion Model for Robust Estimation of temporal pose and shape.
We introduce a conditional variational autoencoder that learns a distribution over the change in pose at each step of a motion sequence; a minimal illustrative sketch of this idea follows below.
We demonstrate that our model generalizes to diverse motions and body shapes after training on a large motion capture dataset.
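As a rough illustration of the conditional-VAE idea mentioned above (not HuMoR's actual architecture, which also learns a conditional prior and models ground contacts), here is a minimal PyTorch sketch of a CVAE over per-step pose changes; the 69-dimensional pose vector and all layer sizes are assumptions.

```python
import torch
import torch.nn as nn

# Illustrative sketch only: a conditional VAE over the *change* in pose between
# consecutive frames, conditioned on the previous pose. For brevity it samples
# from a standard normal prior at generation time.
POSE_DIM, LATENT_DIM = 69, 32   # hypothetical pose vector and latent sizes


class PoseDeltaCVAE(nn.Module):
    def __init__(self):
        super().__init__()
        # encoder q(z | x_prev, delta): infers a latent from an observed transition
        self.encoder = nn.Sequential(
            nn.Linear(POSE_DIM * 2, 256), nn.ReLU(), nn.Linear(256, LATENT_DIM * 2)
        )
        # decoder p(delta | z, x_prev): predicts the pose change for the next frame
        self.decoder = nn.Sequential(
            nn.Linear(LATENT_DIM + POSE_DIM, 256), nn.ReLU(), nn.Linear(256, POSE_DIM)
        )

    def forward(self, x_prev, delta):
        stats = self.encoder(torch.cat([x_prev, delta], dim=-1))
        mu, logvar = stats.chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        recon = self.decoder(torch.cat([z, x_prev], dim=-1))
        return recon, mu, logvar

    def rollout(self, x0, steps):
        """Generate motion by sampling latents and integrating predicted deltas."""
        poses, x = [x0], x0
        for _ in range(steps):
            z = torch.randn(x.shape[0], LATENT_DIM)
            x = x + self.decoder(torch.cat([z, x], dim=-1))
            poses.append(x)
        return torch.stack(poses, dim=1)   # (batch, steps + 1, POSE_DIM)
```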
arXiv Detail & Related papers (2021-05-10T21:04:55Z) - Recognition and Synthesis of Object Transport Motion [0.0]
This project illustrates how deep convolutional networks can be used, alongside specialized data augmentation techniques, on a small motion capture dataset.
The project shows how these same augmentation techniques can be scaled up for use in the more complex task of motion synthesis.
By exploring recent developments in Generative Adversarial Networks (GANs), specifically the Wasserstein GAN, this project outlines a model that successfully generates lifelike object-transportation motions.
arXiv Detail & Related papers (2020-09-27T22:13:26Z) - Dynamic Future Net: Diversified Human Motion Generation [31.987602940970888]
Human motion modelling is crucial in many areas such as computer graphics, vision and virtual reality.
We present Dynamic Future Net, a new deep learning model in which we explicitly focus on the intrinsic stochasticity of human motion dynamics.
Our model can generate a large number of high-quality motions of arbitrary duration, with visually convincing variations in both space and time.
arXiv Detail & Related papers (2020-08-25T02:31:41Z)