InfiniMotion: Mamba Boosts Memory in Transformer for Arbitrary Long Motion Generation
- URL: http://arxiv.org/abs/2407.10061v1
- Date: Sun, 14 Jul 2024 03:12:19 GMT
- Title: InfiniMotion: Mamba Boosts Memory in Transformer for Arbitrary Long Motion Generation
- Authors: Zeyu Zhang, Akide Liu, Qi Chen, Feng Chen, Ian Reid, Richard Hartley, Bohan Zhuang, Hao Tang
- Abstract summary: Current methods struggle to handle long motion sequences as a single input due to high computational cost.
We propose InfiniMotion, a method that generates continuous motion sequences of arbitrary length within an autoregressive framework.
We highlight its groundbreaking capability by generating a continuous 1-hour human motion with around 80,000 frames.
- Score: 31.775481455602634
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Text-to-motion generation holds potential for film, gaming, and robotics, yet current methods often prioritize short motion generation, making it challenging to produce long motion sequences effectively: (1) Current methods struggle to handle long motion sequences as a single input due to prohibitively high computational cost; (2) Breaking down the generation of long motion sequences into shorter segments can result in inconsistent transitions and requires interpolation or inpainting, which lacks modeling of the entire sequence. To solve these challenges, we propose InfiniMotion, a method that generates continuous motion sequences of arbitrary length within an autoregressive framework. We highlight its groundbreaking capability by generating a continuous 1-hour human motion with around 80,000 frames. Specifically, we introduce the Motion Memory Transformer with Bidirectional Mamba Memory, enhancing the transformer's memory to process long motion sequences effectively without overwhelming computational resources. Notably, our method achieves over 30% improvement in FID and a 6 times longer demonstration compared to previous state-of-the-art methods, showcasing significant advancements in long motion generation. See project webpage: https://steve-zeyu-zhang.github.io/InfiniMotion/
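As a rough illustration of the autoregressive framework the abstract describes, the sketch below generates a long motion segment by segment while carrying a compact memory state forward. The segment length, feature sizes, and memory update rule are placeholders for illustration only, not the paper's Motion Memory Transformer.

```python
import numpy as np

# Illustrative constants; the real model's segment length, feature layout,
# and memory size are not specified here.
SEGMENT_LEN = 196    # frames generated per autoregressive step (placeholder)
MOTION_DIM = 263     # per-frame motion features, e.g. a HumanML3D-style layout
MEMORY_DIM = 128     # size of the carried memory state (placeholder)
_rng = np.random.default_rng(0)

def generate_segment(text_emb, memory, prev_frame):
    """Stand-in for a memory-augmented motion model: returns the next motion
    segment and an updated memory state (toy random walk + moving average)."""
    segment = prev_frame + 0.01 * _rng.standard_normal((SEGMENT_LEN, MOTION_DIM)).cumsum(axis=0)
    new_memory = 0.9 * memory + 0.1 * segment.mean(axis=0)[:MEMORY_DIM]
    return segment, new_memory

def generate_long_motion(text_emb, total_frames):
    """Autoregressive loop: each step sees only the carried memory and the
    last frame, so cost grows linearly with the number of segments."""
    memory, prev_frame, segments = np.zeros(MEMORY_DIM), np.zeros(MOTION_DIM), []
    while sum(len(s) for s in segments) < total_frames:
        segment, memory = generate_segment(text_emb, memory, prev_frame)
        prev_frame = segment[-1]          # keep transitions continuous
        segments.append(segment)
    return np.concatenate(segments)[:total_frames]

# Roughly the 1-hour, ~80,000-frame setting mentioned in the abstract.
motion = generate_long_motion(text_emb=None, total_frames=80_000)
print(motion.shape)                       # (80000, 263)
```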
Related papers
- KMM: Key Frame Mask Mamba for Extended Motion Generation [21.144913854895243]
KMM is a novel architecture featuring Key frame Masking Modeling to enhance Mamba's focus on key actions in motion segments.
We conduct extensive experiments on the go-to dataset, BABEL, achieving state-of-the-art performance with a more than 57% reduction in FID and 70% fewer parameters compared to previous state-of-the-art methods.
arXiv Detail & Related papers (2024-11-10T14:41:38Z) - Lagrangian Motion Fields for Long-term Motion Generation [32.548139921363756]
We introduce the concept of Lagrangian Motion Fields, specifically designed for long-term motion generation.
By treating each joint as a Lagrangian particle with uniform velocity over short intervals, our approach condenses motion representations into a series of "supermotions".
Our solution is versatile and lightweight, eliminating the need for neural network preprocessing.
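A minimal sketch of that uniform-velocity idea, assuming a simple (start position, average velocity) encoding per short interval; the interval length, joint layout, and function names are illustrative, not the paper's exact "supermotion" definition.

```python
import numpy as np

# Each joint is treated as a Lagrangian particle with uniform velocity inside
# each short interval, so a long trajectory collapses into per-interval
# (start position, average velocity) pairs; names and sizes are illustrative.
def to_supermotions(joint_traj, interval=8):
    """joint_traj: (T, J, 3) joint positions -> (T // interval, J, 6)."""
    T = (joint_traj.shape[0] // interval) * interval
    chunks = joint_traj[:T].reshape(-1, interval, *joint_traj.shape[1:])
    start = chunks[:, 0]                                  # (N, J, 3)
    velocity = (chunks[:, -1] - chunks[:, 0]) / (interval - 1)
    return np.concatenate([start, velocity], axis=-1)     # (N, J, 6)

def from_supermotions(supers, interval=8):
    """Reconstruct a dense trajectory under the uniform-velocity assumption."""
    start, velocity = supers[..., :3], supers[..., 3:]
    steps = np.arange(interval).reshape(1, interval, 1, 1)
    dense = start[:, None] + steps * velocity[:, None]    # (N, interval, J, 3)
    return dense.reshape(-1, supers.shape[1], 3)

traj = np.cumsum(np.random.default_rng(0).standard_normal((240, 22, 3)) * 0.01, axis=0)
compact = to_supermotions(traj)           # 240 frames -> 30 "supermotions"
print(compact.shape, from_supermotions(compact).shape)   # (30, 22, 6) (240, 22, 3)
```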
arXiv Detail & Related papers (2024-09-03T01:38:06Z) - Infinite Motion: Extended Motion Generation via Long Text Instructions [51.61117351997808]
"Infinite Motion" is a novel approach that leverages long text to extended motion generation.
Key innovation of our model is its ability to accept arbitrary lengths of text as input.
We incorporate the timestamp design for text which allows precise editing of local segments within the generated sequences.
arXiv Detail & Related papers (2024-07-11T12:33:56Z) - T2LM: Long-Term 3D Human Motion Generation from Multiple Sentences [47.258957770690685]
We introduce T2LM, a continuous long-term generation framework that can be trained without sequential data.
T2LM comprises two components: a 1D-convolutional VQVAE, trained to compress motion into sequences of latent vectors, and a Transformer-based text encoder that predicts a latent sequence given an input text.
At inference, a sequence of sentences is translated into a continuous stream of latent vectors. This is then decoded into a motion by the VQVAE decoder; the use of 1D convolutions with a local temporal receptive field avoids temporal inconsistencies between training and generated sequences.
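A hedged sketch of that two-stage inference pipeline, with stub functions standing in for the text module and the 1D-convolutional VQVAE decoder; all dimensions and names below are assumptions, not the released T2LM code.

```python
import numpy as np

# Stubs standing in for the two components; LATENT_DIM, MOTION_DIM, UPSAMPLE,
# and the function names are assumptions, not the released T2LM code.
LATENT_DIM, MOTION_DIM, UPSAMPLE = 256, 263, 4

def text_to_latents(sentence, n_latents=12):
    """Stand-in for the Transformer-based text module: one latent sequence per sentence."""
    rng = np.random.default_rng(abs(hash(sentence)) % 2**32)
    return rng.standard_normal((n_latents, LATENT_DIM))

def vqvae_decode(latents):
    """Stand-in for the 1D-convolutional VQVAE decoder; a local receptive field
    means each output frame depends only on nearby latents, which is what keeps
    transitions between sentences smooth."""
    W = np.random.default_rng(0).standard_normal((LATENT_DIM, MOTION_DIM)) * 0.01
    return np.repeat(latents, UPSAMPLE, axis=0) @ W        # temporal upsampling + projection

sentences = ["a person walks forward", "then sits down", "and waves the right hand"]
latent_stream = np.concatenate([text_to_latents(s) for s in sentences])  # one continuous stream
motion = vqvae_decode(latent_stream)
print(motion.shape)   # (3 sentences * 12 latents * 4, 263) = (144, 263) frames
```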
arXiv Detail & Related papers (2024-06-02T06:44:35Z) - Motion Mamba: Efficient and Long Sequence Motion Generation [26.777455596989526]
Recent advancements in state space models (SSMs) have showcased considerable promise in long sequence modeling.
We propose Motion Mamba, a simple and efficient approach that presents a pioneering motion generation model utilizing SSMs.
Our proposed method achieves up to 50% FID improvement and up to 4 times faster inference on the HumanML3D and KIT-ML datasets.
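For intuition about why SSMs suit long sequences, here is a generic linear state-space scan; it is not the Mamba or Motion Mamba block, only an illustration that each step updates a fixed-size state in constant time.

```python
import numpy as np

# Generic linear state-space scan, not the actual Mamba / Motion Mamba block:
# the point is that each step updates a fixed-size state, so a length-T
# sequence costs O(T) time and O(1) state memory, unlike quadratic attention.
def ssm_scan(x, A, B, C):
    """x: (T, D_in); A: (N, N); B: (N, D_in); C: (D_out, N) -> (T, D_out)."""
    h = np.zeros(A.shape[0])
    outputs = []
    for x_t in x:                    # linear-time scan over the sequence
        h = A @ h + B @ x_t          # state update
        outputs.append(C @ h)        # readout
    return np.stack(outputs)

T, D_in, D_out, N = 1_000, 8, 8, 16
rng = np.random.default_rng(0)
A = 0.95 * np.eye(N)                 # stable, slowly decaying state transition
B = 0.1 * rng.standard_normal((N, D_in))
C = 0.1 * rng.standard_normal((D_out, N))
y = ssm_scan(rng.standard_normal((T, D_in)), A, B, C)
print(y.shape)                       # (1000, 8)
```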
arXiv Detail & Related papers (2024-03-12T10:25:29Z) - FineMoGen: Fine-Grained Spatio-Temporal Motion Generation and Editing [56.29102849106382]
FineMoGen is a diffusion-based motion generation and editing framework.
It can synthesize fine-grained motions with spatio-temporal composition according to the user instructions.
FineMoGen further enables zero-shot motion editing capabilities with the aid of modern large language models.
arXiv Detail & Related papers (2023-12-22T16:56:02Z) - Ring Attention with Blockwise Transformers for Near-Infinite Context [88.61687950039662]
We present a novel approach, Ring Attention with Blockwise Transformers (Ring Attention), which leverages blockwise computation of self-attention and feedforward to distribute long sequences across multiple devices.
Our approach enables training and inference of sequences that are up to device count times longer than those achievable by prior memory-efficient Transformers.
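The sketch below shows the blockwise-softmax computation that underlies this scheme on a single host, processing keys and values one block at a time with running statistics; the ring communication between devices is omitted and the block size is arbitrary.

```python
import numpy as np

# Single-host sketch of blockwise attention with running softmax statistics;
# in Ring Attention each key/value block would live on a different device and
# rotate around a ring, which is omitted here.
def blockwise_attention(q, k, v, block=128):
    """q: (Tq, d); k, v: (Tk, d). Equals softmax(q k^T / sqrt(d)) v."""
    scale = 1.0 / np.sqrt(q.shape[-1])
    out = np.zeros((q.shape[0], v.shape[-1]))
    m = np.full(q.shape[0], -np.inf)             # running max of the logits
    denom = np.zeros(q.shape[0])                 # running softmax denominator
    for start in range(0, k.shape[0], block):
        kb, vb = k[start:start + block], v[start:start + block]
        s = (q @ kb.T) * scale                   # partial logits for this block
        m_new = np.maximum(m, s.max(axis=1))
        correction = np.exp(m - m_new)           # rescale previous accumulators
        p = np.exp(s - m_new[:, None])
        denom = denom * correction + p.sum(axis=1)
        out = out * correction[:, None] + p @ vb
        m = m_new
    return out / denom[:, None]

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((512, 64)) for _ in range(3))
scores = (q @ k.T) / np.sqrt(64.0)
weights = np.exp(scores - scores.max(axis=1, keepdims=True))
weights /= weights.sum(axis=1, keepdims=True)
assert np.allclose(blockwise_attention(q, k, v), weights @ v)   # matches full attention
```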
arXiv Detail & Related papers (2023-10-03T08:44:50Z) - MoLo: Motion-augmented Long-short Contrastive Learning for Few-shot
Action Recognition [50.345327516891615]
We develop a Motion-augmented Long-short Contrastive Learning (MoLo) method that contains two crucial components: a long-short contrastive objective and a motion autodecoder.
MoLo can simultaneously learn long-range temporal context and motion cues for comprehensive few-shot matching.
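A generic InfoNCE-style sketch of what a long-short contrastive objective can look like, pairing clip-level features with pooled frame-level features; this is an illustrative formulation, not MoLo's exact loss.

```python
import numpy as np

# Generic InfoNCE-style long-short contrastive loss: clip-level ("long") features
# should match the pooled frame-level ("short") features of the same video and
# differ from those of other videos. Shapes and names are illustrative.
def long_short_contrastive_loss(clip_feats, frame_feats, temperature=0.07):
    """clip_feats: (B, D); frame_feats: (B, T, D) -> scalar loss."""
    g = clip_feats / np.linalg.norm(clip_feats, axis=-1, keepdims=True)
    f = frame_feats.mean(axis=1)
    f = f / np.linalg.norm(f, axis=-1, keepdims=True)
    logits = g @ f.T / temperature                  # (B, B) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)     # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))              # positives sit on the diagonal

rng = np.random.default_rng(0)
loss = long_short_contrastive_loss(rng.standard_normal((4, 128)),
                                   rng.standard_normal((4, 16, 128)))
print(float(loss))
```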
arXiv Detail & Related papers (2023-04-03T13:09:39Z) - MultiAct: Long-Term 3D Human Motion Generation from Multiple Action
Labels [59.53048564128578]
We present MultiAct, the first framework to generate long-term 3D human motion from multiple action labels.
It takes account of both action and motion conditions with a unified recurrent generation system.
As a result, MultiAct produces realistic long-term motion controlled by the given sequence of multiple action labels.
arXiv Detail & Related papers (2022-12-12T13:52:53Z) - Generative Tweening: Long-term Inbetweening of 3D Human Motions [40.16462039509098]
We introduce a biomechanically constrained generative adversarial network that performs long-term inbetweening of human motions.
Trained with 79 classes of captured motion data, our network performs robustly on a variety of highly complex motion styles.
arXiv Detail & Related papers (2020-05-18T17:04:34Z)