Action-GPT: Leveraging Large-scale Language Models for Improved and
Generalized Zero Shot Action Generation
- URL: http://arxiv.org/abs/2211.15603v2
- Date: Wed, 30 Nov 2022 13:13:29 GMT
- Title: Action-GPT: Leveraging Large-scale Language Models for Improved and
Generalized Zero Shot Action Generation
- Authors: Sai Shashank Kalakonda, Shubh Maheshwari, Ravi Kiran Sarvadevabhatla
- Abstract summary: Action-GPT is a framework for incorporating Large Language Models into text-based action generation models.
We show that utilizing detailed descriptions instead of the original action phrases leads to better alignment of text and motion spaces.
- Score: 8.753131760384964
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce Action-GPT, a plug and play framework for incorporating Large
Language Models (LLMs) into text-based action generation models. Action phrases
in current motion capture datasets contain minimal and to-the-point
information. By carefully crafting prompts for LLMs, we generate richer and
fine-grained descriptions of the action. We show that utilizing these detailed
descriptions instead of the original action phrases leads to better alignment
of text and motion spaces. Our experiments show qualitative and quantitative
improvement in the quality of synthesized motions produced by recent
text-to-motion models. Code, pretrained models and sample videos will be made
available at https://actiongpt.github.io
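
The core mechanism is simple to sketch: an LLM expands a terse action phrase (e.g. "squat") into a step-by-step body-movement description, and the text-to-motion model is conditioned on that expanded text instead of the original phrase. Below is a minimal, illustrative Python sketch of this idea, assuming a generic LLM; the prompt template and the llm / text_to_motion callables are hypothetical placeholders, not the authors' exact implementation.

```python
from typing import Callable, List

# Illustrative prompt template in the spirit of Action-GPT; the exact
# wording used in the paper may differ.
PROMPT_TEMPLATE = (
    "Describe in detail, step by step, the body movements of a person "
    "performing the action: {action}."
)

def expand_action_phrase(action: str,
                         llm: Callable[[str], str],
                         num_descriptions: int = 1) -> List[str]:
    """Query the LLM (possibly several times) for detailed descriptions
    of a terse action phrase."""
    prompt = PROMPT_TEMPLATE.format(action=action)
    return [llm(prompt) for _ in range(num_descriptions)]

def generate_motion(action: str,
                    llm: Callable[[str], str],
                    text_to_motion: Callable[[str], object]):
    """Condition a text-to-motion model on the LLM-expanded description
    rather than the original action phrase."""
    detailed_description = expand_action_phrase(action, llm)[0]
    return text_to_motion(detailed_description)

if __name__ == "__main__":
    # Stand-in callables for demonstration; a real pipeline would call an
    # actual LLM and a text-to-motion model.
    def fake_llm(prompt: str) -> str:
        return ("The person bends the knees, lowers the hips while keeping "
                "the back straight, then pushes through the heels to rise.")

    def fake_text_to_motion(text: str) -> str:
        return f"<motion sequence conditioned on: {text[:40]}...>"

    print(generate_motion("squat", fake_llm, fake_text_to_motion))
```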
Related papers
- Mimir: Improving Video Diffusion Models for Precise Text Understanding [53.72393225042688]
Text serves as the key control signal in video generation due to its narrative nature.
The recent success of large language models (LLMs) showcases the power of decoder-only transformers.
This work addresses the challenge of bringing such LLMs into video diffusion models with Mimir, an end-to-end training framework featuring a carefully tailored token fuser.
arXiv Detail & Related papers (2024-12-04T07:26:44Z)
- MoTrans: Customized Motion Transfer with Text-driven Video Diffusion Models [59.10171699717122]
MoTrans is a customized motion transfer method that enables video generation of similar motion in new contexts.
Multimodal representations from the recaptioned prompt and video frames promote the modeling of appearance.
Our method effectively learns specific motion patterns from single or multiple reference videos.
arXiv Detail & Related papers (2024-12-02T10:07:59Z)
- MoRAG -- Multi-Fusion Retrieval Augmented Generation for Human Motion [8.94802080815133]
MoRAG is a novel multi-part, fusion-based retrieval-augmented generation strategy for text-based human motion generation.
We create diverse samples through the spatial composition of the retrieved motions.
Our framework can serve as a plug-and-play module, improving the performance of motion diffusion models.
arXiv Detail & Related papers (2024-09-18T17:03:30Z)
- Motion-Agent: A Conversational Framework for Human Motion Generation with LLMs [67.59291068131438]
Motion-Agent is a conversational framework designed for general human motion generation, editing, and understanding.
Motion-Agent employs an open-source pre-trained language model to develop a generative agent, MotionLLM, that bridges the gap between motion and text.
arXiv Detail & Related papers (2024-05-27T09:57:51Z)
- Aligning Actions and Walking to LLM-Generated Textual Descriptions [3.1049440318608568]
Large Language Models (LLMs) have demonstrated remarkable capabilities in various domains.
This work explores the use of LLMs to generate rich textual descriptions for motion sequences, encompassing both actions and walking patterns.
arXiv Detail & Related papers (2024-04-18T13:56:03Z)
- CoMo: Controllable Motion Generation through Language Guided Pose Code Editing [57.882299081820626]
We introduce CoMo, a Controllable Motion generation model, adept at accurately generating and editing motions.
CoMo decomposes motions into discrete and semantically meaningful pose codes.
It autoregressively generates sequences of pose codes, which are then decoded into 3D motions.
arXiv Detail & Related papers (2024-03-20T18:11:10Z)
- Motion Generation from Fine-grained Textual Descriptions [29.033358642532722]
We build a large-scale language-motion dataset specializing in fine-grained textual descriptions, FineHumanML3D.
We design a new text2motion model, FineMotionDiffuse, making full use of fine-grained textual information.
Our evaluation shows that FineMotionDiffuse trained on FineHumanML3D improves FID by a large margin of 0.38, compared with competitive baselines.
arXiv Detail & Related papers (2024-03-20T11:38:30Z)
- OMG: Towards Open-vocabulary Motion Generation via Mixture of Controllers [45.808597624491156]
We present OMG, a novel framework, which enables compelling motion generation from zero-shot open-vocabulary text prompts.
At the pre-training stage, our model improves its generation ability by learning rich motion traits inherent in out-of-domain data.
At the fine-tuning stage, we introduce motion ControlNet, which incorporates text prompts as conditioning information.
arXiv Detail & Related papers (2023-12-14T14:31:40Z)
- Real-time Animation Generation and Control on Rigged Models via Large Language Models [50.034712575541434]
We introduce a novel method for real-time animation control and generation on rigged models using natural language input.
We embed a large language model (LLM) in Unity to output structured texts that can be parsed into diverse and realistic animations.
arXiv Detail & Related papers (2023-10-27T01:36:35Z)
- Compositional Video Synthesis with Action Graphs [112.94651460161992]
Videos of actions are complex signals containing rich compositional structure in space and time.
We propose to represent the actions in a graph structure called Action Graph and present the new "Action Graph To Video" synthesis task.
Our generative model for this task (AG2Vid) disentangles motion and appearance features, and by incorporating a scheduling mechanism for actions facilitates a timely and coordinated video generation.
arXiv Detail & Related papers (2020-06-27T09:39:04Z)