MoRAG -- Multi-Fusion Retrieval Augmented Generation for Human Motion
- URL: http://arxiv.org/abs/2409.12140v1
- Date: Wed, 18 Sep 2024 17:03:30 GMT
- Title: MoRAG -- Multi-Fusion Retrieval Augmented Generation for Human Motion
- Authors: Kalakonda Sai Shashank, Shubh Maheshwari, Ravi Kiran Sarvadevabhatla
- Abstract summary: MoRAG is a novel multi-part fusion-based retrieval-augmented generation strategy for text-based human motion generation.
We create diverse samples through the spatial composition of the retrieved motions.
Our framework can serve as a plug-and-play module, improving the performance of motion diffusion models.
- Score: 8.94802080815133
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce MoRAG, a novel multi-part fusion-based retrieval-augmented generation strategy for text-based human motion generation. The method enhances motion diffusion models by leveraging additional knowledge obtained through an improved motion retrieval process. By effectively prompting large language models (LLMs), we address spelling errors and rephrasing issues in motion retrieval. Our approach utilizes a multi-part retrieval strategy to improve the generalizability of motion retrieval across the language space. We create diverse samples through the spatial composition of the retrieved motions. Furthermore, by utilizing low-level, part-specific motion information, we can construct motion samples for unseen text descriptions. Our experiments demonstrate that our framework can serve as a plug-and-play module, improving the performance of motion diffusion models. Code, pretrained models, and sample videos will be made available at: https://motion-rag.github.io/
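The abstract describes a pipeline of per-part retrieval followed by spatial composition of the retrieved motions. A minimal sketch of that idea is below; the function names, the three-part body decomposition, the toy embedding, and the toy database are illustrative assumptions, not the authors' actual implementation.

```python
# Hypothetical sketch of MoRAG-style multi-part retrieval and spatial fusion.
import numpy as np

PARTS = ["torso", "hands", "legs"]  # assumed body-part decomposition

def embed(text: str, dim: int = 8) -> np.ndarray:
    """Toy deterministic text embedding (stand-in for a real text encoder)."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

def retrieve_per_part(query: str, database: dict) -> dict:
    """For each body part, retrieve the motion whose caption best matches the query."""
    q = embed(query)
    best = {}
    for part in PARTS:
        # database[part] maps caption -> motion array (frames x features)
        scores = {cap: float(q @ embed(cap)) for cap in database[part]}
        best[part] = database[part][max(scores, key=scores.get)]
    return best

def spatially_compose(part_motions: dict) -> np.ndarray:
    """Stack part-specific motions along the feature axis into one sample."""
    n_frames = min(m.shape[0] for m in part_motions.values())
    return np.concatenate([part_motions[p][:n_frames] for p in PARTS], axis=1)

# Toy database: each part has two candidate motions of 4 frames x 2 features.
db = {p: {f"{p} wave": np.ones((4, 2)), f"{p} walk": np.zeros((4, 2))}
      for p in PARTS}
sample = spatially_compose(retrieve_per_part("a person waves while walking", db))
print(sample.shape)  # (4, 6): 3 parts x 2 features each, composed per frame
```

In the paper, the composed retrieval results additionally condition a motion diffusion model; this sketch only covers the retrieval-and-compose stage.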
Related papers
- Motion-Agent: A Conversational Framework for Human Motion Generation with LLMs [67.59291068131438]
Motion-Agent is a conversational framework designed for general human motion generation, editing, and understanding.
Motion-Agent employs an open-source pre-trained language model to develop a generative agent, MotionLLM, that bridges the gap between motion and text.
arXiv Detail & Related papers (2024-05-27T09:57:51Z)
- Motion Flow Matching for Human Motion Synthesis and Editing [75.13665467944314]
We propose Motion Flow Matching, a novel generative model for human motion generation featuring efficient sampling and effectiveness in motion editing applications.
Our method reduces sampling from the roughly one thousand steps of previous diffusion models to just ten, while achieving comparable performance on text-to-motion and action-to-motion generation benchmarks.
arXiv Detail & Related papers (2023-12-14T12:57:35Z)
- Real-time Animation Generation and Control on Rigged Models via Large Language Models [50.034712575541434]
We introduce a novel method for real-time animation control and generation on rigged models using natural language input.
We embed a large language model (LLM) in Unity to output structured texts that can be parsed into diverse and realistic animations.
arXiv Detail & Related papers (2023-10-27T01:36:35Z)
- ReMoDiffuse: Retrieval-Augmented Motion Diffusion Model [33.64263969970544]
3D human motion generation is crucial for the creative industry.
Recent advances rely on generative models with domain knowledge for text-driven motion generation.
We propose ReMoDiffuse, a diffusion-model-based motion generation framework.
arXiv Detail & Related papers (2023-04-03T16:29:00Z)
- MoFusion: A Framework for Denoising-Diffusion-based Motion Synthesis [73.52948992990191]
MoFusion is a new denoising-diffusion-based framework for high-quality conditional human motion synthesis.
We present ways to introduce well-known kinematic losses for motion plausibility within the motion diffusion framework.
We demonstrate the effectiveness of MoFusion compared to the state of the art on established benchmarks in the literature.
arXiv Detail & Related papers (2022-12-08T18:59:48Z)
- FLAME: Free-form Language-based Motion Synthesis & Editing [17.70085940884357]
We propose a diffusion-based motion synthesis and editing model named FLAME.
FLAME can generate high-fidelity motions well aligned with the given text.
It can edit the parts of the motion, both frame-wise and joint-wise, without any fine-tuning.
arXiv Detail & Related papers (2022-09-01T10:34:57Z)
- MotionDiffuse: Text-Driven Human Motion Generation with Diffusion Model [35.32967411186489]
MotionDiffuse is a diffusion model-based text-driven motion generation framework.
It excels at modeling complicated data distribution and generating vivid motion sequences.
It responds to fine-grained instructions on body parts, and arbitrary-length motion synthesis with time-varied text prompts.
arXiv Detail & Related papers (2022-08-31T17:58:54Z)
- TEMOS: Generating diverse human motions from textual descriptions [53.85978336198444]
We address the problem of generating diverse 3D human motions from textual descriptions.
We propose TEMOS, a text-conditioned generative model leveraging variational autoencoder (VAE) training with human motion data.
We show that the TEMOS framework can produce both skeleton-based animations, as in prior work, and more expressive SMPL body motions.
arXiv Detail & Related papers (2022-04-25T14:53:06Z)
- Modeling Motion with Multi-Modal Features for Text-Based Video Segmentation [56.41614987789537]
Text-based video segmentation aims to segment the target object in a video based on a describing sentence.
We propose a method to fuse and align appearance, motion, and linguistic features to achieve accurate segmentation.
arXiv Detail & Related papers (2022-04-06T02:42:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.