ChatMotion: A Multimodal Multi-Agent for Human Motion Analysis
- URL: http://arxiv.org/abs/2502.18180v2
- Date: Thu, 27 Feb 2025 13:55:34 GMT
- Title: ChatMotion: A Multimodal Multi-Agent for Human Motion Analysis
- Authors: Lei Li, Sen Jia, Jianhao Wang, Zhaochong An, Jiaang Li, Jenq-Neng Hwang, Serge Belongie
- Abstract summary: ChatMotion is a multimodal multi-agent framework for human motion analysis. It interprets user intent, decomposes complex tasks into meta-tasks, and activates specialized function modules for motion comprehension. It integrates multiple specialized modules, such as the MotionCore, to analyze human motion from various perspectives.
- Score: 37.60532857094311
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Advancements in Multimodal Large Language Models (MLLMs) have improved human motion understanding. However, these models remain constrained by their "instruct-only" nature, lacking interactivity and adaptability for diverse analytical perspectives. To address these challenges, we introduce ChatMotion, a multimodal multi-agent framework for human motion analysis. ChatMotion dynamically interprets user intent, decomposes complex tasks into meta-tasks, and activates specialized function modules for motion comprehension. It integrates multiple specialized modules, such as the MotionCore, to analyze human motion from various perspectives. Extensive experiments demonstrate ChatMotion's precision, adaptability, and user engagement for human motion understanding.
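The abstract above outlines an interpret-decompose-dispatch pipeline (user intent → meta-tasks → specialized modules). Below is a minimal, hypothetical Python sketch of that control flow; the function names, the keyword-based planner, and the module registry are illustrative assumptions and do not reflect ChatMotion's actual implementation or MotionCore's API.

```python
# Hypothetical sketch of a ChatMotion-style dispatch loop.
# Module names and the planner heuristic are illustrative only;
# the paper uses an LLM-based planner and real motion modules.
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class MetaTask:
    name: str      # e.g. "caption_motion", "compare_motions"
    payload: dict  # inputs for the specialized module


def interpret_intent(user_query: str) -> List[MetaTask]:
    """Decompose a free-form query into meta-tasks (placeholder heuristic)."""
    tasks = []
    if "describe" in user_query.lower():
        tasks.append(MetaTask("caption_motion", {"query": user_query}))
    if "compare" in user_query.lower():
        tasks.append(MetaTask("compare_motions", {"query": user_query}))
    return tasks or [MetaTask("caption_motion", {"query": user_query})]


# Registry of specialized function modules (stand-ins for MotionCore tools).
MODULES: Dict[str, Callable[[dict], str]] = {
    "caption_motion": lambda p: f"[caption for: {p['query']}]",
    "compare_motions": lambda p: f"[comparison for: {p['query']}]",
}


def chat_motion(user_query: str) -> str:
    """Interpret intent, run each meta-task, and aggregate the results."""
    results = [MODULES[t.name](t.payload) for t in interpret_intent(user_query)]
    return "\n".join(results)


if __name__ == "__main__":
    print(chat_motion("Describe the motion in this video clip."))
```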
Related papers
- Mojito: LLM-Aided Motion Instructor with Jitter-Reduced Inertial Tokens [37.26990830273303]
Inertial measurement units (IMUs) offer lightweight, wearable, and privacy-conscious motion sensing.
However, processing streaming IMU data faces challenges such as wireless transmission instability, sensor noise, and drift.
We introduce Mojito, an intelligent motion agent that integrates inertial sensing with large language models for interactive motion capture and behavioral analysis.
arXiv Detail & Related papers (2025-02-22T10:31:58Z)
- MotionGPT-2: A General-Purpose Motion-Language Model for Motion Generation and Understanding [76.30210465222218]
MotionGPT-2 is a unified Large Motion-Language Model (LMLM).
It supports multimodal control conditions through pre-trained Large Language Models (LLMs).
It is highly adaptable to the challenging 3D holistic motion generation task.
arXiv Detail & Related papers (2024-10-29T05:25:34Z)
- A Unified Framework for Motion Reasoning and Generation in Human Interaction [28.736843383405603]
We introduce the Versatile Interactive Motion-language model (VIM), which integrates both language and motion modalities.
VIM is capable of simultaneously understanding and generating both motion and text modalities.
We evaluate VIM across multiple interactive motion-related tasks, including motion-to-text, text-to-motion, reaction generation, motion editing, and reasoning about motion sequences.
arXiv Detail & Related papers (2024-10-08T02:23:53Z)
- MotionLLM: Understanding Human Behaviors from Human Motions and Videos [40.132643319573205]
This study delves into the realm of multi-modality (i.e., video and motion modalities) human behavior understanding.
We present MotionLLM, a framework for human motion understanding, captioning, and reasoning.
arXiv Detail & Related papers (2024-05-30T17:59:50Z)
- Motion-Agent: A Conversational Framework for Human Motion Generation with LLMs [67.59291068131438]
Motion-Agent is a conversational framework designed for general human motion generation, editing, and understanding.
Motion-Agent employs an open-source pre-trained language model to develop a generative agent, MotionLLM, that bridges the gap between motion and text.
arXiv Detail & Related papers (2024-05-27T09:57:51Z)
- MotionChain: Conversational Motion Controllers via Multimodal Prompts [25.181069337771127]
We present MotionChain, a conversational human motion controller to generate continuous and long-term human motion through multimodal prompts.
By leveraging large-scale language, vision-language, and vision-motion data, MotionChain comprehends each instruction in a multi-turn conversation and generates human motions that follow these prompts.
arXiv Detail & Related papers (2024-04-02T07:09:29Z)
- MEIA: Multimodal Embodied Perception and Interaction in Unknown Environments [82.67236400004826]
We introduce the Multimodal Embodied Interactive Agent (MEIA), capable of translating high-level tasks expressed in natural language into a sequence of executable actions.
The MEM module enables MEIA to generate executable action plans based on diverse requirements and the robot's capabilities.
arXiv Detail & Related papers (2024-02-01T02:43:20Z)
- MoConVQ: Unified Physics-Based Motion Control via Scalable Discrete Representations [25.630268570049708]
MoConVQ is a novel unified framework for physics-based motion control leveraging scalable discrete representations.
Our approach effectively learns motion embeddings from a large, unstructured dataset spanning tens of hours of motion examples.
arXiv Detail & Related papers (2023-10-16T09:09:02Z)
- DiverseMotion: Towards Diverse Human Motion Generation via Discrete Diffusion [70.33381660741861]
We present DiverseMotion, a new approach for synthesizing high-quality human motions conditioned on textual descriptions.
We show that our DiverseMotion achieves state-of-the-art motion quality and competitive motion diversity.
arXiv Detail & Related papers (2023-09-04T05:43:48Z)
- MotionGPT: Human Motion as a Foreign Language [47.21648303282788]
Human motion displays a semantic coupling akin to human language, often perceived as a form of body language.
By fusing language data with large-scale motion models, motion-language pre-training can enhance the performance of motion-related tasks.
We propose MotionGPT, a unified, versatile, and user-friendly motion-language model to handle multiple motion-relevant tasks.
arXiv Detail & Related papers (2023-06-26T15:53:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.