Mojito: LLM-Aided Motion Instructor with Jitter-Reduced Inertial Tokens
- URL: http://arxiv.org/abs/2502.16175v1
- Date: Sat, 22 Feb 2025 10:31:58 GMT
- Title: Mojito: LLM-Aided Motion Instructor with Jitter-Reduced Inertial Tokens
- Authors: Ziwei Shan, Yaoyu He, Chengfeng Zhao, Jiashen Du, Jingyan Zhang, Qixuan Zhang, Jingyi Yu, Lan Xu
- Abstract summary: Inertial measurement units (IMUs) offer lightweight, wearable, and privacy-conscious motion sensing. Processing of streaming IMU data faces challenges such as wireless transmission instability, sensor noise, and drift. We introduce Mojito, an intelligent motion agent that integrates inertial sensing with large language models for interactive motion capture and behavioral analysis.
- Score: 37.26990830273303
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Human bodily movements convey critical insights into action intentions and cognitive processes, yet existing multimodal systems focus primarily on understanding human motion via language, vision, and audio, and struggle to capture the dynamic forces and torques inherent in 3D motion. Inertial measurement units (IMUs) present a promising alternative, offering lightweight, wearable, and privacy-conscious motion sensing. However, processing streaming IMU data faces challenges such as wireless transmission instability, sensor noise, and drift, limiting its utility for long-term real-time motion capture (MoCap) and, more importantly, online motion analysis. To address these challenges, we introduce Mojito, an intelligent motion agent that integrates inertial sensing with large language models (LLMs) for interactive motion capture and behavioral analysis.
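The abstract does not spell out the tokenization pipeline, so the following is only a minimal sketch of the general idea behind jitter-reduced inertial tokens: smooth noisy streaming IMU frames, then quantize them into discrete token ids an LLM can consume. The `IMUTokenizer` class, the exponential-smoothing factor, and the nearest-neighbor codebook are illustrative assumptions, not the authors' actual method.

```python
import numpy as np

class IMUTokenizer:
    """Illustrative sketch (not the paper's method): damp per-frame jitter
    with an exponential moving average, then map each smoothed IMU frame
    to a discrete token id via nearest-neighbor lookup in a codebook."""

    def __init__(self, codebook: np.ndarray, alpha: float = 0.2):
        self.codebook = codebook  # (num_tokens, feature_dim) centroids, assumed pre-learned
        self.alpha = alpha        # smoothing factor: higher values trust new frames more
        self._state = None        # running smoothed estimate of the current frame

    def smooth(self, frame: np.ndarray) -> np.ndarray:
        # Exponential moving average over the stream reduces sensor jitter.
        if self._state is None:
            self._state = frame.astype(float)
        else:
            self._state = self.alpha * frame + (1.0 - self.alpha) * self._state
        return self._state

    def tokenize(self, frame: np.ndarray) -> int:
        # Assign the smoothed frame to its closest codebook entry (token id).
        smoothed = self.smooth(frame)
        distances = np.linalg.norm(self.codebook - smoothed, axis=1)
        return int(np.argmin(distances))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    codebook = rng.normal(size=(256, 6))   # hypothetical 256-token codebook over 6-D accel+gyro features
    stream = rng.normal(size=(100, 6))     # stand-in for 100 frames of streaming IMU readings
    tokenizer = IMUTokenizer(codebook)
    token_ids = [tokenizer.tokenize(frame) for frame in stream]
    print(token_ids[:10])  # discrete inertial tokens ready to be fed to an LLM
```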
Related papers
- Ego4o: Egocentric Human Motion Capture and Understanding from Multi-Modal Input [62.51283548975632]
This work focuses on tracking and understanding human motion using consumer wearable devices, such as VR/AR headsets, smart glasses, cellphones, and smartwatches.
We present Ego4o (o for omni), a new framework for simultaneous human motion capture and understanding from multi-modal egocentric inputs.
arXiv Detail & Related papers (2025-04-11T11:18:57Z)
- ChatMotion: A Multimodal Multi-Agent for Human Motion Analysis [37.60532857094311]
ChatMotion is a multimodal multi-agent framework for human motion analysis.
It interprets user intent, decomposes complex tasks into meta-tasks, and activates specialized function modules for motion comprehension.
It integrates multiple specialized modules, such as the MotionCore, to analyze human motion from various perspectives.
arXiv Detail & Related papers (2025-02-25T13:12:55Z)
- A Plug-and-Play Physical Motion Restoration Approach for In-the-Wild High-Difficulty Motions [56.709280823844374]
We introduce a mask-based motion correction module (MCM) that leverages motion context and video masks to repair flawed motions. We also propose a physics-based motion transfer module (PTM), which employs a pretrain-and-adapt approach for motion imitation. Our approach is designed as a plug-and-play module to physically refine video motion capture results, including high-difficulty in-the-wild motions.
arXiv Detail & Related papers (2024-12-23T08:26:00Z)
- MotionGPT-2: A General-Purpose Motion-Language Model for Motion Generation and Understanding [76.30210465222218]
MotionGPT-2 is a unified Large Motion-Language Model (LMLM).
It supports multimodal control conditions through pre-trained Large Language Models (LLMs)
It is highly adaptable to the challenging 3D holistic motion generation task.
arXiv Detail & Related papers (2024-10-29T05:25:34Z)
- MotionBank: A Large-scale Video Motion Benchmark with Disentangled Rule-based Annotations [85.85596165472663]
We build MotionBank, which comprises 13 video action datasets, 1.24M motion sequences, and 132.9M frames of natural and diverse human motions.
Our MotionBank is beneficial for general motion-related tasks such as human motion generation, motion in-context generation, and motion understanding.
arXiv Detail & Related papers (2024-10-17T17:31:24Z)
- MotionLLM: Understanding Human Behaviors from Human Motions and Videos [40.132643319573205]
This study delves into the realm of multi-modality (i.e., video and motion modalities) human behavior understanding.
We present MotionLLM, a framework for human motion understanding, captioning, and reasoning.
arXiv Detail & Related papers (2024-05-30T17:59:50Z)
- MotionChain: Conversational Motion Controllers via Multimodal Prompts [25.181069337771127]
We present MotionChain, a conversational human motion controller to generate continuous and long-term human motion through multimodal prompts.
By leveraging large-scale language, vision-language, and vision-motion data, MotionChain comprehends each instruction in a multi-turn conversation and generates human motions following these prompts.
arXiv Detail & Related papers (2024-04-02T07:09:29Z)
- MotionTrack: Learning Motion Predictor for Multiple Object Tracking [68.68339102749358]
We introduce a novel motion-based tracker, MotionTrack, centered around a learnable motion predictor.
Our experimental results demonstrate that MotionTrack yields state-of-the-art performance on datasets such as DanceTrack and SportsMOT.
arXiv Detail & Related papers (2023-06-05T04:24:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.