Words in Motion: Representation Engineering for Motion Forecasting
- URL: http://arxiv.org/abs/2406.11624v1
- Date: Mon, 17 Jun 2024 15:07:55 GMT
- Title: Words in Motion: Representation Engineering for Motion Forecasting
- Authors: Omer Sahin Tas, Royden Wagner
- Abstract summary: Motion forecasting transforms sequences of past movements and environment context into future motion.
Recent methods rely on learned representations, resulting in hidden states that are difficult to interpret.
We use natural language to quantize motion features in a human-interpretable way, and measure the degree to which they are embedded in hidden states.
- Score: 1.1510009152620668
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Motion forecasting transforms sequences of past movements and environment context into future motion. Recent methods rely on learned representations, resulting in hidden states that are difficult to interpret. In this work, we use natural language to quantize motion features in a human-interpretable way, and measure the degree to which they are embedded in hidden states. Our experiments reveal that hidden states of motion sequences are arranged with respect to our discrete sets of motion features. Following these insights, we fit control vectors to motion features, which allow for controlling motion forecasts at inference. Consequently, our method enables controlling transformer-based motion forecasting models with textual inputs, providing a unique interface to interact with and understand these models. Our implementation is available at https://github.com/kit-mrt/future-motion
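The abstract mentions fitting control vectors to motion features and applying them at inference, but it does not spell out the fitting procedure. A common choice in representation engineering is the difference between the mean hidden states of two feature classes. The sketch below illustrates that idea under this assumption. It uses synthetic arrays in place of real hidden states, and the names `fit_control_vector`, `apply_control`, and the "slow"/"fast" classes are hypothetical, not taken from the authors' implementation.

```python
import numpy as np


def fit_control_vector(hidden_a, hidden_b):
    """Difference of class means: points from class A toward class B."""
    return hidden_b.mean(axis=0) - hidden_a.mean(axis=0)


def apply_control(hidden, control_vector, strength=1.0):
    """Shift hidden states along the control vector at inference time."""
    return hidden + strength * control_vector


# Synthetic stand-in for hidden states of two motion-feature classes
# (assumption: the classes differ mainly along one direction).
rng = np.random.default_rng(0)
dim = 16
direction = np.zeros(dim)
direction[0] = 1.0
hidden_slow = rng.normal(size=(200, dim)) - 2.0 * direction
hidden_fast = rng.normal(size=(200, dim)) + 2.0 * direction

control = fit_control_vector(hidden_slow, hidden_fast)
steered = apply_control(hidden_slow, control, strength=1.0)
```

With `strength=1.0`, the mean of the steered "slow" states coincides with the mean of the "fast" states. Smaller or negative strengths would move the states partway or in the opposite direction. In a real model, the shifted hidden states would then be passed on to the remaining layers.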
Related papers
- Monkey See, Monkey Do: Harnessing Self-attention in Motion Diffusion for Zero-shot Motion Transfer [55.109778609058154]
Existing diffusion-based motion editing methods overlook the profound potential of the prior embedded within the weights of pre-trained models.
We uncover the roles and interactions of attention elements in capturing and representing motion patterns.
We integrate these elements to transfer a leader motion to a follower one while maintaining the nuanced characteristics of the follower, resulting in zero-shot motion transfer.
arXiv Detail & Related papers (2024-06-10T17:47:14Z)
- Generating Human Interaction Motions in Scenes with Text Control [66.74298145999909]
We present TeSMo, a method for text-controlled scene-aware motion generation based on denoising diffusion models.
Our approach begins with pre-training a scene-agnostic text-to-motion diffusion model.
To facilitate training, we embed annotated navigation and interaction motions within scenes.
arXiv Detail & Related papers (2024-04-16T16:04:38Z)
- Animate Your Motion: Turning Still Images into Dynamic Videos [58.63109848837741]
We introduce Scene and Motion Conditional Diffusion (SMCD), a novel methodology for managing multimodal inputs.
SMCD incorporates a recognized motion conditioning module and investigates various approaches to integrate scene conditions.
Our design significantly enhances video quality, motion precision, and semantic coherence.
arXiv Detail & Related papers (2024-03-15T10:36:24Z)
- Plan, Posture and Go: Towards Open-World Text-to-Motion Generation [43.392549755386135]
We present a divide-and-conquer framework named PRO-Motion.
It consists of three modules: a motion planner, a posture-diffuser, and a go-diffuser.
PRO-Motion can generate diverse and realistic motions from complex open-world prompts.
arXiv Detail & Related papers (2023-12-22T17:02:45Z)
- Universal Humanoid Motion Representations for Physics-Based Control [71.46142106079292]
We present a universal motion representation that encompasses a comprehensive range of motor skills for physics-based humanoid control.
We first learn a motion imitator that can imitate all of human motion from a large, unstructured motion dataset.
We then create our motion representation by distilling skills directly from the imitator.
arXiv Detail & Related papers (2023-10-06T20:48:43Z)
- MotionGPT: Human Motion as a Foreign Language [47.21648303282788]
Human motion displays a semantic coupling akin to human language, often perceived as a form of body language.
By fusing language data with large-scale motion models, motion-language pre-training can enhance the performance of motion-related tasks.
We propose MotionGPT, a unified, versatile, and user-friendly motion-language model to handle multiple motion-relevant tasks.
arXiv Detail & Related papers (2023-06-26T15:53:02Z)
- MotionDiffuse: Text-Driven Human Motion Generation with Diffusion Model [35.32967411186489]
MotionDiffuse is a diffusion model-based text-driven motion generation framework.
It excels at modeling complicated data distributions and generating vivid motion sequences.
It responds to fine-grained instructions on body parts and supports arbitrary-length motion synthesis with time-varied text prompts.
arXiv Detail & Related papers (2022-08-31T17:58:54Z)
- Generating Smooth Pose Sequences for Diverse Human Motion Prediction [90.45823619796674]
We introduce a unified deep generative network for both diverse and controllable motion prediction.
Our experiments on two standard benchmark datasets, Human3.6M and HumanEva-I, demonstrate that our approach outperforms the state-of-the-art baselines in terms of both sample diversity and accuracy.
arXiv Detail & Related papers (2021-08-19T00:58:00Z)
- Self-supervised Motion Learning from Static Images [36.85209332144106]
Motion from Static Images (MoSI) learns to encode motion information.
MoSI can discover regions with large motion even without fine-tuning on the downstream datasets.
arXiv Detail & Related papers (2021-04-01T03:55:50Z)