PMP: Learning to Physically Interact with Environments using Part-wise
Motion Priors
- URL: http://arxiv.org/abs/2305.03249v1
- Date: Fri, 5 May 2023 02:27:27 GMT
- Title: PMP: Learning to Physically Interact with Environments using Part-wise
Motion Priors
- Authors: Jinseok Bae, Jungdam Won, Donggeun Lim, Cheol-Hui Min, Young Min Kim
- Abstract summary: We present a method to animate a character incorporating multiple part-wise motion priors (PMP).
The proposed PMP allows us to assemble multiple part skills to animate a character, creating a diverse set of motions with different combinations of existing data.
- Score: 10.370115975772402
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a method to animate a character incorporating multiple part-wise
motion priors (PMP). While previous works allow creating realistic articulated
motions from reference data, the range of motion is largely limited by the
available samples. Especially for the interaction-rich scenarios, it is
impractical to attempt acquiring every possible interacting motion, as the
combination of physical parameters increases exponentially. The proposed PMP
allows us to assemble multiple part skills to animate a character, creating a
diverse set of motions with different combinations of existing data. In our
pipeline, we can train an agent with a wide range of part-wise priors.
Therefore, each body part can obtain a kinematic insight of the style from the
motion captures, or at the same time extract dynamics-related information from
the additional part-specific simulation. For example, we can first train a
general interaction skill, e.g. grasping, only for the dexterous part, and then
combine the expert trajectories from the pre-trained agent with the kinematic
priors of other limbs. Eventually, our whole-body agent learns a novel physical
interaction skill even in the absence of object trajectories in the
reference motion sequence.
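The abstract describes assembling several part-wise priors into one training signal for a whole-body agent. The following is a minimal sketch of that idea, not the authors' implementation. It assumes one discriminator per body part, an AMP-style least-squares reward mapping, and multiplication as the rule for combining part rewards; none of these details are stated in the abstract. The part names, feature layout, and linear scorers are hypothetical placeholders.

```python
import math
import random

# Hypothetical body-part layout: each part owns a slice of a flat feature vector.
PART_SLICES = {
    "upper_body": slice(0, 6),
    "lower_body": slice(6, 12),
    "hands": slice(12, 16),
}


def amp_style_reward(score: float) -> float:
    """Map a discriminator score to [0, 1] (AMP-style least-squares form, assumed here)."""
    return max(0.0, 1.0 - 0.25 * (score - 1.0) ** 2)


def partwise_style_reward(features, discriminators) -> float:
    """Multiply per-part rewards so every part must look plausible (assumed combination rule)."""
    reward = 1.0
    for name, part in PART_SLICES.items():
        reward *= amp_style_reward(discriminators[name](features[part]))
    return reward


def make_linear_scorer(weights):
    """Stand-in for a trained part discriminator: a fixed linear map squashed to (-1, 1)."""
    def score(part_features):
        return math.tanh(sum(w * x for w, x in zip(weights, part_features)))
    return score


rng = random.Random(0)
discriminators = {
    name: make_linear_scorer([rng.gauss(0.0, 1.0) for _ in range(part.stop - part.start)])
    for name, part in PART_SLICES.items()
}
features = [rng.gauss(0.0, 1.0) for _ in range(16)]

reward = partwise_style_reward(features, discriminators)
print(f"part-wise style reward: {reward:.3f}")
```

Under these assumptions, each part's prior could come from a different source, for example motion capture for the limbs and trajectories from a pre-trained grasping agent for the hands, since each discriminator only sees its own slice of the features.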
Related papers
- SMP: Reusable Score-Matching Motion Priors for Physics-Based Character Control [20.779031780374115]
Motion priors that guide agents toward producing naturalistic behaviors play a pivotal role in creating life-like virtual characters.
We present Score-Matching Motion Priors (SMP), which leverages pre-trained motion diffusion models and score distillation sampling (SDS) to create reusable task-agnostic motion priors.
Our method produces high-quality motion comparable to state-of-the-art adversarial imitation learning methods through reusable and modular motion priors.
arXiv Detail & Related papers (2025-12-02T18:54:12Z)
- What If : Understanding Motion Through Sparse Interactions [23.795217304737548]
Flow Poke Transformer (FPT) is a framework for directly predicting the distribution of local motion.
FPT is conditioned on sparse interactions termed "pokes".
arXiv Detail & Related papers (2025-10-14T17:52:17Z)
- Learning Video Generation for Robotic Manipulation with Collaborative Trajectory Control [72.00655365269]
We present RoboMaster, a novel framework that models inter-object dynamics through a collaborative trajectory formulation.
Unlike prior methods that decompose objects, our core is to decompose the interaction process into three sub-stages: pre-interaction, interaction, and post-interaction.
Our method outperforms existing approaches, establishing new state-of-the-art performance in trajectory-controlled video generation for robotic manipulation.
arXiv Detail & Related papers (2025-06-02T17:57:06Z)
- Segment Any Motion in Videos [80.72424676419755]
We propose a novel approach for moving object segmentation that combines long-range trajectory motion cues with DINO-based semantic features.
Our model employs Spatio-Temporal Trajectory Attention and Motion-Semantic Decoupled Embedding to prioritize motion while integrating semantic support.
arXiv Detail & Related papers (2025-03-28T09:34:11Z)
- Instance-Level Moving Object Segmentation from a Single Image with Events [84.12761042512452]
Moving object segmentation plays a crucial role in understanding dynamic scenes involving multiple moving objects.
Previous methods encounter difficulties in distinguishing whether pixel displacements of an object are caused by camera motion or object motion.
Recent advances exploit the motion sensitivity of novel event cameras to counter conventional images' inadequate motion modeling capabilities.
We propose the first instance-level moving object segmentation framework that integrates complementary texture and motion cues.
arXiv Detail & Related papers (2025-02-18T15:56:46Z)
- MoChat: Joints-Grouped Spatio-Temporal Grounding LLM for Multi-Turn Motion Comprehension and Description [13.12764192547871]
MoChat is a model capable of fine-grained spatio-temporal grounding of human motion.
We group spatial information of each skeleton frame based on human anatomical structure.
Various annotations are generated for joint training.
arXiv Detail & Related papers (2024-10-15T08:49:59Z)
- Generating Human Interaction Motions in Scenes with Text Control [66.74298145999909]
We present TeSMo, a method for text-controlled scene-aware motion generation based on denoising diffusion models.
Our approach begins with pre-training a scene-agnostic text-to-motion diffusion model.
To facilitate training, we embed annotated navigation and interaction motions within scenes.
arXiv Detail & Related papers (2024-04-16T16:04:38Z)
- InterControl: Zero-shot Human Interaction Generation by Controlling Every Joint [67.6297384588837]
We introduce a novel controllable motion generation method, InterControl, which encourages synthesized motions to maintain the desired distance between joint pairs.
We demonstrate that the distance between joint pairs for human-wise interactions can be generated using an off-the-shelf Large Language Model.
arXiv Detail & Related papers (2023-11-27T14:32:33Z)
- Learn the Force We Can: Enabling Sparse Motion Control in Multi-Object Video Generation [26.292052071093945]
We propose an unsupervised method to generate videos from a single frame and a sparse motion input.
Our trained model can generate unseen realistic object-to-object interactions.
We show that YODA is on par with or better than state-of-the-art prior work in video generation in terms of both controllability and video quality.
arXiv Detail & Related papers (2023-06-06T19:50:02Z)
- Task-Oriented Human-Object Interactions Generation with Implicit Neural Representations [61.659439423703155]
TOHO: Task-Oriented Human-Object Interactions Generation with Implicit Neural Representations.
Our method generates continuous motions that are parameterized only by the temporal coordinate.
This work takes a step further toward general human-scene interaction simulation.
arXiv Detail & Related papers (2023-03-23T09:31:56Z)
- Human Motion Diffusion as a Generative Prior [20.004837564647367]
We introduce three forms of composition based on diffusion priors.
We tackle the challenge of long sequence generation.
Using parallel composition, we show promising steps toward two-person generation.
arXiv Detail & Related papers (2023-03-02T17:09:27Z)
- IMoS: Intent-Driven Full-Body Motion Synthesis for Human-Object Interactions [69.95820880360345]
We present the first framework to synthesize the full-body motion of virtual human characters with 3D objects placed within their reach.
Our system takes as input textual instructions specifying the objects and the associated intentions of the virtual characters.
We show that our synthesized full-body motions appear more realistic to the participants in more than 80% of scenarios.
arXiv Detail & Related papers (2022-12-14T23:59:24Z)
- SoMoFormer: Multi-Person Pose Forecasting with Transformers [15.617263162155062]
We present a new method, called Social Motion Transformer (SoMoFormer), for multi-person 3D pose forecasting.
Our transformer architecture uniquely models human motion input as a joint sequence rather than a time sequence.
We show that with this problem reformulation, SoMoFormer naturally extends to multi-person scenes by using the joints of all people in a scene as input queries.
arXiv Detail & Related papers (2022-08-30T06:59:28Z)
- Interaction Transformer for Human Reaction Generation [61.22481606720487]
We propose a novel interaction Transformer (InterFormer) consisting of a Transformer network with both temporal and spatial attentions.
Our method is general and can be used to generate more complex and long-term interactions.
arXiv Detail & Related papers (2022-07-04T19:30:41Z)
- Hierarchical Style-based Networks for Motion Synthesis [150.226137503563]
We propose a self-supervised method for generating long-range, diverse and plausible behaviors to achieve a specific goal location.
Our proposed method learns to model human motion by decomposing a long-range generation task in a hierarchical manner.
On a large-scale skeleton dataset, we show that the proposed method is able to synthesise long-range, diverse and plausible motion.
arXiv Detail & Related papers (2020-08-24T02:11:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.