Stochastic Multi-Person 3D Motion Forecasting
- URL: http://arxiv.org/abs/2306.05421v1
- Date: Thu, 8 Jun 2023 17:59:09 GMT
- Title: Stochastic Multi-Person 3D Motion Forecasting
- Authors: Sirui Xu, Yu-Xiong Wang, Liang-Yan Gui
- Abstract summary: We address the real-world complexities overlooked by prior work on human motion forecasting.
Our framework is general; we instantiate it with different generative models.
Our approach produces diverse and accurate multi-person predictions, significantly outperforming the state of the art.
- Score: 21.915057426589744
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper addresses the real-world complexities overlooked in prior work on human motion forecasting, emphasizing the social properties of
multi-person motion, the diversity of motion and social interactions, and the
complexity of articulated motion. To this end, we introduce a novel task of
stochastic multi-person 3D motion forecasting. We propose a dual-level
generative modeling framework that separately models independent individual
motion at the local level and social interactions at the global level. Notably,
this dual-level modeling mechanism can be achieved within a shared generative
model by introducing learnable latent codes that represent the intents of
future motion and switching the codes' modes of operation between the two levels.
Our framework is general; we instantiate it with different generative models,
including generative adversarial networks and diffusion models, and various
multi-person forecasting models. Extensive experiments on CMU-Mocap, MuPoTS-3D,
and SoMoF benchmarks show that our approach produces diverse and accurate
multi-person predictions, significantly outperforming the state of the art.
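A minimal PyTorch sketch of this switching mechanism may help: the same generator consumes an intent code that is sampled independently per person (local level) or shared across people (global level). All module names, sizes, and the GRU backbone below are illustrative assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class DualLevelGenerator(nn.Module):
    """Hypothetical sketch: one shared generator; the latent 'intent'
    code switches between local (per-person) and global (shared) modes."""

    def __init__(self, pose_dim=45, hidden=128, latent=32):
        super().__init__()
        self.encoder = nn.GRU(pose_dim, hidden, batch_first=True)
        self.decoder = nn.GRU(hidden + latent, hidden, batch_first=True)
        self.head = nn.Linear(hidden, pose_dim)

    def forward(self, past, z, horizon):
        # past: (persons, t_past, pose_dim); z: (persons, latent)
        _, h = self.encoder(past)                     # motion context per person
        ctx = h[-1]                                   # (persons, hidden)
        inp = torch.cat([ctx, z], dim=-1)             # fuse context with intent
        inp = inp.unsqueeze(1).repeat(1, horizon, 1)  # unroll over the horizon
        out, _ = self.decoder(inp)
        return self.head(out)                         # (persons, horizon, pose_dim)

persons, t_past, horizon, latent = 3, 25, 10, 32
gen = DualLevelGenerator(latent=latent)
past = torch.randn(persons, t_past, 45)

# Local level: independent intent code per person -> diverse individual motion.
z_local = torch.randn(persons, latent)
# Global level: one shared intent code -> coordinated social interaction.
z_global = torch.randn(1, latent).expand(persons, -1)

print(gen(past, z_local, horizon).shape)   # torch.Size([3, 10, 45])
print(gen(past, z_global, horizon).shape)  # torch.Size([3, 10, 45])
```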
Related papers
- Multi-Resolution Generative Modeling of Human Motion from Limited Data [3.5229503563299915]
We present a generative model that learns to synthesize human motion from limited training sequences.
The model captures human motion patterns by integrating skeletal convolution layers with a multi-scale architecture.
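As a rough illustration of the multi-scale idea (the encoder below is a hypothetical sketch, not the paper's model), the same temporal convolution over flattened joint features can be applied at progressively coarser frame rates:

```python
import torch
import torch.nn as nn

# Hypothetical multi-scale motion encoder: one skeletal (per-joint, temporal)
# convolution reused at several temporal resolutions.
class MultiScaleMotionEncoder(nn.Module):
    def __init__(self, joint_channels=3, joints=22, feat=64, scales=3):
        super().__init__()
        in_ch = joint_channels * joints          # flatten joints into channels
        self.scales = scales
        self.conv = nn.Conv1d(in_ch, feat, kernel_size=5, padding=2)
        self.pool = nn.AvgPool1d(kernel_size=2)  # halve the frame rate per scale

    def forward(self, motion):
        # motion: (batch, frames, joints * joint_channels)
        x = motion.transpose(1, 2)               # (batch, channels, frames)
        feats = []
        for _ in range(self.scales):
            feats.append(torch.relu(self.conv(x)).mean(dim=-1))  # pooled features
            x = self.pool(x)                     # move to a coarser temporal scale
        return torch.stack(feats, dim=1)         # (batch, scales, feat)

enc = MultiScaleMotionEncoder()
clip = torch.randn(2, 64, 22 * 3)                # 2 clips, 64 frames, 22 joints
print(enc(clip).shape)                           # torch.Size([2, 3, 64])
```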
arXiv Detail & Related papers (2024-11-25T15:36:29Z)
- COLLAGE: Collaborative Human-Agent Interaction Generation using Hierarchical Latent Diffusion and Language Models [14.130327598928778]
The paper combines large language models (LLMs) with hierarchical, motion-specific vector-quantized variational autoencoders (VQ-VAEs).
Our framework generates realistic and diverse collaborative human-object-human interactions, outperforming state-of-the-art methods.
Our work opens up new possibilities for modeling complex interactions in various domains, such as robotics, graphics and computer vision.
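The vector-quantization step at the heart of any VQ-VAE snaps each continuous latent to its nearest codebook entry; a minimal sketch, with illustrative codebook and token sizes:

```python
import torch
import torch.nn as nn

class VectorQuantizer(nn.Module):
    """Minimal VQ-VAE quantization step: nearest-codebook-entry lookup."""

    def __init__(self, codebook_size=512, dim=64):
        super().__init__()
        self.codebook = nn.Embedding(codebook_size, dim)

    def forward(self, z):
        # z: (batch, tokens, dim) continuous encoder output
        dists = torch.cdist(z, self.codebook.weight)  # (batch, tokens, K)
        idx = dists.argmin(dim=-1)                    # discrete motion tokens
        z_q = self.codebook(idx)                      # quantized latents
        # Straight-through estimator lets gradients bypass the argmin.
        z_q = z + (z_q - z).detach()
        return z_q, idx

vq = VectorQuantizer()
z = torch.randn(2, 16, 64)
z_q, tokens = vq(z)
print(z_q.shape, tokens.shape)  # torch.Size([2, 16, 64]) torch.Size([2, 16])
```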
arXiv Detail & Related papers (2024-09-30T17:02:13Z)
- Multi-agent Long-term 3D Human Pose Forecasting via Interaction-aware Trajectory Conditioning [41.09061877498741]
We propose an interaction-aware trajectory-conditioned long-term multi-agent human pose forecasting model.
Our model effectively handles the multi-modality of human motion and the complexity of long-term multi-agent interactions.
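One plausible reading of trajectory conditioning, sketched below with hypothetical modules and omitting the paper's interaction-aware components: predict coarse root trajectories first, then decode full poses given them.

```python
import torch
import torch.nn as nn

# Hypothetical two-stage sketch of trajectory-conditioned pose forecasting.
class TrajectoryConditionedForecaster(nn.Module):
    def __init__(self, pose_dim=45, hidden=128):
        super().__init__()
        self.traj_net = nn.GRU(3, hidden, batch_first=True)   # root xyz
        self.traj_head = nn.Linear(hidden, 3)
        self.pose_net = nn.GRU(pose_dim + 3, hidden, batch_first=True)
        self.pose_head = nn.Linear(hidden, pose_dim)

    def forward(self, past_root, past_pose, horizon):
        # Stage 1: roll the root trajectory forward autoregressively.
        _, h = self.traj_net(past_root)
        root, future_root = past_root[:, -1:], []
        for _ in range(horizon):
            out, h = self.traj_net(root, h)
            root = self.traj_head(out)
            future_root.append(root)
        future_root = torch.cat(future_root, dim=1)           # (agents, horizon, 3)
        # Stage 2: decode poses conditioned on the predicted trajectory.
        ctx = past_pose[:, -1:].repeat(1, horizon, 1)
        out, _ = self.pose_net(torch.cat([ctx, future_root], dim=-1))
        return future_root, self.pose_head(out)

model = TrajectoryConditionedForecaster()
root, pose = torch.randn(3, 25, 3), torch.randn(3, 25, 45)
fr, fp = model(root, pose, horizon=10)
print(fr.shape, fp.shape)  # torch.Size([3, 10, 3]) torch.Size([3, 10, 45])
```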
arXiv Detail & Related papers (2024-04-08T06:15:13Z)
- Large Motion Model for Unified Multi-Modal Motion Generation [50.56268006354396]
Large Motion Model (LMM) is a motion-centric, multi-modal framework that unifies mainstream motion generation tasks into a generalist model.
LMM tackles the challenges of unifying these tasks from three principled aspects.
arXiv Detail & Related papers (2024-04-01T17:55:11Z)
- DiverseMotion: Towards Diverse Human Motion Generation via Discrete Diffusion [70.33381660741861]
We present DiverseMotion, a new approach for synthesizing high-quality human motions conditioned on textual descriptions.
We show that DiverseMotion achieves state-of-the-art motion quality and competitive motion diversity.
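Discrete diffusion operates on token sequences rather than continuous vectors; a minimal sketch of the forward corruption process, with sizes that are illustrative rather than taken from the paper:

```python
import torch

# Forward corruption in a discrete diffusion model: motion tokens are randomly
# resampled from the vocabulary with probability beta_t; the denoiser is then
# trained to recover the originals from the corrupted sequence.
def corrupt_tokens(tokens, beta_t, vocab_size):
    # tokens: (batch, length) integer motion tokens
    noise = torch.randint_like(tokens, vocab_size)
    mask = torch.rand(tokens.shape) < beta_t
    return torch.where(mask, noise, tokens)

vocab_size = 512
tokens = torch.randint(0, vocab_size, (2, 16))
for beta_t in (0.1, 0.5, 0.9):  # later diffusion steps corrupt more tokens
    noisy = corrupt_tokens(tokens, beta_t, vocab_size)
    kept = (noisy == tokens).float().mean().item()
    print(f"beta={beta_t}: about {kept:.0%} of tokens unchanged")
```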
arXiv Detail & Related papers (2023-09-04T05:43:48Z)
- Persistent-Transient Duality: A Multi-mechanism Approach for Modeling Human-Object Interaction [58.67761673662716]
Humans are highly adaptable, swiftly switching between modes to handle different tasks, situations, and contexts.
In human-object interaction (HOI) activities, these modes can be attributed to two mechanisms: (1) a large-scale consistent plan for the whole activity and (2) small-scale child interactive actions that start and end along the timeline.
This work proposes to model two concurrent mechanisms that jointly control human motion.
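A toy sketch of two concurrent streams (not the paper's architecture): a slow persistent stream carries the activity-level plan, while a gated transient stream activates around short-lived interactions.

```python
import torch
import torch.nn as nn

class PersistentTransient(nn.Module):
    """Illustrative two-mechanism model: persistent plan + gated transient stream."""

    def __init__(self, dim=64):
        super().__init__()
        self.persistent = nn.GRU(dim, dim, batch_first=True)
        self.transient = nn.GRU(dim, dim, batch_first=True)
        self.gate = nn.Sequential(nn.Linear(dim, 1), nn.Sigmoid())

    def forward(self, x):
        # x: (batch, time, dim) per-frame human/object features
        plan, _ = self.persistent(x)   # whole-activity context
        local, _ = self.transient(x)   # short-horizon interaction dynamics
        g = self.gate(x)               # (batch, time, 1): is an interaction active?
        return plan + g * local

model = PersistentTransient()
print(model(torch.randn(2, 30, 64)).shape)  # torch.Size([2, 30, 64])
```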
arXiv Detail & Related papers (2023-07-24T12:21:33Z)
- The MI-Motion Dataset and Benchmark for 3D Multi-Person Motion Prediction [13.177817435234449]
3D multi-person motion prediction is a challenging task that involves modeling individual behaviors and interactions between people.
We introduce the Multi-Person Interaction Motion (MI-Motion) dataset, which includes skeleton sequences of multiple individuals collected by motion capture systems.
The dataset contains 167k frames of interacting people's skeleton poses and is categorized into 5 different activity scenes.
arXiv Detail & Related papers (2023-06-23T15:38:22Z)
- MultiViz: An Analysis Benchmark for Visualizing and Understanding Multimodal Models [103.9987158554515]
MultiViz is a method for analyzing the behavior of multimodal models by scaffolding the problem of interpretability into 4 stages.
We show that the complementary stages in MultiViz together enable users to simulate model predictions, assign interpretable concepts to features, perform error analysis on model misclassifications, and use insights from error analysis to debug models.
arXiv Detail & Related papers (2022-06-30T18:42:06Z)
- MoDi: Unconditional Motion Synthesis from Diverse Data [51.676055380546494]
We present MoDi, an unconditional generative model that synthesizes diverse motions.
Our model is trained in a completely unsupervised setting from a diverse, unstructured and unlabeled motion dataset.
We show that despite the lack of any structure in the dataset, the latent space can be semantically clustered.
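A common way to probe such a latent space, shown here with a hypothetical, untrained mapping network: sample latent codes and cluster them. MoDi reports that, in a trained model, such clusters align with motion semantics.

```python
import torch
from sklearn.cluster import KMeans

# Hypothetical stand-in for a generator's learned mapping network.
mapping = torch.nn.Sequential(
    torch.nn.Linear(64, 64), torch.nn.ReLU(), torch.nn.Linear(64, 64)
)

with torch.no_grad():
    z = torch.randn(500, 64)  # codes an unconditional model would sample
    w = mapping(z)            # learned latent space (W-like)

# Cluster the latents; in a trained model, inspecting the motions generated
# per cluster reveals semantically coherent groups.
labels = KMeans(n_clusters=8, n_init=10).fit_predict(w.numpy())
print(labels.shape, labels[:10])
```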
arXiv Detail & Related papers (2022-06-16T09:06:25Z)
- HuMoR: 3D Human Motion Model for Robust Pose Estimation [100.55369985297797]
HuMoR is a 3D human motion model for robust estimation of temporal pose and shape.
We introduce a conditional variational autoencoder, which learns a distribution of the change in pose at each step of a motion sequence.
We demonstrate that our model generalizes to diverse motions and body shapes after training on a large motion capture dataset.
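A minimal sketch of that idea, with assumed layer sizes and simple MLP encoder/decoder in place of HuMoR's actual networks: a conditional VAE over the per-step change in state, plus prior-driven rollout for generation.

```python
import torch
import torch.nn as nn

class TransitionCVAE(nn.Module):
    """Conditional VAE over the change in state between consecutive frames."""

    def __init__(self, state_dim=69, latent=48, hidden=256):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(2 * state_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 2 * latent))
        self.prior = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU(),
                                   nn.Linear(hidden, 2 * latent))
        self.dec = nn.Sequential(nn.Linear(state_dim + latent, hidden), nn.ReLU(),
                                 nn.Linear(hidden, state_dim))

    def forward(self, x_prev, x_curr):
        mu, logvar = self.enc(torch.cat([x_prev, x_curr], -1)).chunk(2, -1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        delta = self.dec(torch.cat([x_prev, z], -1))          # predicted change
        return x_prev + delta, mu, logvar

    def rollout(self, x0, steps):
        # Generate motion by sampling transitions from the learned prior.
        xs, x = [], x0
        for _ in range(steps):
            mu, logvar = self.prior(x).chunk(2, -1)
            z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
            x = x + self.dec(torch.cat([x, z], -1))
            xs.append(x)
        return torch.stack(xs, 1)

model = TransitionCVAE()
print(model.rollout(torch.randn(4, 69), steps=30).shape)  # torch.Size([4, 30, 69])
```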
arXiv Detail & Related papers (2021-05-10T21:04:55Z)