Multi-agent Long-term 3D Human Pose Forecasting via Interaction-aware Trajectory Conditioning
- URL: http://arxiv.org/abs/2404.05218v1
- Date: Mon, 8 Apr 2024 06:15:13 GMT
- Title: Multi-agent Long-term 3D Human Pose Forecasting via Interaction-aware Trajectory Conditioning
- Authors: Jaewoo Jeong, Daehee Park, Kuk-Jin Yoon
- Abstract summary: We propose an interaction-aware trajectory-conditioned long-term multi-agent human pose forecasting model.
Our model effectively handles the multi-modality of human motion and the complexity of long-term multi-agent interactions.
- Score: 41.09061877498741
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Human pose forecasting garners attention for its diverse applications. However, challenges in modeling the multi-modal nature of human motion and intricate interactions among agents persist, particularly with longer timescales and more agents. In this paper, we propose an interaction-aware trajectory-conditioned long-term multi-agent human pose forecasting model, utilizing a coarse-to-fine prediction approach: multi-modal global trajectories are initially forecasted, followed by respective local pose forecasts conditioned on each mode. In doing so, our Trajectory2Pose model introduces a graph-based agent-wise interaction module for a reciprocal forecast of local motion-conditioned global trajectory and trajectory-conditioned local pose. Our model effectively handles the multi-modality of human motion and the complexity of long-term multi-agent interactions, improving performance in complex environments. Furthermore, we address the lack of long-term (6s+) multi-agent (5+) datasets by constructing a new dataset from real-world images and 2D annotations, enabling a comprehensive evaluation of our proposed model. State-of-the-art prediction performance on both complex and simpler datasets confirms the generalized effectiveness of our method. The code is available at https://github.com/Jaewoo97/T2P.
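As an illustration of the coarse-to-fine idea described in the abstract, the sketch below first predicts K multi-modal global (root) trajectories per agent and then decodes a local pose sequence conditioned on each trajectory mode. It is a minimal interpretation of the abstract, not the released Trajectory2Pose code: the GRU/MLP module choices, tensor shapes, and the omission of the graph-based agent-wise interaction module are all assumptions.
```python
import torch
import torch.nn as nn

class CoarseToFineForecaster(nn.Module):
    """Illustrative coarse-to-fine sketch: predict K multi-modal global
    trajectories, then decode a local pose sequence conditioned on each
    trajectory mode. Shapes and modules are assumptions, not the authors'
    Trajectory2Pose implementation."""

    def __init__(self, n_joints=15, hidden=128, n_modes=6, t_fut=30):
        super().__init__()
        self.n_modes, self.t_fut = n_modes, t_fut
        self.motion_enc = nn.GRU(n_joints * 3, hidden, batch_first=True)
        # Coarse stage: K global (root) trajectory modes per agent.
        self.traj_head = nn.Linear(hidden, n_modes * t_fut * 3)
        # Fine stage: local pose decoder conditioned on one trajectory mode.
        self.pose_dec = nn.GRU(3 + hidden, hidden, batch_first=True)
        self.pose_head = nn.Linear(hidden, n_joints * 3)

    def forward(self, past_pose):               # (B, T_past, J*3)
        _, h = self.motion_enc(past_pose)       # h: (1, B, hidden)
        h = h[-1]                                # (B, hidden)
        traj = self.traj_head(h).view(-1, self.n_modes, self.t_fut, 3)
        poses = []
        for k in range(self.n_modes):
            ctx = h.unsqueeze(1).expand(-1, self.t_fut, -1)
            dec_in = torch.cat([traj[:, k], ctx], dim=-1)
            out, _ = self.pose_dec(dec_in)
            poses.append(self.pose_head(out))   # (B, T_fut, J*3)
        return traj, torch.stack(poses, dim=1)  # (B, K, T_fut, J*3)

model = CoarseToFineForecaster()
trajs, poses = model(torch.randn(2, 15, 15 * 3))
print(trajs.shape, poses.shape)  # (2, 6, 30, 3) and (2, 6, 30, 45)
```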
Related papers
- Generative Hierarchical Temporal Transformer for Hand Pose and Action Modeling [67.94143911629143]
We propose a generative Transformer VAE architecture to model hand pose and action.
To faithfully model the semantic dependency and different temporal granularity of hand pose and action, we decompose the framework into two cascaded VAE blocks.
Results show that our joint modeling of recognition and prediction improves over isolated solutions.
arXiv Detail & Related papers (2023-11-29T05:28:39Z)
- MotionLM: Multi-Agent Motion Forecasting as Language Modeling [15.317827804763699]
We present MotionLM, a language model for multi-agent motion prediction.
Our approach bypasses post-hoc interaction modeling, in which individual agent trajectories are generated before being scored for interactions.
The model's sequential factorization enables temporally causal conditional rollouts.
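The sequential factorization amounts to autoregressive sampling over discrete motion tokens, one future step at a time, so each step conditions only on tokens already emitted. A minimal sketch assuming a hypothetical model that returns next-token logits per agent (MotionLM's actual tokenizer, vocabulary, and decoder are not described in this summary):
```python
import torch

def causal_rollout(model, init_tokens, n_steps):
    """Autoregressive rollout sketch: at each future step, sample the next
    motion token for every agent conditioned only on tokens already emitted,
    so later steps cannot influence earlier ones (temporal causality).
    `model(tokens)` is assumed to return logits of shape (B, A, V)."""
    tokens = init_tokens                        # (B, A, T0) discrete motion tokens
    for _ in range(n_steps):
        logits = model(tokens)                  # (B, A, V) logits for the next step
        probs = torch.softmax(logits, dim=-1)
        nxt = torch.multinomial(probs.flatten(0, 1), 1)  # one sample per (B*A)
        nxt = nxt.view(tokens.shape[0], tokens.shape[1], 1)
        # Sampling all agents jointly at the same step avoids the post-hoc
        # "generate independently, then score interactions" pattern.
        tokens = torch.cat([tokens, nxt], dim=-1)
    return tokens                               # (B, A, T0 + n_steps)

# Toy usage with a stand-in model that ignores history (illustration only).
dummy = lambda toks: torch.zeros(toks.shape[0], toks.shape[1], 16)
out = causal_rollout(dummy, torch.zeros(2, 3, 1, dtype=torch.long), n_steps=5)
print(out.shape)  # torch.Size([2, 3, 6])
```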
arXiv Detail & Related papers (2023-09-28T15:46:25Z)
- Stochastic Multi-Person 3D Motion Forecasting [21.915057426589744]
We address real-world complexities that prior work on human motion forecasting has ignored.
Our framework is general; we instantiate it with different generative models.
Our approach produces diverse and accurate multi-person predictions, significantly outperforming the state of the art.
arXiv Detail & Related papers (2023-06-08T17:59:09Z)
- PGformer: Proxy-Bridged Game Transformer for Multi-Person Highly Interactive Extreme Motion Prediction [22.209454616479505]
This paper focuses on collaborative motion prediction for multiple persons with extreme motions.
A proxy unit is introduced to bridge the involved persons, which cooperates with our proposed XQA module.
Our approach is also compatible with the weakly interactive CMU-Mocap and MuPoTS-3D datasets.
arXiv Detail & Related papers (2023-06-06T03:25:09Z)
- Global-to-Local Modeling for Video-based 3D Human Pose and Shape Estimation [53.04781510348416]
Video-based 3D human pose and shape estimation is evaluated by intra-frame accuracy and inter-frame smoothness.
We propose to structurally decouple the modeling of long-term and short-term correlations in an end-to-end framework, Global-to-Local Transformer (GLoT)
Our GLoT surpasses previous state-of-the-art methods with the fewest model parameters on popular benchmarks, i.e., 3DPW, MPI-INF-3DHP, and Human3.6M.
arXiv Detail & Related papers (2023-03-26T14:57:49Z)
- JFP: Joint Future Prediction with Interactive Multi-Agent Modeling for Autonomous Driving [12.460224193998362]
We propose an end-to-end trainable model that learns directly the interaction between pairs of agents in a structured, graphical model formulation.
Our approach improves significantly on the trajectory overlap metrics while obtaining on-par or better performance on single-agent trajectory metrics.
arXiv Detail & Related papers (2022-12-16T20:59:21Z)
- Continuous-Time and Multi-Level Graph Representation Learning for Origin-Destination Demand Prediction [52.0977259978343]
This paper proposes a Continuous-time and Multi-level dynamic graph representation learning method for Origin-Destination demand prediction (CMOD).
The state vectors keep historical transaction information and are continuously updated according to the most recently happened transactions.
Experiments are conducted on two real-world datasets from Beijing Subway and New York Taxi, and the results demonstrate the superiority of our model against the state-of-the-art approaches.
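One way to picture the continuously updated state vectors: each station keeps a memory that decays over time and is refreshed whenever a transaction touches it. The toy update rule below is an assumption made for illustration, not CMOD's learned message-passing.
```python
import numpy as np

class StationMemory:
    """Toy continuous-time state: each station keeps a vector that is updated
    whenever a transaction (origin, destination, timestamp) arrives.
    A time-decay plus additive message stands in for a learned updater."""

    def __init__(self, n_stations, dim=32, decay=0.1):
        self.state = np.zeros((n_stations, dim))
        self.last_t = np.zeros(n_stations)
        self.decay = decay
        rng = np.random.default_rng(0)
        self.embed = rng.normal(size=(n_stations, dim)) * 0.1

    def update(self, origin, dest, t):
        for s, other in ((origin, dest), (dest, origin)):
            dt = t - self.last_t[s]
            self.state[s] *= np.exp(-self.decay * dt)  # decay stale information
            self.state[s] += self.embed[other]          # message from the other end
            self.last_t[s] = t

mem = StationMemory(n_stations=5)
for o, d, t in [(0, 3, 1.0), (3, 1, 2.5), (0, 3, 4.0)]:
    mem.update(o, d, t)
print(mem.state[0][:4])  # station 0's state reflects its recent transactions
```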
arXiv Detail & Related papers (2022-06-30T03:37:50Z)
- MUSE-VAE: Multi-Scale VAE for Environment-Aware Long Term Trajectory Prediction [28.438787700968703]
Conditional MUSE offers diverse and simultaneously more accurate predictions compared to the current state-of-the-art.
We demonstrate these assertions through a comprehensive set of experiments on nuScenes and SDD benchmarks as well as PFSD, a new synthetic dataset.
arXiv Detail & Related papers (2022-01-18T18:40:03Z)
- Multi-Agent Imitation Learning with Copulas [102.27052968901894]
Multi-agent imitation learning aims to train multiple agents to perform tasks from demonstrations by learning a mapping between observations and actions.
In this paper, we propose to use copula, a powerful statistical tool for capturing dependence among random variables, to explicitly model the correlation and coordination in multi-agent systems.
Our proposed model is able to separately learn marginals that capture the local behavioral patterns of each individual agent, as well as a copula function that solely and fully captures the dependence structure among agents.
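A small numerical illustration of the copula idea: a Gaussian copula supplies the dependence between two agents while each agent keeps its own marginal distribution. This is a generic example of the statistical tool, not the paper's learned model.
```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Dependence structure between two agents, captured solely by the copula:
rho = 0.8
cov = np.array([[1.0, rho], [rho, 1.0]])

# 1) Sample correlated standard normals and map to uniforms (Gaussian copula).
z = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=10_000)
u = stats.norm.cdf(z)                               # each column ~ Uniform(0, 1)

# 2) Push the uniforms through each agent's own marginal (local behavior).
a1 = stats.expon(scale=2.0).ppf(u[:, 0])            # agent 1: exponential marginal
a2 = stats.norm(loc=5.0, scale=1.0).ppf(u[:, 1])    # agent 2: Gaussian marginal

# Marginals stay as specified, yet the agents' actions remain coordinated:
print(np.corrcoef(a1, a2)[0, 1])                    # noticeably positive correlation
```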
arXiv Detail & Related papers (2021-07-10T03:49:41Z)
- A Spatial-Temporal Attentive Network with Spatial Continuity for Trajectory Prediction [74.00750936752418]
We propose a novel model named Spatial-Temporal Attentive Network with Spatial Continuity (STAN-SC).
First, a spatial-temporal attention mechanism is presented to explore the most useful and important information.
Second, we build a joint feature sequence from the sequence and instant state information so that the generated trajectories maintain spatial continuity.
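A generic sketch of what attending over timesteps (temporal) and agents (spatial) can look like; the module below is illustrative only and is not the STAN-SC architecture.
```python
import torch
import torch.nn as nn

class SpatialTemporalAttention(nn.Module):
    """Generic sketch: attend over timesteps within each agent (temporal),
    then over agents at each timestep (spatial). Purely illustrative of the
    idea named in the summary, not the STAN-SC implementation."""

    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.temporal = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.spatial = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):                            # x: (B, A, T, D)
        B, A, T, D = x.shape
        xt = x.reshape(B * A, T, D)                  # temporal attention per agent
        xt, _ = self.temporal(xt, xt, xt)
        x = xt.reshape(B, A, T, D)
        xs = x.transpose(1, 2).reshape(B * T, A, D)  # spatial attention per step
        xs, _ = self.spatial(xs, xs, xs)
        return xs.reshape(B, T, A, D).transpose(1, 2)

attn = SpatialTemporalAttention()
out = attn(torch.randn(2, 6, 8, 64))
print(out.shape)  # torch.Size([2, 6, 8, 64])
```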
arXiv Detail & Related papers (2020-03-13T04:35:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.