PGformer: Proxy-Bridged Game Transformer for Multi-Person Highly
Interactive Extreme Motion Prediction
- URL: http://arxiv.org/abs/2306.03374v3
- Date: Sun, 7 Jan 2024 14:05:41 GMT
- Authors: Yanwen Fang, Jintai Chen, Peng-Tao Jiang, Chao Li, Yifeng Geng, Eddy
K. F. Lam, Guodong Li
- Abstract summary: This paper focuses on collaborative motion prediction for multiple persons with extreme motions.
A proxy unit is introduced to bridge the involved persons, which cooperates with our proposed XQA module.
Our approach is also compatible with the weakly interactive CMU-Mocap and MuPoTS-3D datasets.
- Score: 22.209454616479505
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-person motion prediction is a challenging task, especially in
real-world scenarios with highly interactive persons. Most previous works have
been devoted to studying the case of weak interactions (e.g., walking
together), in which typically forecasting each human pose in isolation can
still achieve good performances. This paper focuses on collaborative motion
prediction for multiple persons with extreme motions and attempts to explore
the relationships between the highly interactive persons' pose trajectories.
Specifically, a novel cross-query attention (XQA) module is proposed to
bilaterally learn the cross-dependencies between the two pose sequences
tailored for this situation. A proxy unit is additionally introduced to bridge
the involved persons, which cooperates with our proposed XQA module and subtly
controls the bidirectional spatial information flows. These designs are then
integrated into a Transformer-based architecture and the resulting model is
called Proxy-bridged Game Transformer (PGformer) for multi-person interactive
motion prediction. Its effectiveness has been evaluated on the challenging ExPI
dataset, which involves highly interactive actions. Our PGformer consistently
outperforms the state-of-the-art methods in both short- and long-term
predictions by a large margin. Besides, our approach is also compatible with
the weakly interactive CMU-Mocap and MuPoTS-3D datasets and can be extended to
cases with more than two individuals, with encouraging results.
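The abstract describes the XQA module as bilateral cross-attention between two pose sequences, mediated by a proxy unit. As a rough illustration only, the following is a minimal, hypothetical numpy sketch of that idea using plain single-head dot-product attention, with the proxy modeled as an extra token appended to each key/value set; the actual PGformer design (projections, heads, how the proxy controls information flow) differs in detail and is not reproduced here.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend(q, k, v):
    # scaled dot-product attention: (T_q, D) x (T_k, D) -> (T_q, D)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def cross_query_attention(seq_a, seq_b, proxy):
    # Bilateral ("cross-query") exchange: each person's features query
    # the other person's features, with a shared proxy token appended
    # to mediate the bidirectional flow (a simplification of the paper's
    # proxy unit, which is learned).
    kb = np.concatenate([seq_b, proxy], axis=0)
    ka = np.concatenate([seq_a, proxy], axis=0)
    out_a = attend(seq_a, kb, kb)  # A attends to B (+ proxy)
    out_b = attend(seq_b, ka, ka)  # B attends to A (+ proxy)
    return out_a, out_b

rng = np.random.default_rng(0)
T, D = 8, 16                      # frames, feature dim per pose
a = rng.normal(size=(T, D))       # person A's pose features
b = rng.normal(size=(T, D))       # person B's pose features
p = rng.normal(size=(1, D))       # proxy unit (learned in practice)
ya, yb = cross_query_attention(a, b, p)
print(ya.shape, yb.shape)         # → (8, 16) (8, 16)
```

The symmetry of the two `attend` calls is the point: neither sequence is treated as the "leader", matching the paper's framing of the interaction as a two-player game bridged by the proxy.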
Related papers
- Relation Learning and Aggregate-attention for Multi-person Motion Prediction [13.052342503276936]
Multi-person motion prediction considers not only skeleton structures and human trajectories but also the interactions among people.
Previous methods often overlook that joint relations within an individual (intra-relation) and interactions among groups (inter-relation) are distinct types of representations.
We introduce a new collaborative framework for multi-person motion prediction that explicitly models these relations.
arXiv Detail & Related papers (2024-11-06T07:48:30Z)
- Deciphering Movement: Unified Trajectory Generation Model for Multi-Agent [53.637837706712794]
We propose a Unified Trajectory Generation model, UniTraj, that processes arbitrary trajectories as masked inputs.
Specifically, we introduce a Ghost Spatial Masking (GSM) module embedded within a Transformer encoder for spatial feature extraction.
We benchmark three practical sports game datasets, Basketball-U, Football-U, and Soccer-U, for evaluation.
arXiv Detail & Related papers (2024-05-27T22:15:23Z)
- Multi-agent Long-term 3D Human Pose Forecasting via Interaction-aware Trajectory Conditioning [41.09061877498741]
We propose an interaction-aware trajectory-conditioned long-term multi-agent human pose forecasting model.
Our model effectively handles the multi-modality of human motion and the complexity of long-term multi-agent interactions.
arXiv Detail & Related papers (2024-04-08T06:15:13Z)
- Joint-Relation Transformer for Multi-Person Motion Prediction [79.08243886832601]
We propose the Joint-Relation Transformer to enhance interaction modeling.
Our method achieves a 13.4% improvement in 900ms VIM on 3DPW-SoMoF/RC and a 17.8%/12.0% improvement in 3s MPJPE.
arXiv Detail & Related papers (2023-08-09T09:02:47Z)
- The MI-Motion Dataset and Benchmark for 3D Multi-Person Motion Prediction [13.177817435234449]
3D multi-person motion prediction is a challenging task that involves modeling individual behaviors and interactions between people.
We introduce the Multi-Person Interaction Motion (MI-Motion) dataset, which includes skeleton sequences of multiple individuals collected by motion capture systems.
The dataset contains 167k frames of interacting people's skeleton poses and is categorized into 5 different activity scenes.
arXiv Detail & Related papers (2023-06-23T15:38:22Z)
- A Hierarchical Hybrid Learning Framework for Multi-agent Trajectory Prediction [4.181632607997678]
We propose a hierarchical hybrid framework of deep learning (DL) and reinforcement learning (RL) for multi-agent trajectory prediction.
In the DL stage, the traffic scene is divided into multiple intermediate-scale heterogeneous graphs, based on which Transformer-style GNNs are adopted to encode heterogeneous interactions.
In the RL stage, we divide the traffic scene into local sub-scenes utilizing the key future points predicted in the DL stage.
arXiv Detail & Related papers (2023-03-22T02:47:42Z)
- Rethinking Trajectory Prediction via "Team Game" [118.59480535826094]
We present a novel formulation for multi-agent trajectory prediction, which explicitly introduces the concept of interactive group consensus.
On two multi-agent settings, i.e. team sports and pedestrians, the proposed framework consistently achieves superior performance compared to existing methods.
arXiv Detail & Related papers (2022-10-17T07:16:44Z)
- Interaction Transformer for Human Reaction Generation [61.22481606720487]
We propose a novel interaction Transformer (InterFormer) consisting of a Transformer network with both temporal and spatial attentions.
Our method is general and can be used to generate more complex and long-term interactions.
arXiv Detail & Related papers (2022-07-04T19:30:41Z)
- Learning Multiscale Correlations for Human Motion Prediction [10.335804615372629]
We propose a novel multiscale graph convolution network (MGCN) to capture the correlations among human body components.
We evaluate our approach on two standard benchmark datasets for human motion prediction.
arXiv Detail & Related papers (2021-03-19T07:58:16Z)
- End-to-end Contextual Perception and Prediction with Interaction Transformer [79.14001602890417]
We tackle the problem of detecting objects in 3D and forecasting their future motion in the context of self-driving.
To capture their spatial-temporal dependencies, we propose a recurrent neural network with a novel Transformer architecture.
Our model can be trained end-to-end, and runs in real-time.
arXiv Detail & Related papers (2020-08-13T14:30:12Z)
- A Spatial-Temporal Attentive Network with Spatial Continuity for Trajectory Prediction [74.00750936752418]
We propose a novel model named spatial-temporal attentive network with spatial continuity (STAN-SC).
First, a spatial-temporal attention mechanism is presented to extract the most useful and important information.
Second, we construct a joint feature sequence from the sequence and instant-state information so that the generated trajectories maintain spatial continuity.
arXiv Detail & Related papers (2020-03-13T04:35:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.