Relation Learning and Aggregate-attention for Multi-person Motion Prediction
- URL: http://arxiv.org/abs/2411.03729v1
- Date: Wed, 06 Nov 2024 07:48:30 GMT
- Title: Relation Learning and Aggregate-attention for Multi-person Motion Prediction
- Authors: Kehua Qu, Rui Ding, Jin Tang,
- Abstract summary: Multi-person motion prediction considers not just the skeleton structures or human trajectories but also the interactions between others.
Previous methods often overlook that the joints relations within an individual (intra-relation) and interactions among groups (inter-relation) are distinct types of representations.
We introduce a new collaborative framework for multi-person motion prediction that explicitly modeling these relations.
- Score: 13.052342503276936
- License:
- Abstract: Multi-person motion prediction is an emerging and intricate task with broad real-world applications. Unlike single person motion prediction, it considers not just the skeleton structures or human trajectories but also the interactions between others. Previous methods use various networks to achieve impressive predictions but often overlook that the joints relations within an individual (intra-relation) and interactions among groups (inter-relation) are distinct types of representations. These methods often lack explicit representation of inter&intra-relations, and inevitably introduce undesired dependencies. To address this issue, we introduce a new collaborative framework for multi-person motion prediction that explicitly modeling these relations:a GCN-based network for intra-relations and a novel reasoning network for inter-relations.Moreover, we propose a novel plug-and-play aggregation module called the Interaction Aggregation Module (IAM), which employs an aggregate-attention mechanism to seamlessly integrate these relations. Experiments indicate that the module can also be applied to other dual-path models. Extensive experiments on the 3DPW, 3DPW-RC, CMU-Mocap, MuPoTS-3D, as well as synthesized datasets Mix1 & Mix2 (9 to 15 persons), demonstrate that our method achieves state-of-the-art performance.
Related papers
- Learning Mutual Excitation for Hand-to-Hand and Human-to-Human
Interaction Recognition [22.538114033191313]
We propose a mutual excitation graph convolutional network (me-GCN) by stacking mutual excitation graph convolution layers.
Me-GC learns mutual information in each layer and each stage of graph convolution operations.
Our proposed me-GC outperforms state-of-the-art GCN-based and Transformer-based methods.
arXiv Detail & Related papers (2024-02-04T10:00:00Z) - I2SRM: Intra- and Inter-Sample Relationship Modeling for Multimodal
Information Extraction [10.684005956288347]
We present the Intra- and Inter-Sample Relationship Modeling (I2SRM) method for this task.
Our proposed method achieves competitive results, 77.12% F1-score on Twitter-2015, 88.40% F1-score on Twitter-2017, and 84.12% F1-score on MNRE.
arXiv Detail & Related papers (2023-10-10T05:50:25Z) - Joint-Relation Transformer for Multi-Person Motion Prediction [79.08243886832601]
We propose the Joint-Relation Transformer to enhance interaction modeling.
Our method achieves a 13.4% improvement of 900ms VIM on 3DPW-SoMoF/RC and 17.8%/12.0% improvement of 3s MPJPE.
arXiv Detail & Related papers (2023-08-09T09:02:47Z) - Multi-Grained Multimodal Interaction Network for Entity Linking [65.30260033700338]
Multimodal entity linking task aims at resolving ambiguous mentions to a multimodal knowledge graph.
We propose a novel Multi-GraIned Multimodal InteraCtion Network $textbf(MIMIC)$ framework for solving the MEL task.
arXiv Detail & Related papers (2023-07-19T02:11:19Z) - InterGen: Diffusion-based Multi-human Motion Generation under Complex Interactions [49.097973114627344]
We present InterGen, an effective diffusion-based approach that incorporates human-to-human interactions into the motion diffusion process.
We first contribute a multimodal dataset, named InterHuman. It consists of about 107M frames for diverse two-person interactions, with accurate skeletal motions and 23,337 natural language descriptions.
We propose a novel representation for motion input in our interaction diffusion model, which explicitly formulates the global relations between the two performers in the world frame.
arXiv Detail & Related papers (2023-04-12T08:12:29Z) - Rethinking Trajectory Prediction via "Team Game" [118.59480535826094]
We present a novel formulation for multi-agent trajectory prediction, which explicitly introduces the concept of interactive group consensus.
On two multi-agent settings, i.e. team sports and pedestrians, the proposed framework consistently achieves superior performance compared to existing methods.
arXiv Detail & Related papers (2022-10-17T07:16:44Z) - Interaction Modeling with Multiplex Attention [17.04973256281265]
We introduce a method for accurately modeling multi-agent systems.
We show that our approach outperforms state-of-the-art models in trajectory forecasting and relation inference.
arXiv Detail & Related papers (2022-08-23T00:29:18Z) - Spatio-Temporal Dynamic Inference Network for Group Activity Recognition [7.007702816885332]
Group activity aims to understand the activity performed by a group of people in order to solve it.
Previous methods are limited in reasoning on a predefined graph, which ignores the person-specific context.
We propose Dynamic Inference Network (DIN), which composes of Dynamic Relation (DR) module and Dynamic Walk (DW) module.
arXiv Detail & Related papers (2021-08-26T12:40:20Z) - Multi-Agent Imitation Learning with Copulas [102.27052968901894]
Multi-agent imitation learning aims to train multiple agents to perform tasks from demonstrations by learning a mapping between observations and actions.
In this paper, we propose to use copula, a powerful statistical tool for capturing dependence among random variables, to explicitly model the correlation and coordination in multi-agent systems.
Our proposed model is able to separately learn marginals that capture the local behavioral patterns of each individual agent, as well as a copula function that solely and fully captures the dependence structure among agents.
arXiv Detail & Related papers (2021-07-10T03:49:41Z) - Cascaded Human-Object Interaction Recognition [175.60439054047043]
We introduce a cascade architecture for a multi-stage, coarse-to-fine HOI understanding.
At each stage, an instance localization network progressively refines HOI proposals and feeds them into an interaction recognition network.
With our carefully-designed human-centric relation features, these two modules work collaboratively towards effective interaction understanding.
arXiv Detail & Related papers (2020-03-09T17:05:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.