Best Practices for 2-Body Pose Forecasting
- URL: http://arxiv.org/abs/2304.05758v1
- Date: Wed, 12 Apr 2023 10:46:23 GMT
- Title: Best Practices for 2-Body Pose Forecasting
- Authors: Muhammad Rameez Ur Rahman, Luca Scofano, Edoardo De Matteis,
Alessandro Flaborea, Alessio Sampieri, Fabio Galasso
- Abstract summary: We review the progress in human pose forecasting and provide an in-depth assessment of the single-person practices that perform best.
Other single-person practices do not transfer to 2-body, so the proposed best ones do not include hierarchical body modeling or attention-based interaction encoding.
Our proposed 2-body pose forecasting best practices yield a performance improvement of 21.9% over the state-of-the-art on the most recent ExPI dataset.
- Score: 58.661899246497896
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The task of collaborative human pose forecasting consists of predicting the
future poses of multiple interacting people, given those in previous frames.
Predicting two people in interaction, instead of each separately, promises
better performance, due to their body-body motion correlations. But the task
has so far remained largely unexplored.
In this paper, we review the progress in human pose forecasting and provide
an in-depth assessment of the single-person practices that perform best for
2-body collaborative motion forecasting. Our study confirms the positive impact
of frequency input representations, space-time separable and fully-learnable
interaction adjacencies for the encoding GCN and FC decoding. Other
single-person practices do not transfer to 2-body, so the proposed best ones do
not include hierarchical body modeling or attention-based interaction encoding.
We further contribute a novel initialization procedure for the 2-body spatial
interaction parameters of the encoder, which benefits performance and
stability. Altogether, our proposed 2-body pose forecasting best practices
yield a performance improvement of 21.9% over the state-of-the-art on the most
recent ExPI dataset, whereby the novel initialization accounts for 3.5%. See
our project page at https://www.pinlab.org/bestpractices2body
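The abstract credits "frequency input representations" as one of the practices that transfer to 2-body forecasting. As a minimal sketch of what such a representation can look like, the snippet below encodes one joint coordinate over time with an orthonormal DCT-II and recovers it again. The choice of the DCT, the function names, and the sample trajectory are assumptions made for illustration; they are not taken from the paper.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II basis: row k holds the k-th cosine over n frames."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append(
            [scale * math.cos(math.pi * (2 * t + 1) * k / (2 * n)) for t in range(n)]
        )
    return rows


def dct(signal):
    """Project a temporal signal onto the DCT basis (time -> frequency)."""
    basis = dct_matrix(len(signal))
    return [sum(b * x for b, x in zip(row, signal)) for row in basis]


def idct(coeffs):
    """Invert the orthonormal DCT-II by applying the transposed basis."""
    n = len(coeffs)
    basis = dct_matrix(n)
    return [sum(basis[k][t] * coeffs[k] for k in range(n)) for t in range(n)]


# Hypothetical data: one coordinate of one joint over 8 observed frames.
trajectory = [0.00, 0.05, 0.11, 0.18, 0.24, 0.29, 0.33, 0.35]

coeffs = dct(trajectory)        # frequency representation fed to an encoder
reconstructed = idct(coeffs)    # lossless round trip back to the time domain

# Keeping only the lowest frequencies yields a smoothed trajectory.
truncated = coeffs[:4] + [0.0] * (len(coeffs) - 4)
smooth = idct(truncated)
```

In a 2-body setting the same transform would be applied independently to every coordinate of every joint of both people, so the encoder sees one coefficient vector per coordinate instead of raw per-frame positions.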
Related papers
- Expressive Forecasting of 3D Whole-body Human Motions [38.93700642077312]
We are the first to formulate a whole-body human pose forecasting framework.
Our model involves two key constituents: cross-context alignment (XCA) and cross-context interaction (XCI).
We conduct extensive experiments on a newly-introduced large-scale benchmark and achieve state-of-the-art performance.
arXiv Detail & Related papers (2023-12-19T09:09:46Z)
- DisenHCN: Disentangled Hypergraph Convolutional Networks for Spatiotemporal Activity Prediction [53.76601630407521]
We propose a hypergraph network model called DisenHCN to bridge the gaps in existing solutions.
In particular, we first unify fine-grained user similarity and the complex matching between user preferences and temporal activity into a heterogeneous hypergraph.
We then disentangle the user representations into different aspects (location-aware, time-aware, and activity-aware) and aggregate corresponding aspect's features on the constructed hypergraph.
arXiv Detail & Related papers (2022-08-14T06:51:54Z)
- Jointformer: Single-Frame Lifting Transformer with Error Prediction and Refinement for 3D Human Pose Estimation [11.592567773739407]
3D human pose estimation technologies have the potential to greatly increase the availability of human movement data.
The best-performing models for single-image 2D-3D lifting use graph convolutional networks (GCNs) that typically require some manual input to define the relationships between different body joints.
We propose a novel transformer-based approach that uses the more generalised self-attention mechanism to learn these relationships.
arXiv Detail & Related papers (2022-08-07T12:07:19Z)
- Comparison of Spatio-Temporal Models for Human Motion and Pose Forecasting in Face-to-Face Interaction Scenarios [47.99589136455976]
We present the first systematic comparison of state-of-the-art approaches for behavior forecasting.
Our best attention-based approaches achieve state-of-the-art performance in UDIVA v0.5.
We show that by autoregressively predicting the future with methods trained for the short-term future, we outperform the baselines even for a considerably longer-term future.
arXiv Detail & Related papers (2022-03-07T09:59:30Z)
- Development of Human Motion Prediction Strategy using Inception Residual Block [1.0705399532413613]
We propose an Inception Residual Block (IRB) to detect temporal features in human poses.
Our main contribution is a residual connection between the input and the output of the inception block, which provides continuity between the previously observed pose and the next predicted pose.
With the proposed architecture, the model learns prior knowledge about human poses much better, and we achieve much higher prediction accuracy, as detailed in the paper.
arXiv Detail & Related papers (2021-08-09T12:49:48Z)
- Improving Robustness and Accuracy via Relative Information Encoding in 3D Human Pose Estimation [59.94032196768748]
We propose a relative information encoding method that yields positional and temporal enhanced representations.
Our method outperforms state-of-the-art methods on two public datasets.
arXiv Detail & Related papers (2021-07-29T14:12:19Z)
- Online Multi-Agent Forecasting with Interpretable Collaborative Graph Neural Network [65.11999700562869]
We propose a novel collaborative prediction unit (CoPU), which aggregates predictions from multiple collaborative predictors according to a collaborative graph.
Our methods outperform state-of-the-art works on the three tasks by 28.6%, 17.4% and 21.0% on average.
arXiv Detail & Related papers (2021-07-02T08:20:06Z)
- SIMPLE: SIngle-network with Mimicking and Point Learning for Bottom-up Human Pose Estimation [81.03485688525133]
We propose a novel multi-person pose estimation framework, SIngle-network with Mimicking and Point Learning for Bottom-up Human Pose Estimation (SIMPLE).
Specifically, in the training process, we enable SIMPLE to mimic the pose knowledge from the high-performance top-down pipeline.
In addition, SIMPLE formulates human detection and pose estimation as a unified point learning framework, so that the two tasks complement each other in a single network.
arXiv Detail & Related papers (2021-04-06T13:12:51Z)
- Human Trajectory Forecasting in Crowds: A Deep Learning Perspective [89.4600982169]
We present an in-depth analysis of existing deep learning-based methods for modelling social interactions.
We propose two knowledge-based data-driven methods to effectively capture these social interactions.
We develop a large scale interaction-centric benchmark TrajNet++, a significant yet missing component in the field of human trajectory forecasting.
arXiv Detail & Related papers (2020-07-07T17:19:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.