Towards more realistic human motion prediction with attention to motion coordination
- URL: http://arxiv.org/abs/2404.03584v1
- Date: Thu, 4 Apr 2024 16:48:40 GMT
- Title: Towards more realistic human motion prediction with attention to motion coordination
- Authors: Pengxiang Ding, Jianqin Yin
- Abstract summary: We propose a novel joint relation modeling module, Comprehensive Joint Relation Extractor (CJRE), to combine this motion coordination with the local interactions between joint pairs in a unified manner.
The proposed framework outperforms state-of-the-art methods in both short- and long-term predictions on H3.6M, CMU-Mocap, and 3DPW.
- Score: 7.243632426715939
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Joint relation modeling is a crucial component in human motion prediction. Most existing methods rely on skeletal-based graphs to build the joint relations, where local interactive relations between joint pairs are well learned. However, the motion coordination, a global joint relation reflecting the simultaneous cooperation of all joints, is usually weakened because it is learned from part to whole progressively and asynchronously. Thus, the final predicted motions usually appear unrealistic. To tackle this issue, we learn a medium, called coordination attractor (CA), from the spatiotemporal features of motion to characterize the global motion features, which is subsequently used to build new relative joint relations. Through the CA, all joints are related simultaneously, and thus the motion coordination of all joints can be better learned. Based on this, we further propose a novel joint relation modeling module, Comprehensive Joint Relation Extractor (CJRE), to combine this motion coordination with the local interactions between joint pairs in a unified manner. Additionally, we also present a Multi-timescale Dynamics Extractor (MTDE) to extract enriched dynamics from the raw position information for effective prediction. Extensive experiments show that the proposed framework outperforms state-of-the-art methods in both short- and long-term predictions on H3.6M, CMU-Mocap, and 3DPW.
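The abstract describes the coordination attractor (CA) only at a high level. As a rough illustration of the idea (not the paper's actual architecture; shapes, names, and the attention-pooling formulation here are assumptions), a global attractor can be obtained by attention-pooling per-joint features and then expressing each joint relative to that shared medium, so every joint is related to every other joint simultaneously:

```python
import numpy as np

# Hypothetical sketch of the coordination-attractor idea. The random
# weights below stand in for learned parameters; J joints, D-dim
# per-joint spatiotemporal features.
rng = np.random.default_rng(0)
J, D = 22, 64
joint_feats = rng.standard_normal((J, D))  # per-joint motion features

# 1) Pool all joints into one global "attractor" via attention weights.
w = rng.standard_normal(D)
scores = joint_feats @ w                   # one scalar score per joint
attn = np.exp(scores - scores.max())
attn /= attn.sum()                         # softmax over the J joints
ca = attn @ joint_feats                    # coordination attractor, shape (D,)

# 2) Build relative joint relations through the CA: each joint's feature
#    is re-expressed relative to the shared global medium.
relative = joint_feats - ca                # shape (J, D)

assert ca.shape == (D,)
assert relative.shape == (J, D)
```

Because every joint interacts with the same global vector in one step, the coordination signal is shared synchronously rather than propagated part-to-whole through a skeletal graph.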
Related papers
- Text-Derived Relational Graph-Enhanced Network for Skeleton-Based Action Segmentation [14.707224594220264]
We propose a Text-Derived Graph Network (TRG-Net) to enhance both modeling and supervision.
For modeling, the Dynamic Spatio-Temporal Fusion Modeling (DSFM) method incorporates Text-Derived Joint Graphs (JGT) with channel adaptation.
For supervision, the Absolute-Relative Inter-Class Supervision (ARIS) method employs contrastive learning between action features and text embeddings to regularize the absolute class.
arXiv Detail & Related papers (2025-03-19T11:38:14Z) - ChainHOI: Joint-based Kinematic Chain Modeling for Human-Object Interaction Generation [25.777159581915658]
ChainHOI is a novel approach for text-driven human-object interaction generation.
It explicitly models interactions at both the joint and kinematic chain levels.
arXiv Detail & Related papers (2025-03-17T12:55:34Z) - Relation Learning and Aggregate-attention for Multi-person Motion Prediction [13.052342503276936]
Multi-person motion prediction considers not just the skeleton structures or human trajectories but also the interactions between others.
Previous methods often overlook that the joints relations within an individual (intra-relation) and interactions among groups (inter-relation) are distinct types of representations.
We introduce a new collaborative framework for multi-person motion prediction that explicitly models these relations.
arXiv Detail & Related papers (2024-11-06T07:48:30Z) - Joint-Motion Mutual Learning for Pose Estimation in Videos [21.77871402339573]
Human pose estimation in videos has long been a compelling yet challenging task within the realm of computer vision.
Recent methods strive to integrate multi-frame visual features generated by a backbone network for pose estimation.
We propose a novel joint-motion mutual learning framework for pose estimation.
arXiv Detail & Related papers (2024-08-05T07:37:55Z) - A Decoupled Spatio-Temporal Framework for Skeleton-based Action Segmentation [89.86345494602642]
Existing methods are limited by weak temporal modeling capability.
We propose a Decoupled Spatial-Temporal Framework (DeST) to address this issue.
DeST significantly outperforms current state-of-the-art methods with less computational complexity.
arXiv Detail & Related papers (2023-12-10T09:11:39Z) - Learning Complete Topology-Aware Correlations Between Relations for Inductive Link Prediction [121.65152276851619]
We show that semantic correlations between relations are inherently edge-level and entity-independent.
We propose a novel subgraph-based method, namely TACO, to model Topology-Aware COrrelations between relations.
To further exploit the potential of RCN, we propose Complete Common Neighbor induced subgraph.
arXiv Detail & Related papers (2023-09-20T08:11:58Z) - Joint-Relation Transformer for Multi-Person Motion Prediction [79.08243886832601]
We propose the Joint-Relation Transformer to enhance interaction modeling.
Our method achieves a 13.4% improvement of 900ms VIM on 3DPW-SoMoF/RC and 17.8%/12.0% improvement of 3s MPJPE.
arXiv Detail & Related papers (2023-08-09T09:02:47Z) - GraMMaR: Ground-aware Motion Model for 3D Human Motion Reconstruction [61.833152949826946]
We propose a novel Ground-aware Motion Model for 3D Human Motion Reconstruction, named GraMMaR.
GraMMaR learns the distribution of transitions in both pose and interaction between every joint and ground plane at each time step of a motion sequence.
It is trained to explicitly promote consistency between the motion and distance change towards the ground.
arXiv Detail & Related papers (2023-06-29T07:22:20Z) - Kinematics Modeling Network for Video-based Human Pose Estimation [9.506011491028891]
Estimating human poses from videos is critical in human-computer interaction.
Joints cooperate rather than move independently during human movement.
We propose a plug-and-play kinematics modeling module (KMM) to explicitly model temporal correlations between joints.
arXiv Detail & Related papers (2022-07-22T09:37:48Z) - Motion Prediction via Joint Dependency Modeling in Phase Space [40.54430409142653]
We introduce a novel convolutional neural model to leverage explicit prior knowledge of motion anatomy.
We then propose a global optimization module that learns the implicit relationships between individual joint features.
Our method is evaluated on large-scale 3D human motion benchmark datasets.
arXiv Detail & Related papers (2022-01-07T08:30:01Z) - An Attractor-Guided Neural Networks for Skeleton-Based Human Motion Prediction [0.4568777157687961]
Joint modeling is a crucial component in human motion prediction.
We learn a medium, called balance attractor (BA), from spatiotemporal features to characterize the global motion features.
Through the BA, all joints are related synchronously, and thus the global coordination of all joints can be better learned.
arXiv Detail & Related papers (2021-05-20T12:51:39Z) - Pose And Joint-Aware Action Recognition [87.4780883700755]
We present a new model for joint-based action recognition, which first extracts motion features from each joint separately through a shared motion encoder.
Our joint selector module re-weights the joint information to select the most discriminative joints for the task.
We show large improvements over the current state-of-the-art joint-based approaches on JHMDB, HMDB, Charades, AVA action recognition datasets.
arXiv Detail & Related papers (2020-10-16T04:43:34Z) - Cascaded Human-Object Interaction Recognition [175.60439054047043]
We introduce a cascade architecture for a multi-stage, coarse-to-fine HOI understanding.
At each stage, an instance localization network progressively refines HOI proposals and feeds them into an interaction recognition network.
With our carefully-designed human-centric relation features, these two modules work collaboratively towards effective interaction understanding.
arXiv Detail & Related papers (2020-03-09T17:05:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences.