Related papers: Breaking the Passive Learning Trap: An Active Perception Strategy for Human Motion Prediction

Breaking the Passive Learning Trap: An Active Perception Strategy for Human Motion Prediction

URL: http://arxiv.org/abs/2511.14237v1
Date: Tue, 18 Nov 2025 08:15:23 GMT
Title: Breaking the Passive Learning Trap: An Active Perception Strategy for Human Motion Prediction
Authors: Juncheng Hu, Zijian Zhang, Zeyu Wang, Guoyu Wang, Yingji Li, Kedi Lyu,
Abstract summary: We propose an Active Perceptual Strategy (APS) for human motion prediction.<n>APS explicitly encodes motion properties while introducing auxiliary learning objectives.<n> experimental results demonstrate that our method achieves the new state-of-the-art, existing methods by large margins.
Score: 17.106453744027217
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Forecasting 3D human motion is an important embodiment of fine-grained understanding and cognition of human behavior by artificial agents. Current approaches excessively rely on implicit network modeling of spatiotemporal relationships and motion characteristics, falling into the passive learning trap that results in redundant and monotonous 3D coordinate information acquisition while lacking actively guided explicit learning mechanisms. To overcome these issues, we propose an Active Perceptual Strategy (APS) for human motion prediction, leveraging quotient space representations to explicitly encode motion properties while introducing auxiliary learning objectives to strengthen spatio-temporal modeling. Specifically, we first design a data perception module that projects poses into the quotient space, decoupling motion geometry from coordinate redundancy. By jointly encoding tangent vectors and Grassmann projections, this module simultaneously achieves geometric dimension reduction, semantic decoupling, and dynamic constraint enforcement for effective motion pose characterization. Furthermore, we introduce a network perception module that actively learns spatio-temporal dependencies through restorative learning. This module deliberately masks specific joints or injects noise to construct auxiliary supervision signals. A dedicated auxiliary learning network is designed to actively adapt and learn from perturbed information. Notably, APS is model agnostic and can be integrated with different prediction models to enhance active perceptual. The experimental results demonstrate that our method achieves the new state-of-the-art, outperforming existing methods by large margins: 16.3% on H3.6M, 13.9% on CMU Mocap, and 10.1% on 3DPW.

Related papers

Foundation Model for Skeleton-Based Human Action Understanding [56.89025287217221]
This paper presents a Unified Skeleton-based Dense Representation Learning framework.<n>USDRL consists of a Transformer-based Dense Spatio-Temporal (DSTE), Multi-Grained Feature Decorrelation (MG-FD), and Multi-Perspective Consistency Training (MPCT)
arXiv Detail & Related papers (2025-08-18T02:42:16Z)
GGMotion: Group Graph Dynamics-Kinematics Networks for Human Motion Prediction [0.0]
GGMotion is a group graph dynamics-kinematics network that models human topology in groups to better leverage dynamics and kinematics priors.<n>Inter-group and intra-group interaction modules are employed to capture the dependencies of joints at different scales.<n>Our approach achieves a significant performance margin in short-term motion prediction.
arXiv Detail & Related papers (2025-07-10T08:02:01Z)
Spatio-temporal MLP-graph network for 3D human pose estimation [8.267311047244881]
Graph convolutional networks and their variants have shown significant promise in 3D human pose estimation. We introduce a new weighted Jacobi feature rule obtained through graph filtering with implicit propagation fairing. We also employ adjacency modulation with the aim of learning meaningful correlations beyond defined between body joints.
arXiv Detail & Related papers (2023-08-29T14:00:55Z)
Auxiliary Tasks Benefit 3D Skeleton-based Human Motion Prediction [106.06256351200068]
This paper introduces a model learning framework with auxiliary tasks. In our auxiliary tasks, partial body joints' coordinates are corrupted by either masking or adding noise. We propose a novel auxiliary-adapted transformer, which can handle incomplete, corrupted motion data.
arXiv Detail & Related papers (2023-08-17T12:26:11Z)
Masked Motion Predictors are Strong 3D Action Representation Learners [143.9677635274393]
In 3D human action recognition, limited supervised data makes it challenging to fully tap into the modeling potential of powerful networks such as transformers. We show that instead of following the prevalent pretext to perform masked self-component reconstruction in human joints, explicit contextual motion modeling is key to the success of learning effective feature representation for 3D action recognition.
arXiv Detail & Related papers (2023-08-14T11:56:39Z)
Spatio-Temporal Branching for Motion Prediction using Motion Increments [55.68088298632865]
Human motion prediction (HMP) has emerged as a popular research topic due to its diverse applications. Traditional methods rely on hand-crafted features and machine learning techniques. We propose a noveltemporal-temporal branching network using incremental information for HMP.
arXiv Detail & Related papers (2023-08-02T12:04:28Z)
A Generic Shared Attention Mechanism for Various Backbone Neural Networks [53.36677373145012]
Self-attention modules (SAMs) produce strongly correlated attention maps across different layers. Dense-and-Implicit Attention (DIA) shares SAMs across layers and employs a long short-term memory module. Our simple yet effective DIA can consistently enhance various network backbones.
arXiv Detail & Related papers (2022-10-27T13:24:08Z)
3D Convolutional with Attention for Action Recognition [6.238518976312625]
Current action recognition methods use computationally expensive models for learning-temporal dependencies of the action. This paper proposes a deep neural network architecture for learning such dependencies consisting of a 3D convolutional layer, fully connected layers and attention layer. The method first learns spatial features and temporal of actions through 3D-CNN, and then the attention temporal mechanism helps the model to locate attention to essential features.
arXiv Detail & Related papers (2022-06-05T15:12:57Z)
Motion Prediction via Joint Dependency Modeling in Phase Space [40.54430409142653]
We introduce a novel convolutional neural model to leverage explicit prior knowledge of motion anatomy. We then propose a global optimization module that learns the implicit relationships between individual joint features. Our method is evaluated on large-scale 3D human motion benchmark datasets.
arXiv Detail & Related papers (2022-01-07T08:30:01Z)
Backprop-Free Reinforcement Learning with Active Neural Generative Coding [84.11376568625353]
We propose a computational framework for learning action-driven generative models without backpropagation of errors (backprop) in dynamic environments. We develop an intelligent agent that operates even with sparse rewards, drawing inspiration from the cognitive theory of planning as inference. The robust performance of our agent offers promising evidence that a backprop-free approach for neural inference and learning can drive goal-directed behavior.
arXiv Detail & Related papers (2021-07-10T19:02:27Z)
Progressive Self-Guided Loss for Salient Object Detection [102.35488902433896]
We present a progressive self-guided loss function to facilitate deep learning-based salient object detection in images. Our framework takes advantage of adaptively aggregated multi-scale features to locate and detect salient objects effectively.
arXiv Detail & Related papers (2021-01-07T07:33:38Z)

This list is automatically generated from the titles and abstracts of the papers in this site.