HVIS: A Human-like Vision and Inference System for Human Motion Prediction
- URL: http://arxiv.org/abs/2502.16913v1
- Date: Mon, 24 Feb 2025 07:18:37 GMT
- Title: HVIS: A Human-like Vision and Inference System for Human Motion Prediction
- Authors: Kedi Lyu, Haipeng Chen, Zhenguang Liu, Yifang Yin, Yukang Lin, Yingying Jiao,
- Abstract summary: We propose the Human-like Vision and Inference System (HVIS) for human motion prediction. HVIS comprises two components: the human-like vision encode (HVE) module and the human-like motion inference (HMI) module. We show that our method achieves new state-of-the-art performance, significantly outperforming existing methods by 19.8% on Human3.6M, 15.7% on CMU Mocap, and 11.1% on G3D.
- Score: 16.315519892850077
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Grasping the intricacies of human motion, which involve perceiving spatio-temporal dependence and multi-scale effects, is essential for predicting human motion. While humans inherently possess the requisite skills to navigate this issue, it proves to be markedly more challenging for machines to emulate. To bridge the gap, we propose the Human-like Vision and Inference System (HVIS) for human motion prediction, which is designed to emulate human observation and forecast future movements. HVIS comprises two components: the human-like vision encode (HVE) module and the human-like motion inference (HMI) module. The HVE module mimics and refines the human visual process, incorporating a retina-analog component that captures spatiotemporal information separately to avoid unnecessary crosstalk. Additionally, a visual cortex-analogy component is designed to hierarchically extract and treat complex motion features, focusing on both global and local features of human poses. The HMI is employed to simulate the multi-stage learning model of the human brain. The spontaneous learning network simulates the neuronal fracture generation process for the adversarial generation of future motions. Subsequently, the deliberate learning network is optimized for hard-to-train joints to prevent misleading learning. Experimental results demonstrate that our method achieves new state-of-the-art performance, significantly outperforming existing methods by 19.8% on Human3.6M, 15.7% on CMU Mocap, and 11.1% on G3D.
Related papers
- Reinforcement learning-based motion imitation for physiologically plausible musculoskeletal motor control [47.423243831156285]
We present a model-free motion imitation framework (KINESIS) to advance the understanding of muscle-based motor control.
We demonstrate that KINESIS achieves strong imitation performance on 1.9 hours of motion capture data.
KINESIS generates muscle activity patterns that correlate well with human EMG activity.
arXiv Detail & Related papers (2025-03-18T18:37:49Z)
- MoManifold: Learning to Measure 3D Human Motion via Decoupled Joint Acceleration Manifolds [20.83684434910106]
We present MoManifold, a novel human motion prior, which models plausible human motion in continuous high-dimensional motion space.
Specifically, we propose novel decoupled joint acceleration to model human dynamics from existing limited motion data.
Extensive experiments demonstrate that MoManifold outperforms existing SOTAs as a prior in several downstream tasks.
arXiv Detail & Related papers (2024-09-01T15:00:16Z)
- SMART: Scene-motion-aware human action recognition framework for mental disorder group [16.60713558596286]
We propose to build a vision-based Human Action Recognition dataset including abnormal actions often occurring in the mental disorder group.
We then introduce a novel Scene-Motion-aware Action Recognition framework, named SMART, consisting of two technical modules.
The effectiveness of our proposed method has been validated on our self-collected HAR dataset (HAD), achieving 94.9% and 93.1% accuracy on unseen subjects and scenes, respectively, and outperforming state-of-the-art approaches by 6.5% and 13.2%.
arXiv Detail & Related papers (2024-06-07T05:29:42Z)
- HINT: Learning Complete Human Neural Representations from Limited Viewpoints [69.76947323932107]
We propose a NeRF-based algorithm able to learn a detailed and complete human model from limited viewing angles.
As a result, our method can reconstruct complete humans even from a few viewing angles, increasing performance by more than 15% PSNR.
arXiv Detail & Related papers (2024-05-30T05:43:09Z)
- Closely Interactive Human Reconstruction with Proxemics and Physics-Guided Adaption [64.07607726562841]
Existing multi-person human reconstruction approaches mainly focus on recovering accurate poses or avoiding penetration.
In this work, we tackle the task of reconstructing closely interactive humans from a monocular video.
We propose to leverage knowledge from proxemic behavior and physics to compensate for the lack of visual information.
arXiv Detail & Related papers (2024-04-17T11:55:45Z)
- Enhancing HOI Detection with Contextual Cues from Large Vision-Language Models [56.257840490146]
ConCue is a novel approach for improving visual feature extraction in HOI detection.
We develop a transformer-based feature extraction module with a multi-tower architecture that integrates contextual cues into both instance and interaction detectors.
arXiv Detail & Related papers (2023-11-26T09:11:32Z)
- Modelling Human Visual Motion Processing with Trainable Motion Energy Sensing and a Self-attention Network [1.9458156037869137]
We propose an image-computable model of human motion perception by bridging the gap between biological and computer vision models.
This model architecture aims to capture the computations in V1-MT, the core structure for motion perception in the biological visual system.
In silico neurophysiology reveals that our model's unit responses are similar to mammalian neural recordings regarding motion pooling and speed tuning.
arXiv Detail & Related papers (2023-05-16T04:16:07Z)
- Deep state-space modeling for explainable representation, analysis, and generation of professional human poses [0.0]
This paper introduces three novel methods for creating explainable representations of human movement.
The trained models are used for the full-body dexterity analysis of expert professionals.
arXiv Detail & Related papers (2023-04-13T08:13:10Z)
- Task-Oriented Human-Object Interactions Generation with Implicit Neural Representations [61.659439423703155]
TOHO: Task-Oriented Human-Object Interactions Generation with Implicit Neural Representations.
Our method generates continuous motions that are parameterized only by the temporal coordinate.
This work takes a step further toward general human-scene interaction simulation.
arXiv Detail & Related papers (2023-03-23T09:31:56Z)
- GIMO: Gaze-Informed Human Motion Prediction in Context [75.52839760700833]
We propose a large-scale human motion dataset that delivers high-quality body pose sequences, scene scans, and ego-centric views with eye gaze.
Our data collection is not tied to specific scenes, which further boosts the motion dynamics observed from our subjects.
To realize the full potential of gaze, we propose a novel network architecture that enables bidirectional communication between the gaze and motion branches.
arXiv Detail & Related papers (2022-04-20T13:17:39Z)
- Overcoming the Domain Gap in Neural Action Representations [60.47807856873544]
3D pose data can now be reliably extracted from multi-view video sequences without manual intervention.
We propose to use it to guide the encoding of neural action representations together with a set of neural and behavioral augmentations.
To reduce the domain gap, during training, we swap neural and behavioral data across animals that seem to be performing similar actions.
arXiv Detail & Related papers (2021-12-02T12:45:46Z)
- 3D Human motion anticipation and classification [8.069283749930594]
We propose a novel sequence-to-sequence model for human motion prediction and feature learning.
Our model learns to predict multiple future sequences of human poses from the same input sequence.
We show that it takes less than half the number of epochs to train an activity recognition network by using the feature learned from the discriminator.
arXiv Detail & Related papers (2020-12-31T00:19:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.