Gaze-Guided 3D Hand Motion Prediction for Detecting Intent in Egocentric Grasping Tasks
- URL: http://arxiv.org/abs/2504.01024v1
- Date: Thu, 27 Mar 2025 15:26:41 GMT
- Title: Gaze-Guided 3D Hand Motion Prediction for Detecting Intent in Egocentric Grasping Tasks
- Authors: Yufei He, Xucong Zhang, Arno H. A. Stienen
- Abstract summary: We propose a novel approach that predicts future sequences of both hand poses and joint positions. We use a vector-quantized variational autoencoder for robust hand pose encoding with an autoregressive generative transformer for effective hand motion sequence prediction.
- Score: 5.018156030818883
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Human intention detection with hand motion prediction is critical for driving upper-extremity assistive robots in neurorehabilitation applications. However, traditional methods that rely on physiological signal measurement are restrictive and often lack environmental context. We propose a novel approach that predicts future sequences of both hand poses and joint positions. This method integrates gaze information, historical hand motion sequences, and environmental object data, adapting dynamically to the assistive needs of the patient without prior knowledge of the intended object for grasping. Specifically, we use a vector-quantized variational autoencoder for robust hand pose encoding with an autoregressive generative transformer for effective hand motion sequence prediction. We demonstrate the usability of these novel techniques in a pilot study with healthy subjects. To train and evaluate the proposed method, we collect a dataset consisting of various types of grasp actions on different objects from multiple subjects. Through extensive experiments, we demonstrate that the proposed method can successfully predict sequential hand movement. In particular, gaze information significantly improves prediction, especially with fewer input frames, highlighting the potential of the proposed method for real-world applications.
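The abstract names a vector-quantized variational autoencoder as the pose encoder. As a rough illustration of the quantization step only, the sketch below maps continuous feature vectors to the indices of their nearest codebook entries, giving the kind of discrete tokens an autoregressive model could then predict. It is not the authors' implementation: the function names, the 2-D toy codebook, and the feature values are all hypothetical, and a real model would learn the codebook jointly with an encoder and decoder.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(features: List[Vector], codebook: List[Vector]) -> List[int]:
    """Map each feature vector to the index of its nearest codebook entry."""
    return [
        min(range(len(codebook)), key=lambda k: squared_distance(f, codebook[k]))
        for f in features
    ]


def dequantize(indices: List[int], codebook: List[Vector]) -> List[List[float]]:
    """Look up the codebook vector for each discrete token."""
    return [list(codebook[i]) for i in indices]


# Hypothetical toy data: a 3-entry codebook and three encoded pose features.
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
features = [[0.1, -0.1], [0.9, 0.2], [0.2, 0.8]]

tokens = quantize(features, codebook)          # discrete token sequence
reconstructed = dequantize(tokens, codebook)   # quantized feature vectors
print(tokens)
```

In this toy setup each feature snaps to a different codebook entry, so the token sequence is simply the list of codebook indices in order.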
Related papers
- E-Motion: Future Motion Simulation via Event Sequence Diffusion [86.80533612211502]
Event-based sensors may potentially offer a unique opportunity to predict future motion with a level of detail and precision previously unachievable.
We propose to integrate the strong learning capacity of the video diffusion model with the rich motion information of an event camera as a motion simulation framework.
Our findings suggest a promising direction for future research in enhancing the interpretative power and predictive accuracy of computer vision systems.
arXiv Detail & Related papers (2024-10-11T09:19:23Z) - AdvMT: Adversarial Motion Transformer for Long-term Human Motion Prediction [2.837740438355204]
We present the Adversarial Motion Transformer (AdvMT), a novel model that integrates a transformer-based motion encoder and a temporal continuity discriminator.
With adversarial training, our method effectively reduces the unwanted artifacts in predictions, thereby ensuring the learning of more realistic and fluid human motions.
arXiv Detail & Related papers (2024-01-10T09:15:50Z) - Uncovering the human motion pattern: Pattern Memory-based Diffusion Model for Trajectory Prediction [45.77348842004666]
Motion Pattern Priors Memory Network is a memory-based method to uncover latent motion patterns in human behavior.
We introduce an addressing mechanism to retrieve the matched pattern and the potential target distributions for each prediction from the memory bank.
Experiments validate the effectiveness of our approach, achieving state-of-the-art trajectory prediction accuracy.
arXiv Detail & Related papers (2024-01-05T17:39:52Z) - GazeMoDiff: Gaze-guided Diffusion Model for Stochastic Human Motion Prediction [10.982807572404166]
We present GazeMoDiff, a novel gaze-guided denoising diffusion model to generate human motions.
Our method first uses a gaze encoder and a motion encoder to extract the gaze and motion features respectively, then employs a graph attention network to fuse these features.
Our method outperforms the state-of-the-art methods by a large margin in terms of multi-modal final error.
arXiv Detail & Related papers (2023-12-19T12:10:12Z) - A Neuro-Symbolic Approach for Enhanced Human Motion Prediction [5.742409080817885]
We propose a neuro-symbolic approach for human motion prediction (NeuroSyM).
NeuroSyM weights the interactions in the neighbourhood differently by leveraging an intuitive technique for spatial representation called Qualitative Trajectory Calculus (QTC).
Experimental results show that the NeuroSyM approach outperforms in most cases the baseline architectures in terms of prediction accuracy.
arXiv Detail & Related papers (2023-04-23T20:11:40Z) - GIMO: Gaze-Informed Human Motion Prediction in Context [75.52839760700833]
We propose a large-scale human motion dataset that delivers high-quality body pose sequences, scene scans, and ego-centric views with eye gaze.
Our data collection is not tied to specific scenes, which further boosts the motion dynamics observed from our subjects.
To realize the full potential of gaze, we propose a novel network architecture that enables bidirectional communication between the gaze and motion branches.
arXiv Detail & Related papers (2022-04-20T13:17:39Z) - Investigating Pose Representations and Motion Contexts Modeling for 3D Motion Prediction [63.62263239934777]
We conduct an in-depth study on various pose representations with a focus on their effects on the motion prediction task.
We propose a novel RNN architecture termed AHMR (Attentive Hierarchical Motion Recurrent network) for motion prediction.
Our approach outperforms the state-of-the-art methods in short-term prediction and achieves much enhanced long-term prediction proficiency.
arXiv Detail & Related papers (2021-12-30T10:45:22Z) - Generating Smooth Pose Sequences for Diverse Human Motion Prediction [90.45823619796674]
We introduce a unified deep generative network for both diverse and controllable motion prediction.
Our experiments on two standard benchmark datasets, Human3.6M and HumanEva-I, demonstrate that our approach outperforms the state-of-the-art baselines in terms of both sample diversity and accuracy.
arXiv Detail & Related papers (2021-08-19T00:58:00Z) - Probabilistic Human Motion Prediction via A Bayesian Neural Network [71.16277790708529]
We propose a probabilistic model for human motion prediction in this paper.
Our model could generate several future motions when given an observed motion sequence.
We extensively validate our approach on a large-scale benchmark dataset, Human3.6M.
arXiv Detail & Related papers (2021-07-14T09:05:33Z) - TRiPOD: Human Trajectory and Pose Dynamics Forecasting in the Wild [77.59069361196404]
TRiPOD is a novel method for predicting body dynamics based on graph attentional networks.
To incorporate a real-world challenge, we learn an indicator representing whether an estimated body joint is visible/invisible at each frame.
Our evaluation shows that TRiPOD outperforms all prior work and state-of-the-art specifically designed for each of the trajectory and pose forecasting tasks.
arXiv Detail & Related papers (2021-04-08T20:01:00Z) - Temporally Guided Articulated Hand Pose Tracking in Surgical Videos [22.752654546694334]
Articulated hand pose tracking is an under-explored problem that carries the potential for use in an extensive number of applications.
We propose a novel hand pose estimation model, CondPose, which improves detection and tracking accuracy by incorporating a pose prior to its prediction.
arXiv Detail & Related papers (2021-01-12T03:44:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.