Spatial-Related Sensors Matters: 3D Human Motion Reconstruction Assisted
with Textual Semantics
- URL: http://arxiv.org/abs/2401.05412v1
- Date: Wed, 27 Dec 2023 04:21:45 GMT
- Title: Spatial-Related Sensors Matters: 3D Human Motion Reconstruction Assisted
with Textual Semantics
- Authors: Xueyuan Yang and Chao Yao and Xiaojuan Ban
- Abstract summary: Leveraging wearable devices for motion reconstruction has emerged as an economical and viable technique.
In this paper, we explore the spatial importance of multiple sensors, supervised by text that describes specific actions.
With textual supervision, our method not only differentiates between ambiguous actions such as sitting and standing but also produces more precise and natural motion.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Leveraging wearable devices for motion reconstruction has emerged as an
economical and viable technique. Certain methodologies employ sparse Inertial
Measurement Units (IMUs) on the human body and harness data-driven strategies
to model human poses. However, the reconstruction of motion based solely on
sparse IMUs data is inherently fraught with ambiguity, a consequence of
numerous identical IMU readings corresponding to different poses. In this
paper, we explore the spatial importance of multiple sensors, supervised by
text that describes specific actions. Specifically, uncertainty is introduced
to derive weighted features for each IMU. We also design a Hierarchical
Temporal Transformer (HTT) and apply contrastive learning to achieve precise
temporal and feature alignment of sensor data with textual semantics.
Experimental results demonstrate our proposed approach achieves significant
improvements in multiple metrics compared to existing methods. Notably, with
textual supervision, our method not only differentiates between ambiguous
actions such as sitting and standing but also produces more precise and natural
motion.
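
The abstract describes two ingredients: uncertainty-derived weights over per-IMU features, and contrastive alignment of sensor features with text embeddings. A minimal NumPy sketch of both ideas is below; the function names, the inverse-variance weighting scheme, and the symmetric InfoNCE form are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def uncertainty_weighted_fusion(imu_feats, log_vars):
    """Weight each IMU's feature by its predicted (inverse) uncertainty.

    imu_feats: (num_imus, dim) per-sensor features
    log_vars:  (num_imus,) predicted log-variances (higher = less certain)
    Returns a fused (dim,) feature.
    """
    weights = np.exp(-np.asarray(log_vars))      # inverse-variance weights
    weights = weights / weights.sum()            # normalize over sensors
    return (weights[:, None] * imu_feats).sum(axis=0)

def info_nce_loss(motion_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss aligning motion and text embeddings.

    motion_emb, text_emb: (batch, dim); row i of each side is a matched pair.
    """
    m = motion_emb / np.linalg.norm(motion_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = m @ t.T / temperature               # (batch, batch) cosine similarities
    labels = np.arange(len(logits))

    def xent(lg):
        lg = lg - lg.max(axis=1, keepdims=True)  # numerical stability
        logp = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()      # matched pairs on the diagonal

    # cross-entropy in both directions (motion -> text and text -> motion)
    return 0.5 * (xent(logits) + xent(logits.T))
```

With this loss, correctly paired motion/text batches score lower than shuffled ones, which is the property contrastive alignment trains for.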
Related papers
- Dynamic Inertial Poser (DynaIP): Part-Based Motion Dynamics Learning for
Enhanced Human Pose Estimation with Sparse Inertial Sensors
This paper introduces a novel human pose estimation approach using sparse inertial sensors.
It leverages a diverse array of real inertial motion capture data from different skeleton formats to improve motion diversity and model generalization.
The approach demonstrates superior performance over state-of-the-art models across five public datasets, notably reducing pose error by 19% on the DIP-IMU dataset.
arXiv Detail & Related papers (2023-12-02T13:17:10Z)
- SemanticBoost: Elevating Motion Generation with Augmented Textual Cues
Our framework comprises a Semantic Enhancement module and a Context-Attuned Motion Denoiser (CAMD)
The CAMD approach provides an all-encompassing solution for generating high-quality, semantically consistent motion sequences.
Our experimental results demonstrate that SemanticBoost, as a diffusion-based method, outperforms auto-regressive-based techniques.
arXiv Detail & Related papers (2023-10-31T09:58:11Z)
- DiverseMotion: Towards Diverse Human Motion Generation via Discrete Diffusion
We present DiverseMotion, a new approach for synthesizing high-quality human motions conditioned on textual descriptions.
We show that our DiverseMotion achieves the state-of-the-art motion quality and competitive motion diversity.
arXiv Detail & Related papers (2023-09-04T05:43:48Z)
- DiffusionPoser: Real-time Human Motion Reconstruction From Arbitrary Sparse Sensors Using Autoregressive Diffusion
Motion capture from a limited number of body-worn sensors has important applications in health, human performance, and entertainment.
Recent work has focused on accurately reconstructing whole-body motion from a specific sensor configuration using six IMUs.
We propose a single diffusion model, DiffusionPoser, which reconstructs human motion in real-time from an arbitrary combination of sensors.
arXiv Detail & Related papers (2023-08-31T12:36:50Z)
- Priority-Centric Human Motion Generation in Discrete Latent Space
We introduce a Priority-Centric Motion Discrete Diffusion Model (M2DM) for text-to-motion generation.
M2DM incorporates a global self-attention mechanism and a regularization term to counteract code collapse.
We also present a motion discrete diffusion model that employs an innovative noise schedule, determined by the significance of each motion token.
arXiv Detail & Related papers (2023-08-28T10:40:16Z)
- Spatio-Temporal Branching for Motion Prediction using Motion Increments
Human motion prediction (HMP) has emerged as a popular research topic due to its diverse applications.
Traditional methods rely on hand-crafted features and machine learning techniques.
We propose a novel spatio-temporal branching network using incremental information for HMP.
arXiv Detail & Related papers (2023-08-02T12:04:28Z)
- Mutual Information-Based Temporal Difference Learning for Human Pose Estimation in Video
We present a novel multi-frame human pose estimation framework, which employs temporal differences across frames to model dynamic contexts.
Specifically, we design multi-stage entangled learning conditioned on multi-stage differences to derive informative motion representation sequences.
This approach ranks No. 1 in the Crowd Pose Estimation in Complex Events Challenge on the HiEve benchmark.
arXiv Detail & Related papers (2023-03-15T09:29:03Z)
- Reconfigurable Data Glove for Reconstructing Physical and Virtual Grasps
We present a reconfigurable data glove design to capture different modes of human hand-object interactions.
The glove operates in three modes for various downstream tasks with distinct features.
We evaluate the system's three modes by (i) recording hand gestures and associated forces, (ii) improving manipulation fluency in VR, and (iii) producing realistic simulation effects of various tool uses.
arXiv Detail & Related papers (2023-01-14T05:35:50Z)
- Transformer Inertial Poser: Attention-based Real-time Human Motion Reconstruction from Sparse IMUs
We propose an attention-based deep learning method to reconstruct full-body motion from six IMU sensors in real-time.
Our method achieves new state-of-the-art results both quantitatively and qualitatively, while being simple to implement and smaller in size.
arXiv Detail & Related papers (2022-03-29T16:24:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.