Global-local Motion Transformer for Unsupervised Skeleton-based Action Learning
- URL: http://arxiv.org/abs/2207.06101v1
- Date: Wed, 13 Jul 2022 10:18:07 GMT
- Title: Global-local Motion Transformer for Unsupervised Skeleton-based Action Learning
- Authors: Boeun Kim, Hyung Jin Chang, Jungho Kim, and Jin Young Choi
- Abstract summary: We propose a new transformer model for the task of unsupervised learning of skeleton motion sequences.
The proposed model successfully learns local dynamics of the joints and captures global context from the motion sequences.
- Score: 23.051184131833292
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We propose a new transformer model for the task of unsupervised learning of
skeleton motion sequences. The existing transformer model used for
unsupervised skeleton-based action learning learns only the instantaneous
velocity of each joint from adjacent frames, without global motion information.
Thus, the model has difficulty learning attention globally over
whole-body motions and over temporally distant joints. In addition, person-to-person
interactions have not been considered in the model. To tackle the learning of
whole-body motion, long-range temporal dynamics, and person-to-person
interactions, we design a global and local attention mechanism in which global
body motions and local joint motions attend to each other. Furthermore,
we propose a novel pretraining strategy, multi-interval pose displacement
prediction, to learn both global and local attention in diverse time ranges.
The proposed model successfully learns local dynamics of the joints and
captures global context from the motion sequences. Our model outperforms
state-of-the-art models by notable margins in the representative benchmarks.
Codes are available at https://github.com/Boeun-Kim/GL-Transformer.
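As a rough illustration of the two ideas in the abstract, the sketch below builds multi-interval pose displacement targets and a single attention layer in which a global body-motion token and per-joint local tokens attend to each other. It is a minimal sketch, not the authors' implementation (see the repository above); the tensor layout, token construction, and hyper-parameters are assumptions made purely for illustration.

```python
# Minimal sketch (NOT the authors' code): simplified global-local attention
# and multi-interval pose displacement targets. Shapes, module names, and
# hyper-parameters are assumptions made for illustration only.
import torch
import torch.nn as nn


def multi_interval_displacements(poses, intervals=(1, 2, 4)):
    """Build pose-displacement targets over several time intervals.

    poses: (batch, frames, joints, 3) joint coordinates.
    Returns a dict mapping interval k to poses[:, k:] - poses[:, :-k],
    i.e. how far every joint moved over k frames.
    """
    return {k: poses[:, k:] - poses[:, :-k] for k in intervals}


class GlobalLocalAttention(nn.Module):
    """One attention layer where a global (whole-body) token and local
    (per-joint) tokens attend to each other."""

    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, local_tokens, global_token):
        # local_tokens: (batch*frames, joints, dim)
        # global_token: (batch*frames, 1, dim)
        tokens = torch.cat([global_token, local_tokens], dim=1)
        out, _ = self.attn(tokens, tokens, tokens)   # joint <-> body attention
        tokens = self.norm(tokens + out)
        return tokens[:, 1:], tokens[:, :1]          # updated local / global


if __name__ == "__main__":
    B, T, J, D = 2, 16, 25, 64
    poses = torch.randn(B, T, J, 3)
    targets = multi_interval_displacements(poses)    # pretraining targets
    local = torch.randn(B * T, J, D)
    glob = local.mean(dim=1, keepdim=True)           # crude global token
    new_local, new_global = GlobalLocalAttention(D, 4)(local, glob)
    print({k: v.shape for k, v in targets.items()}, new_local.shape)
```

In the paper, the global token and the set of prediction intervals are defined by the authors; here the global token is simply the mean of the joint tokens, used only to show how the two token types can attend to one another.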
Related papers
- Interactive incremental learning of generalizable skills with local trajectory modulation [14.416251854298409]
We propose an interactive imitation learning framework that simultaneously leverages local and global modulations of trajectory distributions.
Our approach exploits the concept of via-points to incrementally and interactively 1) improve the model accuracy locally, 2) add new objects to the task during execution and 3) extend the skill into regions where demonstrations were not provided.
arXiv Detail & Related papers (2024-09-09T14:22:19Z) - Puppet-Master: Scaling Interactive Video Generation as a Motion Prior for Part-Level Dynamics [67.97235923372035]
We present Puppet-Master, an interactive video generative model that can serve as a motion prior for part-level dynamics.
At test time, given a single image and a sparse set of motion trajectories, Puppet-Master can synthesize a video depicting realistic part-level motion faithful to the given drag interactions.
arXiv Detail & Related papers (2024-08-08T17:59:38Z) - Joint-Motion Mutual Learning for Pose Estimation in Videos [21.77871402339573]
Human pose estimation in videos has long been a compelling yet challenging task within the realm of computer vision.
Recent methods strive to integrate multi-frame visual features generated by a backbone network for pose estimation.
We propose a novel joint-motion mutual learning framework for pose estimation.
arXiv Detail & Related papers (2024-08-05T07:37:55Z) - Local Action-Guided Motion Diffusion Model for Text-to-Motion Generation [52.87672306545577]
Existing motion generation methods primarily focus on the direct synthesis of global motions.
We propose the local action-guided motion diffusion model, which facilitates global motion generation by utilizing local actions as fine-grained control signals.
Our method provides the flexibility to seamlessly combine various local actions and to continuously adjust the guiding weight.
arXiv Detail & Related papers (2024-07-15T08:35:00Z) - InterControl: Zero-shot Human Interaction Generation by Controlling Every Joint [67.6297384588837]
We introduce a novel controllable motion generation method, InterControl, to encourage synthesized motions to maintain the desired distance between joint pairs.
We demonstrate that the distance between joint pairs for human-wise interactions can be generated using an off-the-shelf Large Language Model.
arXiv Detail & Related papers (2023-11-27T14:32:33Z) - Persistent-Transient Duality: A Multi-mechanism Approach for Modeling Human-Object Interaction [58.67761673662716]
Humans are highly adaptable, swiftly switching between different modes to handle different tasks, situations and contexts.
In Human-Object Interaction (HOI) activities, these modes can be attributed to two mechanisms: (1) the large-scale consistent plan for the whole activity and (2) the small-scale child interactive actions that start and end along the timeline.
This work proposes to model two concurrent mechanisms that jointly control human motion.
arXiv Detail & Related papers (2023-07-24T12:21:33Z) - GraMMaR: Ground-aware Motion Model for 3D Human Motion Reconstruction [61.833152949826946]
We propose a novel Ground-aware Motion Model for 3D Human Motion Reconstruction, named GraMMaR.
GraMMaR learns the distribution of transitions in both pose and interaction between every joint and ground plane at each time step of a motion sequence.
It is trained to explicitly promote consistency between the motion and distance change towards the ground.
arXiv Detail & Related papers (2023-06-29T07:22:20Z) - Global-to-Local Modeling for Video-based 3D Human Pose and Shape Estimation [53.04781510348416]
Video-based 3D human pose and shape estimation is evaluated by intra-frame accuracy and inter-frame smoothness.
We propose to structurally decouple the modeling of long-term and short-term correlations in an end-to-end framework, the Global-to-Local Transformer (GLoT).
Our GLoT surpasses previous state-of-the-art methods with the fewest model parameters on popular benchmarks, i.e., 3DPW, MPI-INF-3DHP, and Human3.6M.
arXiv Detail & Related papers (2023-03-26T14:57:49Z) - SoMoFormer: Social-Aware Motion Transformer for Multi-Person Motion Prediction [10.496276090281825]
We propose a novel Social-Aware Motion Transformer (SoMoFormer) to model individual motion and social interactions in a joint manner.
SoMoFormer extracts motion features from sub-sequences in displacement trajectory space to learn both local and global pose dynamics for each individual.
In addition, we devise a novel social-aware motion attention mechanism in SoMoFormer to further optimize dynamics representations and capture interaction dependencies simultaneously.
arXiv Detail & Related papers (2022-08-19T08:57:34Z)
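To make the social-aware attention idea in the SoMoFormer summary above (and the person-to-person interactions targeted by GL-Transformer) concrete, here is a minimal cross-person attention sketch; the pooled per-person tokens and the module below are illustrative assumptions, not either paper's implementation.

```python
# Minimal sketch (assumption, not SoMoFormer's or GL-Transformer's code):
# per-person motion tokens attending across persons in a scene.
import torch
import torch.nn as nn


class CrossPersonAttention(nn.Module):
    """Let each person's motion token attend to every other person's token,
    so interaction dependencies can influence individual dynamics."""

    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, person_tokens):
        # person_tokens: (batch, persons, dim) -- one pooled motion feature per person.
        out, _ = self.attn(person_tokens, person_tokens, person_tokens)
        return self.norm(person_tokens + out)


if __name__ == "__main__":
    tokens = torch.randn(2, 3, 64)                  # 2 scenes, 3 persons each
    print(CrossPersonAttention()(tokens).shape)     # torch.Size([2, 3, 64])
```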