QPoser: Quantized Explicit Pose Prior Modeling for Controllable Pose
Generation
- URL: http://arxiv.org/abs/2312.01104v1
- Date: Sat, 2 Dec 2023 10:44:34 GMT
- Title: QPoser: Quantized Explicit Pose Prior Modeling for Controllable Pose
Generation
- Authors: Yumeng Li, Yaoxiang Ding, Zhong Ren, Kun Zhou
- Abstract summary: A desirable explicit pose prior model should satisfy three abilities: correctness, expressiveness, and controllability.
QPoser is a controllable explicit pose prior model which guarantees correctness and expressiveness.
QPoser significantly outperforms state-of-the-art approaches in representing expressive and correct poses.
- Score: 27.93210245241248
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Explicit pose prior models compress human poses into latent representations
for use in pose-related downstream tasks. A desirable explicit pose prior
model should satisfy three abilities: 1) correctness, i.e. generating only
physically possible poses; 2) expressiveness, i.e. preserving details in
generation; 3) controllability, i.e. convenient generation from reference
poses and explicit instructions. Existing explicit pose prior models fail to
achieve all three properties, especially controllability. To address this, we
propose QPoser, a highly controllable explicit pose prior model that guarantees correctness and
expressiveness. In QPoser, a multi-head vector quantized autoencoder (MS-VQVAE)
is proposed for obtaining expressive and distributed pose representations.
Furthermore, a global-local feature integration mechanism (GLIF-AE) is utilized
to disentangle the latent representation and integrate full-body information
into local-joint features. Experimental results show that QPoser significantly
outperforms state-of-the-art approaches in representing expressive and correct
poses, while being easy to use for detailed conditional generation from
reference poses and prompting instructions.
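To make the quantization idea concrete, here is a minimal sketch of a vector-quantized pose autoencoder in PyTorch. It is not the paper's MS-VQVAE: the multi-head layout loosely mirrors the distributed representation described above, but every module, size, and name (PoseVQAutoencoder, codebook_size, ...) is an illustrative assumption.

```python
# Minimal sketch of a vector-quantized pose autoencoder (not the paper's
# MS-VQVAE); all sizes and module names are illustrative assumptions.
import torch
import torch.nn as nn

class PoseVQAutoencoder(nn.Module):
    def __init__(self, num_joints=21, latent_dim=64, codebook_size=512, num_heads=4):
        super().__init__()
        in_dim = num_joints * 3
        self.num_heads, self.latent_dim = num_heads, latent_dim
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(),
            nn.Linear(256, num_heads * latent_dim))
        # A shared codebook; each head snaps its latent to the nearest code,
        # so a pose becomes a short tuple of discrete indices.
        self.codebook = nn.Embedding(codebook_size, latent_dim)
        self.decoder = nn.Sequential(
            nn.Linear(num_heads * latent_dim, 256), nn.ReLU(),
            nn.Linear(256, in_dim))

    def quantize(self, z):
        # z: (batch, num_heads, latent_dim)
        dists = torch.cdist(z.reshape(-1, self.latent_dim), self.codebook.weight)
        idx = dists.argmin(dim=-1)
        zq = self.codebook(idx).view_as(z)
        # Straight-through estimator: copy gradients past the argmin.
        return z + (zq - z).detach(), idx.view(z.shape[0], self.num_heads)

    def forward(self, pose):  # pose: (batch, num_joints, 3)
        z = self.encoder(pose.flatten(1)).view(-1, self.num_heads, self.latent_dim)
        zq, idx = self.quantize(z)
        return self.decoder(zq.flatten(1)).view_as(pose), idx
```

Editing then amounts to swapping individual code indices: because each head contributes one discrete code, replacing a single index perturbs one part of the representation while the decoder keeps the rest coherent, which is the flavor of controllability the abstract describes (the real model's GLIF-AE additionally mixes full-body context into each local-joint feature).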
Related papers
- UniPose: A Unified Multimodal Framework for Human Pose Comprehension, Generation and Editing [79.68232381605661]
We present UniPose, a framework to comprehend, generate, and edit human poses across various modalities.
Specifically, we apply a pose tokenizer to convert 3D poses into discrete pose tokens, enabling seamless integration into the LLM within a unified vocabulary.
Benefiting from a unified learning strategy, UniPose effectively transfers knowledge across different pose-relevant tasks, adapts to unseen tasks, and exhibits extended capabilities.
arXiv Detail & Related papers (2024-11-25T08:06:30Z)
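The shared-vocabulary idea in the UniPose entry above fits in a few lines: discrete pose tokens (e.g. VQ codebook indices) are offset past the text vocabulary so the LLM handles them like ordinary words. The offset scheme and function names below are assumptions for illustration, not UniPose's actual code.

```python
# Hedged sketch: folding discrete pose tokens into an LLM vocabulary.
def pose_to_vocab_ids(pose_indices, text_vocab_size):
    """Shift pose-codebook indices past the text vocabulary so pose
    tokens and word tokens share one embedding table."""
    return [text_vocab_size + i for i in pose_indices]

def vocab_to_pose_ids(vocab_ids, text_vocab_size):
    """Inverse map: recover codebook indices from a mixed token stream."""
    return [i - text_vocab_size for i in vocab_ids if i >= text_vocab_size]

# With a 32000-word text vocabulary, pose code 17 becomes token id 32017.
assert pose_to_vocab_ids([17], 32000) == [32017]
```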
- Neural Pose Representation Learning for Generating and Transferring Non-Rigid Object Poses [11.614034196935899]
We propose a novel method for learning representations of poses for 3D deformable objects.
It specializes in 1) disentangling pose information from the object's identity, 2) facilitating the learning of pose variations, and 3) transferring pose information to other object identities.
Based on these properties, our method enables the generation of 3D deformable objects with diversity in both identities and poses.
arXiv Detail & Related papers (2024-06-14T05:33:01Z)
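A toy version of the disentangle-and-transfer setup above: separate encoders produce identity and pose codes, and transferring a pose is just recombining codes across shapes. Every module, dimension, and name here is a placeholder assumption rather than the paper's architecture.

```python
# Hedged sketch of pose/identity disentanglement for deformable shapes.
import torch
import torch.nn as nn

class DisentangledShapeAE(nn.Module):
    def __init__(self, n_verts=1000, id_dim=64, pose_dim=32):
        super().__init__()
        def mlp(out_dim, in_dim=n_verts * 3):
            return nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                                 nn.Linear(256, out_dim))
        self.id_enc = mlp(id_dim)      # who the object is
        self.pose_enc = mlp(pose_dim)  # how it is posed
        self.dec = mlp(n_verts * 3, in_dim=id_dim + pose_dim)

    def transfer(self, src_shape, tgt_shape):
        # Keep the target's identity, borrow the source's pose.
        code = torch.cat([self.id_enc(tgt_shape.flatten(1)),
                          self.pose_enc(src_shape.flatten(1))], dim=-1)
        return self.dec(code).view_as(tgt_shape)
```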
- Stable-Pose: Leveraging Transformers for Pose-Guided Text-to-Image Generation [32.190055780969466]
Stable-Pose is a novel adapter model that introduces a coarse-to-fine attention masking strategy into a vision Transformer.
We leverage the query-key self-attention mechanism of ViTs to explore the interconnections among different anatomical parts in human pose skeletons.
Stable-Pose achieves an AP score of 57.1 on the LAION-Human dataset, around a 13% improvement over the established technique, ControlNet.
arXiv Detail & Related papers (2024-06-04T16:54:28Z)
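The coarse-to-fine masking strategy above can be sketched by computing which ViT patches lie near the 2D skeleton and restricting attention there: a wide radius first (whole limbs), then a tight one (individual joints). Patch size, radii, and the two-stage schedule are simplifying assumptions, not Stable-Pose's exact procedure.

```python
# Hedged sketch: derive a pose-guided patch mask for ViT attention.
import numpy as np

def patch_mask_from_keypoints(keypoints, image_size=512, patch=16, radius_px=48):
    """Mark patches whose centers lie within radius_px of any 2D keypoint."""
    n = image_size // patch
    centers = (np.arange(n) + 0.5) * patch
    gy, gx = np.meshgrid(centers, centers, indexing="ij")
    mask = np.zeros((n, n), dtype=bool)
    for x, y in keypoints:
        mask |= (gx - x) ** 2 + (gy - y) ** 2 <= radius_px ** 2
    return mask  # feed into self-attention as an additive/bias mask

# Coarse stage sees whole limbs; fine stage focuses on exact joints.
kps = [(256.0, 128.0), (256.0, 300.0)]  # placeholder keypoints
coarse, fine = (patch_mask_from_keypoints(kps, radius_px=r) for r in (96, 32))
```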
- ManiPose: Manifold-Constrained Multi-Hypothesis 3D Human Pose Estimation [54.86887812687023]
Most 3D-HPE methods rely on regression models, which assume a one-to-one mapping between inputs and outputs.
We propose ManiPose, a novel manifold-constrained multi-hypothesis model capable of proposing multiple candidate 3D poses for each 2D input.
Unlike previous multi-hypothesis approaches, our solution is completely supervised and does not rely on complex generative models.
arXiv Detail & Related papers (2023-12-11T13:50:10Z)
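A bare-bones multi-hypothesis head makes the one-to-many idea above concrete: one 2D pose in, K candidate 3D poses plus a confidence per candidate out. ManiPose additionally constrains hypotheses to a pose manifold (e.g. consistent segment lengths), which this hedged sketch omits; all sizes are placeholders.

```python
# Hedged sketch of a multi-hypothesis 2D-to-3D lifting head.
import torch
import torch.nn as nn

class MultiHypothesisHead(nn.Module):
    def __init__(self, num_joints=17, num_hyps=5, hidden=256):
        super().__init__()
        self.k, self.j = num_hyps, num_joints
        self.backbone = nn.Sequential(nn.Linear(num_joints * 2, hidden), nn.ReLU())
        self.poses = nn.Linear(hidden, num_hyps * num_joints * 3)
        self.scores = nn.Linear(hidden, num_hyps)

    def forward(self, pose2d):  # pose2d: (batch, num_joints, 2)
        h = self.backbone(pose2d.flatten(1))
        hyps = self.poses(h).view(-1, self.k, self.j, 3)  # K candidate 3D poses
        conf = self.scores(h).softmax(dim=-1)             # per-hypothesis weight
        return hyps, conf
```

Training can then score all K candidates against the ground truth (e.g. a winner-takes-all loss), which is what lets the head keep several plausible depths alive instead of averaging them away.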
- Learning 3D-aware Image Synthesis with Unknown Pose Distribution [68.62476998646866]
Existing methods for 3D-aware image synthesis largely depend on the 3D pose distribution pre-estimated on the training set.
This work proposes PoF3D that frees generative radiance fields from the requirements of 3D pose priors.
arXiv Detail & Related papers (2023-01-18T18:47:46Z)
- PoseTrans: A Simple Yet Effective Pose Transformation Augmentation for Human Pose Estimation [40.50255017107963]
We propose Pose Transformation (PoseTrans) to create new training samples that have diverse poses.
We also propose Pose Clustering Module (PCM) to measure the pose rarity and select the "rarest" poses to help balance the long-tailed distribution.
Our method is efficient and simple to implement, which can be easily integrated into the training pipeline of existing pose estimation models.
arXiv Detail & Related papers (2022-08-16T14:03:01Z)
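One pose-transformation step of the kind PoseTrans builds on can be sketched as rotating a joint about its parent. A real implementation would rotate the joint's entire kinematic subtree and respect joint limits; the indices and angle range below are assumptions.

```python
# Hedged sketch of a limb-rotation augmentation for 2D poses.
import numpy as np

def rotate_limb(pose2d, joint, parent, max_deg=30.0, rng=None):
    rng = rng or np.random.default_rng()
    theta = np.deg2rad(rng.uniform(-max_deg, max_deg))
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s], [s, c]])
    out = pose2d.copy()
    out[joint] = pose2d[parent] + rot @ (pose2d[joint] - pose2d[parent])
    return out

# e.g. perturb an elbow (index 7) about its shoulder (index 5); the indices
# are placeholders for whatever skeleton convention is in use.
```

A PCM-style step would then keep only the transformed samples whose poses are rare under the current training distribution, rebalancing the long tail.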
- Pose-NDF: Modeling Human Pose Manifolds with Neural Distance Fields [47.62275563070933]
We present a continuous model for plausible human poses based on neural distance fields (NDFs).
Pose-NDF learns a manifold of plausible poses as the zero level set of a neural implicit function.
It can be used to generate more diverse poses by random sampling and projection than VAE-based methods.
arXiv Detail & Related papers (2022-07-27T21:46:47Z)
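The sample-and-project recipe above is compact enough to sketch: draw a random pose, then repeatedly step along the negative gradient of the learned distance field until the pose reaches the zero level set. `ndf` stands in for a trained network mapping a pose to its predicted distance from the manifold; the step count and rate are assumptions.

```python
# Hedged sketch: project a random pose onto the zero level set of a
# learned pose distance field.
import torch

def project_to_manifold(pose, ndf, steps=50, lr=0.05):
    pose = pose.clone().requires_grad_(True)
    for _ in range(steps):
        dist = ndf(pose).sum()
        grad, = torch.autograd.grad(dist, pose)
        with torch.no_grad():
            # Step scaled by the predicted distance, so updates shrink
            # as the pose approaches the manifold.
            pose -= lr * dist.detach() * grad / (grad.norm() + 1e-8)
    return pose.detach()

# Usage: pose0 = torch.randn(1, 63)  # flattened pose; shape is a placeholder
# sample = project_to_manifold(pose0, trained_ndf)
```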
- PoseAug: A Differentiable Pose Augmentation Framework for 3D Human Pose Estimation [83.50127973254538]
Existing 3D human pose estimators generalize poorly to new datasets.
We present PoseAug, a new auto-augmentation framework that learns to augment the available training poses towards a greater diversity.
arXiv Detail & Related papers (2021-05-06T06:57:42Z)
- Self-Supervised 3D Human Pose Estimation via Part Guided Novel Image Synthesis [72.34794624243281]
We propose a self-supervised learning framework to disentangle variations from unlabeled video frames.
Our differentiable formalization, bridging the representation gap between the 3D pose and spatial part maps, allows us to operate on videos with diverse camera movements.
arXiv Detail & Related papers (2020-04-09T07:55:01Z)