GRPose: Learning Graph Relations for Human Image Generation with Pose Priors
- URL: http://arxiv.org/abs/2408.16540v1
- Date: Thu, 29 Aug 2024 13:58:34 GMT
- Title: GRPose: Learning Graph Relations for Human Image Generation with Pose Priors
- Authors: Xiangchen Yin, Donglin Di, Lei Fan, Hao Li, Chen Wei, Xiaofei Gou, Yang Song, Xiao Sun, Xun Yang
- Abstract summary: We propose a framework delving into the graph relations of pose priors to provide control information for human image generation.
Our model achieves superior performance, with a 9.98% increase in pose average precision compared to the latest benchmark model.
- Score: 21.971188335727074
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent methods using diffusion models have made significant progress in human image generation with various additional controls such as pose priors. However, existing approaches still struggle to generate high-quality images with consistent pose alignment, resulting in unsatisfactory outputs. In this paper, we propose a framework delving into the graph relations of pose priors to provide control information for human image generation. The main idea is to establish a graph topological structure between the pose priors and latent representation of diffusion models to capture the intrinsic associations between different pose parts. A Progressive Graph Integrator (PGI) is designed to learn the spatial relationships of the pose priors with the graph structure, adopting a hierarchical strategy within an Adapter to gradually propagate information across different pose parts. A pose perception loss is further introduced based on a pretrained pose estimation network to minimize the pose differences. Extensive qualitative and quantitative experiments conducted on the Human-Art and LAION-Human datasets demonstrate that our model achieves superior performance, with a 9.98% increase in pose average precision compared to the latest benchmark model. The code is released on *******.
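The abstract says information is propagated hierarchically across pose parts over a graph, but it does not specify how the Progressive Graph Integrator works. The following is only a minimal Python sketch of the general idea, and everything in it is an assumption: a hypothetical five-joint chain skeleton, a row-normalized adjacency matrix with self-loops, and plain neighbour averaging repeated once per stage. After k stages, each joint has received information from joints up to k hops away. The sketch has no learned weights, no Adapter, and no coupling to the diffusion latents.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def normalized_adjacency(num_nodes: int, edges: List[Tuple[int, int]]) -> Matrix:
    """Undirected adjacency with self-loops; each row is divided by its sum."""
    adj = [[0.0] * num_nodes for _ in range(num_nodes)]
    for i in range(num_nodes):
        adj[i][i] = 1.0  # self-loop: a joint keeps part of its own feature
    for a, b in edges:
        adj[a][b] = 1.0
        adj[b][a] = 1.0
    for row in adj:
        total = sum(row)
        for j in range(num_nodes):
            row[j] /= total
    return adj


def propagate(adj: Matrix, feats: Matrix) -> Matrix:
    """One message-passing step: new feature = weighted mean over self and neighbours."""
    n, dim = len(feats), len(feats[0])
    return [
        [sum(adj[i][k] * feats[k][d] for k in range(n)) for d in range(dim)]
        for i in range(n)
    ]


def progressive_propagation(adj: Matrix, feats: Matrix, num_stages: int) -> List[Matrix]:
    """Repeat propagation num_stages times and keep the output of every stage."""
    outputs = []
    for _ in range(num_stages):
        feats = propagate(adj, feats)
        outputs.append(feats)
    return outputs


if __name__ == "__main__":
    # Hypothetical 5-joint chain (e.g. head-neck-hip-knee-ankle), not the paper's skeleton.
    edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
    adj = normalized_adjacency(5, edges)
    feats = [[1.0], [0.0], [0.0], [0.0], [0.0]]  # signal starts at joint 0
    for stage, out in enumerate(progressive_propagation(adj, feats, 4), start=1):
        reached = [i for i, f in enumerate(out) if f[0] > 0.0]
        print(f"stage {stage}: joints reached {reached}")
```

Running it shows the set of reached joints growing by one hop per stage, from joints 0 and 1 after the first stage to all five joints after the fourth. Stacking stages this way is one plausible reading of "gradually propagate information across different pose parts". The actual PGI may differ.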
Related papers
- PrefPaint: Aligning Image Inpainting Diffusion Model with Human Preference [62.72779589895124]
We make the first attempt to align diffusion models for image inpainting with human aesthetic standards via a reinforcement learning framework.
We train a reward model with a dataset we construct, consisting of nearly 51,000 images annotated with human preferences.
Experiments on inpainting comparison and downstream tasks, such as image extension and 3D reconstruction, demonstrate the effectiveness of our approach.
(2024-10-29T11:49:39Z)
- Human-Object Interaction Detection Collaborated with Large Relation-driven Diffusion Models [65.82564074712836]
We introduce DIFfusionHOI, a new HOI detector shedding light on text-to-image diffusion models.
We first devise an inversion-based strategy to learn the expression of relation patterns between humans and objects in embedding space.
These learned relation embeddings then serve as textual prompts that steer diffusion models to generate images depicting specific interactions.
(2024-10-26T12:00:33Z)
- InvGAN: Invertible GANs [88.58338626299837]
InvGAN, short for Invertible GAN, successfully embeds real images to the latent space of a high quality generative model.
This allows us to perform image inpainting, merging, and online data augmentation.
(2021-12-08T21:39:00Z)
- MSR-GCN: Multi-Scale Residual Graph Convolution Networks for Human Motion Prediction [34.565986275769745]
We propose a novel Multi-Scale Residual Graph Convolution Network (MSR-GCN) for the human pose prediction task.
Our proposed approach is evaluated on two standard benchmark datasets, i.e., the Human3.6M dataset and the CMU Mocap dataset.
(2021-08-16T15:26:23Z)
- Conditional Directed Graph Convolution for 3D Human Pose Estimation [23.376538132362498]
Graph convolutional networks have significantly improved 3D human pose estimation by representing the human skeleton as an undirected graph.
This paper proposes to represent the human skeleton as a directed graph with the joints as nodes and bones as edges that are directed from parent joints to child joints.
(2021-07-16T09:50:40Z)
- Learning Dynamics via Graph Neural Networks for Human Pose Estimation and Tracking [98.91894395941766]
We propose a novel online approach to learning the pose dynamics, which are independent of pose detections in the current frame.
Specifically, we derive this prediction of dynamics through a graph neural network (GNN) that explicitly accounts for both spatial-temporal and visual information.
Experiments on PoseTrack 2017 and PoseTrack 2018 datasets demonstrate that the proposed method achieves results superior to the state of the art on both human pose estimation and tracking tasks.
(2021-06-07T16:36:50Z)
- 3D Human Pose Regression using Graph Convolutional Network [68.8204255655161]
We propose a graph convolutional network named PoseGraphNet for 3D human pose regression from 2D poses.
Our model's performance is close to the state of the art, but with far fewer parameters.
(2021-05-21T14:41:31Z)
- Structure-aware Person Image Generation with Pose Decomposition and Semantic Correlation [29.727033198797518]
We propose a structure-aware, flow-based method for high-quality person image generation.
We decompose the human body into different semantic parts and apply different networks to predict the flow fields for these parts separately.
Our method can generate high-quality results under large pose discrepancy and outperforms state-of-the-art methods in both qualitative and quantitative comparisons.
(2021-02-05T03:07:57Z)
- Self-Supervised 3D Human Pose Estimation via Part Guided Novel Image Synthesis [72.34794624243281]
We propose a self-supervised learning framework to disentangle variations from unlabeled video frames.
Our differentiable formalization, bridging the representation gap between the 3D pose and spatial part maps, allows us to operate on videos with diverse camera movements.
(2020-04-09T07:55:01Z)
- MirrorNet: A Deep Bayesian Approach to Reflective 2D Pose Estimation from Human Images [42.27703025887059]
The main problem with the standard supervised approach is that it often yields anatomically implausible poses.
We propose a semi-supervised method that can make effective use of images with and without pose annotations.
Experimental results show that the proposed reflective architecture makes estimated poses anatomically plausible.
(2020-04-08T05:02:48Z)
- RePose: Learning Deep Kinematic Priors for Fast Human Pose Estimation [17.0630180888369]
We propose a novel efficient and lightweight model for human pose estimation from a single image.
Our model is designed to achieve competitive results at a fraction of the number of parameters and computational cost of various state-of-the-art methods.
(2020-02-10T16:44:45Z)
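The Conditional Directed Graph Convolution entry in the list above represents the skeleton as a directed graph whose bones point from parent joints to child joints. The following minimal Python sketch shows how that direction restricts information flow. It assumes a hypothetical five-joint parent list and a plain additive parent-to-child message. It is not that paper's conditional, learned convolution.

```python
from typing import List, Optional, Tuple


def directed_edges(parents: List[Optional[int]]) -> List[Tuple[int, int]]:
    """Bones as (parent, child) edges; the root joint has parent None."""
    return [(p, c) for c, p in enumerate(parents) if p is not None]


def directed_step(parents: List[Optional[int]], feats: List[float]) -> List[float]:
    """Each joint adds its parent's feature to its own, so messages travel parent -> child only."""
    return [f + (feats[p] if p is not None else 0.0) for f, p in zip(feats, parents)]


if __name__ == "__main__":
    # Hypothetical skeleton: 0 pelvis (root), 1 spine, 2 head, 3 shoulder, 4 elbow.
    parents = [None, 0, 1, 1, 3]
    print(directed_edges(parents))  # [(0, 1), (1, 2), (1, 3), (3, 4)]
    root_signal = [1.0, 0.0, 0.0, 0.0, 0.0]
    leaf_signal = [0.0, 0.0, 0.0, 0.0, 1.0]
    print(directed_step(parents, root_signal))  # [1.0, 1.0, 0.0, 0.0, 0.0]
    print(directed_step(parents, leaf_signal))  # [0.0, 0.0, 0.0, 0.0, 1.0]
```

A signal at the root reaches its child after one step. A signal at a leaf joint never moves upward, which is the key difference from the undirected skeleton graphs used by the other graph-convolution entries.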