From Parts to Whole: A Unified Reference Framework for Controllable Human Image Generation
- URL: http://arxiv.org/abs/2404.15267v1
- Date: Tue, 23 Apr 2024 17:56:08 GMT
- Title: From Parts to Whole: A Unified Reference Framework for Controllable Human Image Generation
- Authors: Zehuan Huang, Hongxing Fan, Lipeng Wang, Lu Sheng
- Abstract summary: Parts2Whole is a novel framework designed for generating customized portraits from multiple reference images.
We first develop a semantic-aware appearance encoder to retain details of different human parts.
Second, our framework supports multi-image conditioned generation through a shared self-attention mechanism.
- Score: 19.096741614175524
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advancements in controllable human image generation have led to zero-shot generation using structural signals (e.g., pose, depth) or facial appearance. Yet, generating human images conditioned on multiple parts of human appearance remains challenging. Addressing this, we introduce Parts2Whole, a novel framework designed for generating customized portraits from multiple reference images, including pose images and various aspects of human appearance. To achieve this, we first develop a semantic-aware appearance encoder to retain details of different human parts, which processes each image based on its textual label to a series of multi-scale feature maps rather than one image token, preserving the image dimension. Second, our framework supports multi-image conditioned generation through a shared self-attention mechanism that operates across reference and target features during the diffusion process. We enhance the vanilla attention mechanism by incorporating mask information from the reference human images, allowing for the precise selection of any part. Extensive experiments demonstrate the superiority of our approach over existing alternatives, offering advanced capabilities for multi-part controllable human image customization. See our project page at https://huanngzh.github.io/Parts2Whole/.
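The shared self-attention described in the abstract can be illustrated with a minimal sketch: target queries attend over the concatenation of target and reference features, and a per-token mask on the reference features blocks attention to unselected parts. This is a NumPy toy under stated assumptions (single head, flattened tokens, illustrative shapes), not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def shared_self_attention(target, reference, ref_mask):
    """Toy mask-guided shared self-attention.

    target:    (n_t, d) target (denoised) feature tokens.
    reference: (n_r, d) reference appearance tokens.
    ref_mask:  (n_r,) bool; True = this reference token belongs to a
               selected human part and may be attended to.
    """
    n_t, d = target.shape
    # Keys/values are the concatenation of target and reference tokens.
    kv = np.concatenate([target, reference], axis=0)       # (n_t + n_r, d)
    logits = (target @ kv.T) / np.sqrt(d)                  # (n_t, n_t + n_r)
    # Target tokens always attend to each other; masked-out reference
    # tokens are excluded before the softmax.
    keep = np.concatenate([np.ones(n_t, dtype=bool), ref_mask])
    logits[:, ~keep] = -1e9
    weights = softmax(logits, axis=-1)
    return weights @ kv                                     # (n_t, d)
```

In the paper the attention operates on multi-scale feature maps inside a diffusion U-Net; the sketch only conveys the token-level masking idea.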
Related papers
- Fusion is all you need: Face Fusion for Customized Identity-Preserving Image Synthesis [7.099258248662009]
Text-to-image (T2I) models have significantly advanced the development of artificial intelligence.
However, existing T2I-based methods often struggle to accurately reproduce the appearance of individuals from a reference image.
We leverage the pre-trained UNet from Stable Diffusion to incorporate the target face image directly into the generation process.
arXiv Detail & Related papers (2024-09-27T19:31:04Z) - Break-A-Scene: Extracting Multiple Concepts from a Single Image [80.47666266017207]
We introduce the task of textual scene decomposition.
We propose augmenting the input image with masks that indicate the presence of target concepts.
We then present a novel two-phase customization process.
arXiv Detail & Related papers (2023-05-25T17:59:04Z) - UMFuse: Unified Multi View Fusion for Human Editing applications [36.94334399493266]
We design a multi-view fusion network that takes the pose key points and texture from multiple source images.
We show the application of our network on two newly proposed tasks - Multi-view human reposing and Mix&Match Human Image generation.
arXiv Detail & Related papers (2022-11-17T05:09:58Z) - HumanDiffusion: a Coarse-to-Fine Alignment Diffusion Framework for Controllable Text-Driven Person Image Generation [73.3790833537313]
Controllable person image generation promotes a wide range of applications such as digital human interaction and virtual try-on.
We propose HumanDiffusion, a coarse-to-fine alignment diffusion framework, for text-driven person image generation.
arXiv Detail & Related papers (2022-11-11T14:30:34Z) - Neural Novel Actor: Learning a Generalized Animatable Neural Representation for Human Actors [98.24047528960406]
We propose a new method for learning a generalized animatable neural representation from a sparse set of multi-view imagery of multiple persons.
The learned representation can be used to synthesize novel view images of an arbitrary person from a sparse set of cameras, and further animate them with the user's pose control.
arXiv Detail & Related papers (2022-08-25T07:36:46Z) - T-Person-GAN: Text-to-Person Image Generation with Identity-Consistency and Manifold Mix-Up [16.165889084870116]
We present an end-to-end approach to generate high-resolution person images conditioned on texts only.
We develop an effective generative model to produce person images with two novel mechanisms.
arXiv Detail & Related papers (2022-08-18T07:41:02Z) - Neural Rendering of Humans in Novel View and Pose from Monocular Video [68.37767099240236]
We introduce a new method that generates photo-realistic humans under novel views and poses given a monocular video as input.
Our method significantly outperforms existing approaches under unseen poses and novel views given monocular videos as input.
arXiv Detail & Related papers (2022-04-04T03:09:20Z) - HumanGAN: A Generative Model of Humans Images [78.6284090004218]
We present a generative model for images of dressed humans offering control over pose, local body part appearance and garment style.
Our model encodes part-based latent appearance vectors in a normalized pose-independent space and warps them to different poses, preserving body and clothing appearance under varying posture.
arXiv Detail & Related papers (2021-03-11T19:00:38Z) - MUST-GAN: Multi-level Statistics Transfer for Self-driven Person Image Generation [13.06676286691587]
Pose-guided person image generation typically relies on paired source-target images to supervise training.
We propose a novel multi-level statistics transfer model, which disentangles and transfers multi-level appearance features from person images.
Our approach allows for flexible manipulation of person appearance and pose properties to perform pose transfer and clothes style transfer tasks.
arXiv Detail & Related papers (2020-11-18T04:38:48Z) - Generating Person Images with Appearance-aware Pose Stylizer [66.44220388377596]
We present a novel end-to-end framework to generate realistic person images based on given person poses and appearances.
The core of our framework is a novel generator called Appearance-aware Pose Stylizer (APS) which generates human images by coupling the target pose with the conditioned person appearance progressively.
arXiv Detail & Related papers (2020-07-17T15:58:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.