PersonificationNet: Making customized subject act like a person
- URL: http://arxiv.org/abs/2407.09057v1
- Date: Fri, 12 Jul 2024 07:27:07 GMT
- Title: PersonificationNet: Making customized subject act like a person
- Authors: Tianchu Guo, Pengyu Li, Biao Wang, Xiansheng Hua
- Abstract summary: We propose PersonificationNet, which can control a specified subject, such as a cartoon character or plush toy, to adopt the same pose as a person in a given reference image.
Specifically, first, the customized branch mimics the specified subject's appearance. Second, the pose condition branch transfers body structure information from humans to different kinds of instances. Last, the structure alignment module bridges the structural gap between the human and the specified subject at inference time.
- Score: 39.359589723267696
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Customized generation has recently shown significant potential: as few as 3-5 user-provided images suffice to train a model to synthesize new images of a specified subject. Although subsequent applications enhance the flexibility and diversity of customized generation, fine-grained control that makes the given subject adopt a person's pose is still understudied. In this paper, we propose PersonificationNet, which can control a specified subject, such as a cartoon character or plush toy, to adopt the same pose as a person in a given reference image. It contains a customized branch, a pose condition branch and a structure alignment module. Specifically, first, the customized branch mimics the specified subject's appearance. Second, the pose condition branch transfers body structure information from humans to different kinds of instances. Last, the structure alignment module bridges the structural gap between the human and the specified subject at inference time. Experimental results show that our proposed PersonificationNet outperforms state-of-the-art methods.
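The paper does not include code, but the three-part pipeline in the abstract can be pictured concretely. The Python sketch below is a minimal illustration, not the authors' implementation: every name (`Skeleton`, `align_structure`, `personification_inference`, the `pose_encoder` and `subject_model` callables) is hypothetical, and the bone-length retargeting shown for the structure alignment module is only one plausible reading of "bridging the structure gap" between a human skeleton and, say, a plush toy's proportions.

```python
# Illustrative sketch of a PersonificationNet-style inference pipeline.
# All interfaces here are assumptions; the paper publishes no API.
from dataclasses import dataclass
import numpy as np

@dataclass
class Skeleton:
    joints: np.ndarray          # (num_joints, 2) 2D joint positions
    bones: list                 # (parent, child) index pairs, root-first order

def align_structure(human: Skeleton, subject: Skeleton) -> Skeleton:
    """Hypothetical structure alignment: keep the human pose's bone
    directions but rescale each bone to the subject's length, so the
    retargeted skeleton matches the subject's proportions."""
    aligned = human.joints.copy()
    for parent, child in human.bones:   # bones assumed topologically sorted
        direction = human.joints[child] - human.joints[parent]
        direction /= np.linalg.norm(direction) + 1e-8
        subj_len = np.linalg.norm(
            subject.joints[child] - subject.joints[parent])
        aligned[child] = aligned[parent] + direction * subj_len
    return Skeleton(aligned, human.bones)

def personification_inference(subject_model, pose_encoder,
                              human_skel: Skeleton, subject_skel: Skeleton):
    """Sketch: align the human pose to the subject's structure, then
    condition the subject-customized generator on the aligned pose."""
    aligned = align_structure(human_skel, subject_skel)
    pose_map = pose_encoder(aligned)           # e.g. a rendered skeleton map
    return subject_model(condition=pose_map)   # customized branch: appearance
```

In the paper both branches are learned; here `pose_encoder` and `subject_model` merely stand in for whatever converts a skeleton into a conditioning signal and for the subject-customized generator, respectively.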
Related papers
- YoChameleon: Personalized Vision and Language Generation [54.11098551685136]
Yo'Chameleon is the first attempt to study personalization for large multimodal models.
It embeds subject-specific information to answer questions about the subject and recreate pixel-level details to produce images of the subject in new contexts.
It is trained with (i) a self-prompting optimization mechanism to balance performance across multiple modalities, and (ii) a "soft-positive" image generation approach to enhance image quality in a few-shot setting.
arXiv Detail & Related papers (2025-04-29T17:59:57Z)
- FRESA: Feedforward Reconstruction of Personalized Skinned Avatars from Few Images [74.86864398919467]
We present a novel method for reconstructing personalized 3D human avatars with realistic animation from only a few images.
We learn a universal prior from over a thousand clothed humans to achieve instant feedforward generation and zero-shot generalization.
Our method generates more authentic reconstruction and animation than state-of-the-art methods, and can be directly generalized to inputs from casually taken phone photos.
arXiv Detail & Related papers (2025-03-24T23:20:47Z)
- Learning Complex Non-Rigid Image Edits from Multimodal Conditioning [18.500715348636582]
We focus on inserting a given human (specifically, a single image of a person) into a novel scene.
Our method, which builds on top of Stable Diffusion, yields natural-looking images while being highly controllable with text and pose.
We demonstrate that identity preservation is a more challenging task in scenes "in-the-wild", and especially scenes where there is an interaction between persons and objects.
arXiv Detail & Related papers (2024-12-13T15:41:08Z)
- MagicTailor: Component-Controllable Personalization in Text-to-Image Diffusion Models [51.1034358143232]
We introduce component-controllable personalization, a new task that allows users to customize and reconfigure individual components within concepts.
This task faces two challenges: semantic pollution, where undesirable elements distort the concept, and semantic imbalance, which leads to disproportionate learning of the target concept and component.
We design MagicTailor, a framework that uses Dynamic Masked Degradation to adaptively perturb unwanted visual semantics and Dual-Stream Balancing for more balanced learning of desired visual semantics.
arXiv Detail & Related papers (2024-10-17T09:22:53Z)
- GroundingBooth: Grounding Text-to-Image Customization [17.185571339157075]
We introduce GroundingBooth, a framework that achieves zero-shot instance-level spatial grounding on both foreground subjects and background objects.
Our proposed text-image grounding module and masked cross-attention layer allow us to generate personalized images with both accurate layout alignment and identity preservation.
arXiv Detail & Related papers (2024-09-13T03:40:58Z)
- From Parts to Whole: A Unified Reference Framework for Controllable Human Image Generation [19.096741614175524]
Parts2Whole is a novel framework designed for generating customized portraits from multiple reference images.
We first develop a semantic-aware appearance encoder to retain details of different human parts.
Second, our framework supports multi-image conditioned generation through a shared self-attention mechanism.
arXiv Detail & Related papers (2024-04-23T17:56:08Z)
- Training-Free Consistent Text-to-Image Generation [80.4814768762066]
Consistently portraying the same subject across diverse prompts remains challenging for text-to-image models.
Existing approaches fine-tune the model to teach it new words that describe specific user-provided subjects.
We present ConsiStory, a training-free approach that enables consistent subject generation by sharing the internal activations of the pretrained model (see the activation-sharing sketch after this list).
arXiv Detail & Related papers (2024-02-05T18:42:34Z)
- Pick-and-Draw: Training-free Semantic Guidance for Text-to-Image Personalization [56.12990759116612]
Pick-and-Draw is a training-free semantic guidance approach to boost identity consistency and generative diversity for personalization methods.
The proposed approach can be applied to any personalized diffusion model and requires as few as a single reference image.
arXiv Detail & Related papers (2024-01-30T05:56:12Z)
- InstructBooth: Instruction-following Personalized Text-to-Image Generation [30.89054609185801]
InstructBooth is a novel method designed to enhance image-text alignment in personalized text-to-image models.
Our approach first personalizes text-to-image models with a small number of subject-specific images using a unique identifier.
After personalization, we fine-tune personalized text-to-image models using reinforcement learning to maximize a reward that quantifies image-text alignment (an illustrative update step is sketched after this list).
arXiv Detail & Related papers (2023-12-04T20:34:46Z)
- DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation [26.748667878221568]
We present a new approach for "personalization" of text-to-image models.
Given a few images of a subject, we fine-tune a pretrained text-to-image model to bind a unique identifier with that specific subject.
The unique identifier can then be used to synthesize fully novel photorealistic images of the subject contextualized in different scenes (a schematic of the loss appears after this list).
arXiv Detail & Related papers (2022-08-25T17:45:49Z)
- Pose-Guided Human Animation from a Single Image in the Wild [83.86903892201656]
We present a new pose transfer method for synthesizing a human animation from a single image of a person controlled by a sequence of body poses.
Existing pose transfer methods exhibit significant visual artifacts when applied to a novel scene.
We design a compositional neural network that predicts the silhouette, garment labels, and textures.
We are able to synthesize human animations that can preserve the identity and appearance of the person in a temporally coherent way without any fine-tuning of the network on the testing scene.
arXiv Detail & Related papers (2020-12-07T15:38:29Z)
- Person image generation with semantic attention network for person re-identification [9.30413920076019]
We propose a novel person pose-guided image generation method, which is called the semantic attention network.
The network consists of several semantic attention blocks, where each block attends to preserve and update the pose code and the clothing textures.
Compared with other methods, our network can better characterize body shape while simultaneously preserving clothing attributes.
arXiv Detail & Related papers (2020-08-18T12:18:51Z)
- Generating Person Images with Appearance-aware Pose Stylizer [66.44220388377596]
We present a novel end-to-end framework to generate realistic person images based on given person poses and appearances.
The core of our framework is a novel generator called Appearance-aware Pose Stylizer (APS) which generates human images by coupling the target pose with the conditioned person appearance progressively.
arXiv Detail & Related papers (2020-07-17T15:58:05Z)
- Wish You Were Here: Context-Aware Human Generation [100.51309746913512]
We present a novel method for inserting objects, specifically humans, into existing images.
Our method involves three subnetworks: the first generates the semantic map of the new person, given the pose of the other persons in the scene.
The second network renders the pixels of the novel person and its blending mask, based on specifications in the form of multiple appearance components.
A third network refines the generated face to match that of the target person.
arXiv Detail & Related papers (2020-05-21T14:09:14Z)
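The activation sharing referenced in the ConsiStory entry above can be pictured as follows. This is a toy PyTorch version under a strong simplification: the real method restricts sharing to subject patches via masks and adds feature injection, whereas here every image's queries simply attend over the keys and values of the whole batch. The function name and shapes are assumptions for illustration.

```python
import torch

def shared_self_attention(q, k, v):
    """Toy cross-image activation sharing: each image's queries attend
    over the keys/values of *every* image in the batch, encouraging a
    consistent subject. q, k, v: (batch, tokens, dim)."""
    b, t, d = k.shape
    k_all = k.reshape(1, b * t, d).expand(b, -1, -1)  # share keys batch-wide
    v_all = v.reshape(1, b * t, d).expand(b, -1, -1)  # share values batch-wide
    attn = torch.softmax(q @ k_all.transpose(1, 2) / d ** 0.5, dim=-1)
    return attn @ v_all                               # (batch, tokens, dim)
```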
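The reinforcement-learning step in the InstructBooth entry can be pictured as reward-weighted fine-tuning. The sketch below is a generic REINFORCE-style update, not the paper's exact objective: `sample_fn` (assumed to return images with differentiable log-probabilities) and `reward_fn` (e.g. a CLIP-style image-text similarity) are hypothetical stand-ins.

```python
import torch

def rl_finetune_step(model, optimizer, prompts, reward_fn, sample_fn):
    """One illustrative policy-gradient step: sample images, score
    image-text alignment, and push up the log-likelihood of samples
    whose reward beats the batch average."""
    images, log_probs = sample_fn(model, prompts)  # log-prob per sample
    with torch.no_grad():
        rewards = reward_fn(images, prompts)       # e.g. CLIP similarity
        advantages = rewards - rewards.mean()      # simple mean baseline
    loss = -(advantages * log_probs).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```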
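The DreamBooth entry's core recipe, per its paper, pairs a rare identifier token with the subject and adds a class-prior preservation term. Below is a schematic of the combined loss; the `unet` call signature and the `noise_sched.add_noise` / `num_steps` interface are assumptions, not a real library API.

```python
import torch

def dreambooth_loss(unet, noise_sched, subj_batch, prior_batch, lam=1.0):
    """Schematic DreamBooth objective: denoising loss on subject images
    captioned with the identifier (e.g. 'a [V] dog'), plus a
    prior-preservation denoising loss on generated class images
    captioned without it (e.g. 'a dog') to curb overfitting/drift."""
    def denoise_loss(batch):
        x, text_emb = batch                    # latents + caption embedding
        noise = torch.randn_like(x)
        t = torch.randint(0, noise_sched.num_steps, (x.shape[0],))
        noisy = noise_sched.add_noise(x, noise, t)
        pred = unet(noisy, t, text_emb)        # predict the added noise
        return torch.nn.functional.mse_loss(pred, noise)
    return denoise_loss(subj_batch) + lam * denoise_loss(prior_batch)
```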
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.