Environment-Specific People
- URL: http://arxiv.org/abs/2312.14579v1
- Date: Fri, 22 Dec 2023 10:15:15 GMT
- Title: Environment-Specific People
- Authors: Mirela Ostrek, Soubhik Sanyal, Carol O'Sullivan, Michael J. Black,
Justus Thies
- Abstract summary: We present ESP, a novel method for context-aware full-body generation.
ESP is conditioned on a 2D pose and contextual cues that are extracted from the environment photograph.
We show that ESP outperforms the state of the art on the task of contextual full-body generation.
- Score: 59.14959529735115
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite significant progress in generative image synthesis and full-body
generation in particular, state-of-the-art methods are either
context-independent, overly reliant on text prompts, or bound to curated
training datasets, such as fashion images with monotonous backgrounds. Here,
our goal is to generate people in clothing that is semantically appropriate for
a given scene. To this end, we present ESP, a novel method for context-aware
full-body generation, that enables photo-realistic inpainting of people into
existing "in-the-wild" photographs. ESP is conditioned on a 2D pose and
contextual cues that are extracted from the environment photograph and
integrated into the generation process. Our models are trained on a dataset
of in-the-wild photographs of people covering a wide range of environments.
The method is analyzed quantitatively and qualitatively, and we show that ESP
outperforms the state of the art on the task of contextual full-body
generation.
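To make the conditioning scheme described above concrete, below is a minimal, illustrative PyTorch sketch, not the authors' released ESP code: a hypothetical scene-context encoder summarizes the environment photograph into an embedding, and a hypothetical inpainting generator consumes the masked scene, an inpainting mask, and 2D pose heatmaps, modulated by that context embedding. All module names, channel counts, and the FiLM-style modulation are assumptions made for illustration only.

```python
# Illustrative sketch only -- NOT the authors' ESP implementation.
import torch
import torch.nn as nn

class SceneContextEncoder(nn.Module):
    """Hypothetical encoder: summarizes the environment photo into a context vector."""
    def __init__(self, dim=256):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.proj = nn.Linear(64, dim)

    def forward(self, scene):                      # scene: (B, 3, H, W)
        feat = self.backbone(scene).flatten(1)     # (B, 64)
        return self.proj(feat)                     # (B, dim)

class PoseConditionedInpainter(nn.Module):
    """Hypothetical generator conditioned on a 2D pose map and scene context."""
    def __init__(self, ctx_dim=256):
        super().__init__()
        # Inputs: masked scene (3) + inpainting mask (1) + 17 pose keypoint heatmaps.
        self.unet_in = nn.Conv2d(3 + 1 + 17, 64, 3, padding=1)
        self.film = nn.Linear(ctx_dim, 2 * 64)     # FiLM-style scale/shift from context
        self.unet_out = nn.Conv2d(64, 3, 3, padding=1)

    def forward(self, masked_scene, mask, pose_heatmaps, ctx):
        h = self.unet_in(torch.cat([masked_scene, mask, pose_heatmaps], dim=1))
        scale, shift = self.film(ctx).chunk(2, dim=1)
        h = h * (1 + scale[..., None, None]) + shift[..., None, None]
        return self.unet_out(torch.relu(h))        # RGB for the inpainted person region

# Usage with random tensors, just to show the expected shapes.
scene = torch.randn(1, 3, 256, 256)
mask = torch.ones(1, 1, 256, 256)
pose = torch.randn(1, 17, 256, 256)
ctx = SceneContextEncoder()(scene)
out = PoseConditionedInpainter()(scene * (1 - mask), mask, pose, ctx)
print(out.shape)   # torch.Size([1, 3, 256, 256])
```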
Related papers
- Learning Complex Non-Rigid Image Edits from Multimodal Conditioning [18.500715348636582]
We focus on inserting a given human (specifically, a single image of a person) into a novel scene.
Our method, which builds on top of Stable Diffusion, yields natural-looking images while being highly controllable with text and pose.
We demonstrate that identity preservation is a more challenging task in scenes "in-the-wild", and especially scenes where there is an interaction between persons and objects.
arXiv Detail & Related papers (2024-12-13T15:41:08Z)
- Text2Place: Affordance-aware Text Guided Human Placement [26.041917073228483]
This work tackles the problem of realistic human insertion in a given background scene, termed Semantic Human Placement.
For learning semantic masks, we leverage rich object-scene priors learned from the text-to-image generative models.
The proposed method can generate highly realistic scene compositions while preserving the background and subject identity.
arXiv Detail & Related papers (2024-07-22T08:00:06Z)
- When StyleGAN Meets Stable Diffusion: a $\mathscr{W}_+$ Adapter for Personalized Image Generation [60.305112612629465]
Text-to-image diffusion models have excelled in producing diverse, high-quality, and photo-realistic images.
We present a novel use of the extended StyleGAN embedding space $\mathcal{W}_+$ to achieve enhanced identity preservation and disentanglement for diffusion models.
Our method adeptly generates personalized text-to-image outputs that are not only compatible with prompt descriptions but also amenable to common StyleGAN editing directions.
arXiv Detail & Related papers (2023-11-29T09:05:14Z)
- Global Context-Aware Person Image Generation [24.317541784957285]
We propose a data-driven approach for context-aware person image generation.
In our method, the position, scale, and appearance of the generated person are semantically conditioned on the existing persons in the scene.
arXiv Detail & Related papers (2023-02-28T16:34:55Z)
- Hallucinating Pose-Compatible Scenes [55.064949607528405]
We present a large-scale generative adversarial network for pose-conditioned scene generation.
We curate a massive meta-dataset containing over 19 million frames of humans in everyday environments.
We leverage our trained model for various applications: hallucinating pose-compatible scene(s) with or without humans, visualizing incompatible scenes and poses, placing a person from one generated image into another scene, and animating pose.
arXiv Detail & Related papers (2021-12-13T18:59:26Z)
- HumanGAN: A Generative Model of Human Images [78.6284090004218]
We present a generative model for images of dressed humans offering control over pose, local body part appearance and garment style.
Our model encodes part-based latent appearance vectors in a normalized pose-independent space and warps them to different poses, preserving body and clothing appearance under varying posture.
arXiv Detail & Related papers (2021-03-11T19:00:38Z)
- PISE: Person Image Synthesis and Editing with Decoupled GAN [64.70360318367943]
We propose PISE, a novel two-stage generative model for Person Image Synthesis and Editing.
For human pose transfer, we first synthesize a human parsing map aligned with the target pose to represent the shape of clothing.
To decouple the shape and style of clothing, we propose joint global and local per-region encoding and normalization.
arXiv Detail & Related papers (2021-03-06T04:32:06Z)
- Wish You Were Here: Context-Aware Human Generation [100.51309746913512]
We present a novel method for inserting objects, specifically humans, into existing images.
Our method involves three networks: the first generates the semantic map of the new person, given the pose of the other persons in the scene.
The second network renders the pixels of the novel person and its blending mask, based on specifications in the form of multiple appearance components.
A third network refines the generated face in order to match that of the target person.
arXiv Detail & Related papers (2020-05-21T14:09:14Z)
- Adversarial Synthesis of Human Pose from Text [18.02001711736337]
This work focuses on synthesizing human poses from human-level text descriptions.
We propose a model that is based on a conditional generative adversarial network.
We show through qualitative and quantitative results that the model is capable of synthesizing plausible poses matching the given text.
arXiv Detail & Related papers (2020-05-01T12:32:04Z)