Scene Aware Person Image Generation through Global Contextual
Conditioning
- URL: http://arxiv.org/abs/2206.02717v1
- Date: Mon, 6 Jun 2022 16:18:15 GMT
- Title: Scene Aware Person Image Generation through Global Contextual
Conditioning
- Authors: Prasun Roy, Subhankar Ghosh, Saumik Bhattacharya, Umapada Pal, Michael
Blumenstein
- Abstract summary: We propose a novel pipeline to generate and insert contextually relevant person images into an existing scene.
More specifically, we aim to insert a person such that the location, pose, and scale of the person being inserted blend in with the existing persons in the scene.
- Score: 24.317541784957285
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Person image generation is an intriguing yet challenging problem. However,
this task becomes even more difficult under constrained situations. In this
work, we propose a novel pipeline to generate and insert contextually relevant
person images into an existing scene while preserving the global semantics.
More specifically, we aim to insert a person such that the location, pose, and
scale of the person being inserted blend in with the existing persons in the
scene. Our method uses three individual networks in a sequential pipeline.
First, we predict the potential location and the skeletal structure of the new
person by conditioning a Wasserstein Generative Adversarial Network (WGAN) on
the existing human skeletons present in the scene. Next, the predicted skeleton
is refined through a shallow linear network to achieve higher structural
accuracy in the generated image. Finally, the target image is generated from
the refined skeleton using another generative network conditioned on a given
image of the target person. In our experiments, we achieve high-resolution
photo-realistic generation results while preserving the general context of the
scene. We conclude the paper with multiple qualitative and quantitative
evaluations of the results.
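The abstract names three stages but publishes no implementation details. Purely as an illustration, the following minimal PyTorch sketch shows how such a sequential pipeline could be wired together; the module names, the 17-keypoint skeleton encoding, the layer sizes, and the critic-loss helper are all assumptions made here, not the authors' actual architecture.

```python
# Illustrative sketch of the three-stage pipeline described above.
# All names, shapes, and layer choices are assumptions; the paper
# does not specify these implementation details.
import torch
import torch.nn as nn

N_KP = 17         # assumed number of 2-D keypoints per skeleton
MAX_PERSONS = 8   # assumed cap on existing skeletons used as conditioning

class SkeletonWGAN(nn.Module):
    """Stage 1: generator of a conditional WGAN that predicts the
    location and skeleton of the new person from the scene skeletons."""
    def __init__(self, z_dim: int = 128, hidden: int = 512):
        super().__init__()
        cond_dim = MAX_PERSONS * N_KP * 2          # flattened (x, y) coords
        self.net = nn.Sequential(
            nn.Linear(z_dim + cond_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, N_KP * 2),           # new skeleton (x, y)
        )

    def forward(self, z, scene_skeletons):
        cond = scene_skeletons.flatten(1)          # (B, MAX_PERSONS*N_KP*2)
        return self.net(torch.cat([z, cond], 1)).view(-1, N_KP, 2)

class SkeletonRefiner(nn.Module):
    """Stage 2: shallow linear network that refines the raw skeleton."""
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(N_KP * 2, N_KP * 2)    # a single linear layer

    def forward(self, skeleton):
        return self.fc(skeleton.flatten(1)).view(-1, N_KP, 2)

class PersonRenderer(nn.Module):
    """Stage 3: renders the target person at the refined skeleton,
    conditioned on a reference image (heavily simplified stand-in)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3 + N_KP, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3, 3, padding=1), nn.Tanh(),
        )

    def forward(self, ref_image, skeleton_heatmaps):
        return self.net(torch.cat([ref_image, skeleton_heatmaps], 1))

def critic_loss(critic, real_skel, fake_skel, cond):
    """Wasserstein critic objective (minimized): score fakes low, reals
    high. In practice a gradient penalty or weight clipping would be
    added to enforce the Lipschitz constraint on the critic."""
    return critic(fake_skel, cond).mean() - critic(real_skel, cond).mean()
```

At inference time the three modules would run back to back: sample a latent vector, generate and refine a skeleton, rasterize it into per-keypoint heatmaps, and feed those together with the reference image to the renderer.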
Related papers
- Stellar: Systematic Evaluation of Human-Centric Personalized
Text-to-Image Methods [52.806258774051216]
We focus on text-to-image systems that take as input a single image of an individual and ground the generation process in text describing the desired visual context.
We introduce a standardized dataset (Stellar) of personalized prompts coupled with images of individuals; it is an order of magnitude larger than existing relevant datasets and provides rich semantic ground-truth annotations.
We derive a simple yet efficient personalized text-to-image baseline that does not require test-time fine-tuning for each subject and sets a new SoTA both quantitatively and in human trials.
arXiv Detail & Related papers (2023-12-11T04:47:39Z) - HyperHuman: Hyper-Realistic Human Generation with Latent Structural Diffusion [114.15397904945185]
We propose a unified framework, HyperHuman, that generates in-the-wild human images of high realism and diverse layouts.
Our model enforces the joint learning of image appearance, spatial relationship, and geometry in a unified network.
Our framework yields state-of-the-art performance, generating hyper-realistic human images under diverse scenarios.
arXiv Detail & Related papers (2023-10-12T17:59:34Z) - Global Context-Aware Person Image Generation [24.317541784957285]
We propose a data-driven approach for context-aware person image generation.
In our method, the position, scale, and appearance of the generated person are semantically conditioned on the existing persons in the scene.
arXiv Detail & Related papers (2023-02-28T16:34:55Z) - NeuralReshaper: Single-image Human-body Retouching with Deep Neural
Networks [50.40798258968408]
We present NeuralReshaper, a novel method for semantic reshaping of human bodies in single images using deep generative networks.
Our approach follows a fit-then-reshape pipeline, which first fits a parametric 3D human model to a source human image.
To deal with the lack of paired training data, we introduce a novel self-supervised strategy to train our network.
arXiv Detail & Related papers (2022-03-20T09:02:13Z) - Realistic Full-Body Anonymization with Surface-Guided GANs [7.37907896341367]
We propose a new anonymization method that generates realistic humans for in-the-wild images.
A key part of our design is to guide adversarial nets by dense pixel-to-surface correspondences between an image and a canonical 3D surface.
We demonstrate that surface guidance significantly improves image quality and diversity of samples, yielding a highly practical generator.
arXiv Detail & Related papers (2022-01-06T18:57:59Z) - Learned Spatial Representations for Few-shot Talking-Head Synthesis [68.3787368024951]
We propose a novel approach for few-shot talking-head synthesis.
We show that this disentangled representation leads to a significant improvement over previous methods.
arXiv Detail & Related papers (2021-04-29T17:59:42Z) - Subject-independent Human Pose Image Construction with Commodity Wi-Fi [24.099783319415913]
This paper focuses on solving the subject-generalization problem in human pose image construction.
We design a Domain-Independent Neural Network (DINN) to extract subject-independent features and convert them into fine-grained human pose images.
We build a prototype system and experimental results demonstrate that our system can construct fine-grained human pose images of new subjects with commodity Wi-Fi.
arXiv Detail & Related papers (2020-12-22T03:15:56Z) - Generating Person Images with Appearance-aware Pose Stylizer [66.44220388377596]
We present a novel end-to-end framework to generate realistic person images based on given person poses and appearances.
The core of our framework is a novel generator called Appearance-aware Pose Stylizer (APS) which generates human images by coupling the target pose with the conditioned person appearance progressively.
arXiv Detail & Related papers (2020-07-17T15:58:05Z) - Wish You Were Here: Context-Aware Human Generation [100.51309746913512]
We present a novel method for inserting objects, specifically humans, into existing images.
Our method involves three networks: the first generates the semantic map of the new person, given the poses of the other persons in the scene.
The second network renders the pixels of the novel person and its blending mask, based on specifications in the form of multiple appearance components (a sketch of the mask-based compositing step follows this entry).
A third network refines the generated face to match that of the target person.
arXiv Detail & Related papers (2020-05-21T14:09:14Z)
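For illustration, blending a rendered person into a scene with a predicted mask, as the entry above describes, typically reduces to standard alpha compositing; a minimal sketch follows, where the function and tensor names are assumptions rather than that paper's API.

```python
# Hedged sketch of the alpha-compositing step implied by the
# "blending mask" above; names and shapes are illustrative only.
import torch

def composite(scene: torch.Tensor, person: torch.Tensor,
              mask: torch.Tensor) -> torch.Tensor:
    """Blend the rendered person into the scene. All tensors share the
    same spatial size; mask holds soft alpha values in [0, 1]."""
    return mask * person + (1.0 - mask) * scene
```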
This list is automatically generated from the titles and abstracts of the papers on this site.