Environment-Specific People
- URL: http://arxiv.org/abs/2312.14579v1
- Date: Fri, 22 Dec 2023 10:15:15 GMT
- Title: Environment-Specific People
- Authors: Mirela Ostrek, Soubhik Sanyal, Carol O'Sullivan, Michael J. Black,
Justus Thies
- Abstract summary: We present ESP, a novel method for context-aware full-body generation.
ESP is conditioned on a 2D pose and contextual cues that are extracted from the environment photograph.
We show that ESP outperforms the state of the art on the task of contextual full-body generation.
- Score: 59.14959529735115
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite significant progress in generative image synthesis, and in full-body
generation in particular, state-of-the-art methods are either context-independent,
overly reliant on text prompts, or bound to curated training datasets, such as
fashion images with monotonous backgrounds. Here, our goal is to generate people
in clothing that is semantically appropriate for a given scene. To this end, we
present ESP, a novel method for context-aware full-body generation that enables
photo-realistic inpainting of people into existing "in-the-wild" photographs. ESP
is conditioned on a 2D pose and on contextual cues that are extracted from the
environment photograph and integrated into the generation process. Our models are
trained on a dataset of in-the-wild photographs of people covering a wide range of
environments. The method is analyzed quantitatively and qualitatively, and we show
that ESP outperforms the state of the art on the task of contextual full-body
generation.
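As a rough illustration of the conditioning interface described in the abstract (a 2D pose plus scene-derived context driving inpainting into an existing photograph), the sketch below uses off-the-shelf Hugging Face diffusers components. It is not the ESP architecture: ESP extracts contextual cues directly from the environment photograph, whereas here the scene context is approximated by a hand-written text prompt, and the model IDs and file names are illustrative assumptions.

```python
# Minimal sketch of pose-conditioned person inpainting with off-the-shelf
# components (Hugging Face diffusers). NOT the ESP method: the text prompt
# below merely stands in for ESP's environment-derived contextual cues.
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetInpaintPipeline

# Pose-conditioning branch (OpenPose ControlNet) plugged into an
# inpainting-capable Stable Diffusion pipeline.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_openpose", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

# Illustrative inputs; all images are assumed pre-resized to the same resolution.
scene = Image.open("environment.jpg").convert("RGB")        # "in-the-wild" photo
mask = Image.open("person_region_mask.png").convert("RGB")  # region to inpaint the person into
pose = Image.open("target_2d_pose.png").convert("RGB")      # rendered 2D skeleton

# Text prompt standing in for the contextual cues ESP would extract automatically.
result = pipe(
    prompt="a person in clothing appropriate for this outdoor hiking scene",
    image=scene,
    mask_image=mask,
    control_image=pose,
    num_inference_steps=30,
).images[0]
result.save("person_inpainted.png")
```

This only mimics the interface (pose + scene + mask in, inpainted person out); ESP instead learns the contextual conditioning from the environment photograph rather than relying on a manual prompt.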
Related papers
- Instruct-Imagen: Image Generation with Multi-modal Instruction [90.04481955523514]
instruct-imagen is a model that tackles heterogeneous image generation tasks and generalizes across unseen tasks.
We introduce *multi-modal instruction* for image generation, a task representation articulating a range of generation intents with precision.
Human evaluation on various image generation datasets reveals that instruct-imagen matches or surpasses prior task-specific models in-domain.
arXiv Detail & Related papers (2024-01-03T19:31:58Z)
- Decoupled Textual Embeddings for Customized Image Generation [62.98933630971543]
Customized text-to-image generation aims to learn user-specified concepts with a few images.
Existing methods usually suffer from overfitting and entangle subject-unrelated information with the learned concept.
We propose DETEX, a novel approach that learns disentangled concept embeddings for flexible customized text-to-image generation.
arXiv Detail & Related papers (2023-12-19T03:32:10Z)
- Stellar: Systematic Evaluation of Human-Centric Personalized Text-to-Image Methods [52.806258774051216]
We focus on text-to-image systems that input a single image of an individual and ground the generation process along with text describing the desired visual context.
We introduce Stellar, a standardized dataset of personalized prompts coupled with images of individuals; it is an order of magnitude larger than existing relevant datasets and provides rich semantic ground-truth annotations.
We derive a simple yet efficient personalized text-to-image baseline that does not require test-time fine-tuning for each subject and that sets a new SoTA both quantitatively and in human trials.
arXiv Detail & Related papers (2023-12-11T04:47:39Z)
- FaceStudio: Put Your Face Everywhere in Seconds [23.381791316305332]
Identity-preserving image synthesis seeks to maintain a subject's identity while adding a personalized, stylistic touch.
Traditional methods, such as Textual Inversion and DreamBooth, have made strides in custom image creation.
Our research introduces a novel approach to identity-preserving synthesis, with a particular focus on human images.
arXiv Detail & Related papers (2023-12-05T11:02:45Z)
- Highly Personalized Text Embedding for Image Manipulation by Stable Diffusion [34.662798793560995]
We present a simple yet highly effective approach to personalization using highly personalized (HiPer) text embedding.
Our method does not require model fine-tuning or identifiers, yet still enables manipulation of background, texture, and motion with just a single image and target text.
arXiv Detail & Related papers (2023-03-15T17:07:45Z)
- Global Context-Aware Person Image Generation [24.317541784957285]
We propose a data-driven approach for context-aware person image generation.
In our method, the position, scale, and appearance of the generated person are semantically conditioned on the existing persons in the scene.
arXiv Detail & Related papers (2023-02-28T16:34:55Z)
- HumanDiffusion: a Coarse-to-Fine Alignment Diffusion Framework for Controllable Text-Driven Person Image Generation [73.3790833537313]
Controllable person image generation promotes a wide range of applications such as digital human interaction and virtual try-on.
We propose HumanDiffusion, a coarse-to-fine alignment diffusion framework, for text-driven person image generation.
arXiv Detail & Related papers (2022-11-11T14:30:34Z)
- Language Does More Than Describe: On The Lack Of Figurative Speech in Text-To-Image Models [63.545146807810305]
Text-to-image diffusion models can generate high-quality pictures from textual input prompts.
These models have been trained using text data collected from content-based labelling protocols.
We characterise the sentimentality, objectiveness and degree of abstraction of publicly available text data used to train current text-to-image diffusion models.
arXiv Detail & Related papers (2022-10-19T14:20:05Z)
- Scene Aware Person Image Generation through Global Contextual Conditioning [24.317541784957285]
We propose a novel pipeline to generate and insert contextually relevant person images into an existing scene.
More specifically, we aim to insert a person such that the location, pose, and scale of the person being inserted blends in with the existing persons in the scene.
arXiv Detail & Related papers (2022-06-06T16:18:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.