UMFuse: Unified Multi View Fusion for Human Editing applications
- URL: http://arxiv.org/abs/2211.10157v4
- Date: Tue, 28 Mar 2023 11:02:19 GMT
- Title: UMFuse: Unified Multi View Fusion for Human Editing applications
- Authors: Rishabh Jain, Mayur Hemani, Duygu Ceylan, Krishna Kumar Singh, Jingwan Lu, Mausoom Sarkar, Balaji Krishnamurthy
- Abstract summary: We design a multi-view fusion network that takes the pose keypoints and texture from multiple source images.
We show the application of our network on two newly proposed tasks - Multi-view human reposing and Mix&Match Human Image generation.
- Score: 36.94334399493266
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Numerous pose-guided human editing methods have been explored by the vision
community due to their extensive practical applications. However, most of these
methods still use an image-to-image formulation in which a single image is
given as input to produce an edited image as output. This objective becomes
ill-defined when the target pose differs significantly from the input pose.
Existing methods then resort to in-painting or style transfer to handle
occlusions and preserve content. In this paper, we explore the utilization of
multiple views to minimize the issue of missing information and generate an
accurate representation of the underlying human model. To fuse knowledge from
multiple viewpoints, we design a multi-view fusion network that takes the pose
keypoints and texture from multiple source images and generates an explainable
per-pixel appearance retrieval map. Thereafter, the encodings from a separate
network (trained on a single-view human reposing task) are merged in the latent
space. This enables us to generate accurate, precise, and visually coherent
images for different editing tasks. We show the application of our network on
two newly proposed tasks - Multi-view human reposing and Mix&Match Human Image
generation. Additionally, we study the limitations of single-view editing and
scenarios in which multi-view provides a better alternative.
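Below is a minimal, illustrative sketch of the per-pixel appearance retrieval map described in the abstract; it is not the authors' implementation, and every name, channel size, and tensor shape in it (e.g. MultiViewFusion, feat_channels) is an assumption made for the example. The core idea: given features extracted from each of K source views (pose keypoints plus texture), predict one logit per view at every pixel, normalize over views with a softmax to obtain the retrieval map, and blend the per-view features with those weights.

```python
# Illustrative sketch only (not the authors' code): per-pixel appearance
# retrieval over K source views. All module names, channel sizes, and
# shapes are assumptions for the example.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiViewFusion(nn.Module):
    def __init__(self, feat_channels: int = 64, num_views: int = 3):
        super().__init__()
        # Predicts one logit per source view at every pixel from the
        # concatenated per-view features (a stand-in for the network that
        # consumes pose keypoints and texture from each source image).
        self.retrieval_head = nn.Conv2d(feat_channels * num_views, num_views, kernel_size=1)

    def forward(self, view_feats: torch.Tensor):
        # view_feats: (B, K, C, H, W), one feature map per source view.
        b, k, c, h, w = view_feats.shape
        logits = self.retrieval_head(view_feats.reshape(b, k * c, h, w))  # (B, K, H, W)
        # Softmax over the view axis gives an explainable per-pixel map:
        # each pixel says how much appearance to take from each source view.
        retrieval_map = F.softmax(logits, dim=1)
        fused = (retrieval_map.unsqueeze(2) * view_feats).sum(dim=1)      # (B, C, H, W)
        return fused, retrieval_map

if __name__ == "__main__":
    fusion = MultiViewFusion(feat_channels=64, num_views=3)
    feats = torch.randn(2, 3, 64, 128, 96)  # batch of 2, three source views
    fused, retrieval_map = fusion(feats)
    print(fused.shape, retrieval_map.shape)  # (2, 64, 128, 96) and (2, 3, 128, 96)
```

Per the abstract, the fused representation would then be merged in latent space with encodings from a separately trained single-view reposing network; that step is not shown here.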
Related papers
- A Survey of Multimodal-Guided Image Editing with Text-to-Image Diffusion Models [117.77807994397784]
Image editing aims to edit the given synthetic or real image to meet the specific requirements from users.
Recent significant advancement in this field is based on the development of text-to-image (T2I) diffusion models.
T2I-based image editing methods significantly enhance editing performance and offer a user-friendly interface for modifying content guided by multimodal inputs.
arXiv Detail & Related papers (2024-06-20T17:58:52Z)
- Zero-shot Image Editing with Reference Imitation [50.75310094611476]
We present a new form of editing, termed imitative editing, to help users exercise their creativity more conveniently.
We propose a generative training framework, dubbed MimicBrush, which randomly selects two frames from a video clip, masks some regions of one frame, and learns to recover the masked regions using the information from the other frame.
We experimentally show the effectiveness of our method under various test cases as well as its superiority over existing alternatives.
arXiv Detail & Related papers (2024-06-11T17:59:51Z)
- From Parts to Whole: A Unified Reference Framework for Controllable Human Image Generation [19.096741614175524]
Parts2Whole is a novel framework designed for generating customized portraits from multiple reference images.
First, we develop a semantic-aware appearance encoder to retain details of different human parts.
Second, our framework supports multi-image conditioned generation through a shared self-attention mechanism.
arXiv Detail & Related papers (2024-04-23T17:56:08Z)
- Continuous Layout Editing of Single Images with Diffusion Models [24.581184791106562]
We propose the first framework for layout editing of a single image while preserving its visual properties.
Our approach is achieved through two key modules.
Our code will be freely available for public use upon acceptance.
arXiv Detail & Related papers (2023-06-22T17:51:05Z)
- Break-A-Scene: Extracting Multiple Concepts from a Single Image [80.47666266017207]
We introduce the task of textual scene decomposition.
We propose augmenting the input image with masks that indicate the presence of target concepts.
We then present a novel two-phase customization process.
arXiv Detail & Related papers (2023-05-25T17:59:04Z)
- Pose Guided Multi-person Image Generation From Text [15.15576618501609]
Existing methods struggle to create high-fidelity full-body images, especially of multiple people.
We propose a pose-guided text-to-image model, using pose as an additional input constraint.
We show results on the DeepFashion dataset and create a new multi-person DeepFashion dataset to demonstrate the multi-person capabilities of our approach.
arXiv Detail & Related papers (2022-03-09T17:38:03Z)
- Single-View View Synthesis with Multiplane Images [64.46556656209769]
Prior work applies deep learning to generate multiplane images given two or more input images at known viewpoints.
Our method learns to predict a multiplane image directly from a single image input.
It additionally generates reasonable depth maps and fills in content behind the edges of foreground objects in background layers.
arXiv Detail & Related papers (2020-04-23T17:59:19Z)
- Unifying Specialist Image Embedding into Universal Image Embedding [84.0039266370785]
It is desirable to have a universal deep embedding model applicable to various domains of images.
We propose to distill the knowledge in multiple specialists into a universal embedding to solve this problem.
arXiv Detail & Related papers (2020-03-08T02:51:11Z)