VGFlow: Visibility guided Flow Network for Human Reposing
- URL: http://arxiv.org/abs/2211.08540v4
- Date: Tue, 28 Mar 2023 10:57:05 GMT
- Title: VGFlow: Visibility guided Flow Network for Human Reposing
- Authors: Rishabh Jain, Krishna Kumar Singh, Mayur Hemani, Jingwan Lu, Mausoom
Sarkar, Duygu Ceylan, Balaji Krishnamurthy
- Abstract summary: We propose VGFlow to generate perceptually accurate images of humans.
Our model uses a visibility-guided flow module to disentangle the flow into visible and invisible parts.
VGFlow achieves state-of-the-art results as observed on different image quality metrics.
- Score: 36.94334399493267
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The task of human reposing involves generating a realistic image of a person
standing in an arbitrary conceivable pose. There are multiple difficulties in
generating perceptually accurate images, and existing methods suffer from
limitations in preserving texture, maintaining pattern coherence, respecting
cloth boundaries, handling occlusions, manipulating skin generation, etc. These
difficulties are further exacerbated by the fact that the possible space of
pose orientation for humans is large and variable, the nature of clothing items
is highly non-rigid, and body shapes vary widely across the
population. To alleviate these difficulties and synthesize perceptually
accurate images, we propose VGFlow. Our model uses a visibility-guided flow
module to disentangle the flow into visible and invisible parts of the target
for simultaneous texture preservation and style manipulation. Furthermore, to
tackle distinct body shapes and avoid network artifacts, we also incorporate a
self-supervised patch-wise "realness" loss to improve the output. VGFlow
achieves state-of-the-art results as observed qualitatively and quantitatively
on different image quality metrics (SSIM, LPIPS, FID).
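The visible/invisible disentanglement described above can be sketched as a simple composition: pixels of the target that are visible in the source are filled by warping the source with a flow field, while invisible pixels fall back to a separately generated branch. The following is a minimal, hypothetical simplification (nearest-neighbour warp, integer flow offsets, hand-made masks), not the authors' implementation:

```python
# Hypothetical sketch of visibility-guided flow composition.
# Images are 2D lists of floats; flow is a per-pixel (dy, dx) offset;
# visibility is 1 where the target pixel is visible in the source.

def warp_nearest(img, flow):
    """Warp a 2D image with a per-pixel (dy, dx) flow, nearest-neighbour."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dy, dx = flow[y][x]
            sy = min(max(y + dy, 0), h - 1)  # clamp to image bounds
            sx = min(max(x + dx, 0), w - 1)
            out[y][x] = img[sy][sx]
    return out

def compose(source, flow, generated, visibility):
    """Blend the warped-source (visible) and generated (invisible) branches."""
    h, w = len(source), len(source[0])
    warped = warp_nearest(source, flow)
    return [[visibility[y][x] * warped[y][x]
             + (1 - visibility[y][x]) * generated[y][x]
             for x in range(w)] for y in range(h)]
```

In the real model both the flow field and the visibility map are predicted by the network; here they are supplied by hand only to show how the two branches combine.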
Related papers
- DRDM: A Disentangled Representations Diffusion Model for Synthesizing Realistic Person Images [9.768951663960257]
We propose a Disentangled Representations Diffusion Model (DRDM) to generate photo-realistic images from source portraits.
First, a pose encoder is responsible for encoding pose features into a high-dimensional space to guide the generation of person images.
Second, a body-part subspace decoupling block (BSDB) disentangles features from the different body parts of a source figure and feeds them to the various layers of the noise prediction block.
arXiv Detail & Related papers (2024-12-25T06:36:24Z)
- Learning Flow Fields in Attention for Controllable Person Image Generation [59.10843756343987]
Controllable person image generation aims to generate a person image conditioned on reference images.
We propose learning flow fields in attention (Leffa), which explicitly guides the target query to attend to the correct reference key.
Leffa achieves state-of-the-art performance in controlling appearance (virtual try-on) and pose (pose transfer), significantly reducing fine-grained detail distortion.
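Guiding a target query to attend to the correct reference key can be read as cross-attention acting as a soft warp: the attention weights over reference positions play the role of a flow field selecting reference values. A toy sketch of that reading (plain dot-product attention, not Leffa's actual regularized formulation):

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attention_warp(queries, keys, values):
    """Each target query attends over reference keys; the attention
    weights act as a soft 'flow field' selecting reference values."""
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[d] for w, v in zip(weights, values))
                    for d in range(len(values[0]))])
    return out
```

When a query matches one key much more strongly than the others, the output collapses to that key's value, i.e. the "flow" points at a single reference location; diffuse attention blurs detail, which is the distortion Leffa aims to reduce.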
arXiv Detail & Related papers (2024-12-11T15:51:14Z)
- One-shot Human Motion Transfer via Occlusion-Robust Flow Prediction and Neural Texturing [21.613055849276385]
We propose a unified framework that combines multi-scale feature warping and neural texture mapping to recover better 2D appearance and 2.5D geometry.
Our model takes advantage of multiple modalities by jointly training and fusing them, which allows it to learn robust neural texture features that cope with geometric errors.
arXiv Detail & Related papers (2024-12-09T03:14:40Z)
- Person Image Synthesis via Denoising Diffusion Model [116.34633988927429]
We show how denoising diffusion models can be applied for high-fidelity person image synthesis.
Our results on two large-scale benchmarks and a user study demonstrate the photorealism of our proposed approach under challenging scenarios.
arXiv Detail & Related papers (2022-11-22T18:59:50Z)
- ZFlow: Gated Appearance Flow-based Virtual Try-on with 3D Priors [13.977100716044104]
Image-based virtual try-on involves synthesizing convincing images of a model wearing a particular garment.
Recent methods involve a two-stage process: (i) warping of the garment to align with the model, and (ii) …
The lack of geometric information about the model or the garment often results in improper rendering of granular details.
We propose ZFlow, an end-to-end framework, which seeks to alleviate these concerns.
arXiv Detail & Related papers (2021-09-14T22:41:14Z)
- Structure-aware Person Image Generation with Pose Decomposition and Semantic Correlation [29.727033198797518]
We propose a structure-aware flow based method for high-quality person image generation.
We decompose the human body into different semantic parts and apply different networks to predict the flow fields for these parts separately.
Our method can generate high-quality results under large pose discrepancy and outperforms state-of-the-art methods in both qualitative and quantitative comparisons.
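Predicting separate flow fields for different semantic parts and merging them can be sketched very simply. In this hypothetical toy version each part gets a single rigid (dy, dx) offset rather than a dense network-predicted flow, and parts are defined by binary masks:

```python
# Toy sketch of per-part flow warping: decompose the body into
# masked semantic parts, warp each part with its own offset, merge.

def warp_with_part_flows(img, part_masks, part_flows):
    """Apply a different (dy, dx) offset to each masked part of a 2D
    image and merge the results into one output image."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for mask, (dy, dx) in zip(part_masks, part_flows):
        for y in range(h):
            for x in range(w):
                if mask[y][x]:
                    sy = min(max(y + dy, 0), h - 1)  # clamp to bounds
                    sx = min(max(x + dx, 0), w - 1)
                    out[y][x] = img[sy][sx]
    return out
```

The point of the decomposition is that limbs, torso, and head move semi-independently under a pose change, so a single global flow field fits poorly while per-part fields stay locally smooth.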
arXiv Detail & Related papers (2021-02-05T03:07:57Z)
- Neural Re-Rendering of Humans from a Single Image [80.53438609047896]
We propose a new method for neural re-rendering of a human under a novel user-defined pose and viewpoint.
Our algorithm represents body pose and shape as a parametric mesh which can be reconstructed from a single image.
arXiv Detail & Related papers (2021-01-11T18:53:47Z)
- Encoding Robustness to Image Style via Adversarial Feature Perturbations [72.81911076841408]
We adapt adversarial training by directly perturbing feature statistics, rather than image pixels, to produce robust models.
Our proposed method, Adversarial Batch Normalization (AdvBN), is a single network layer that generates worst-case feature perturbations during training.
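Perturbing feature statistics rather than pixels amounts to shifting a feature map's mean and rescaling its standard deviation. A minimal sketch of that operation on a single channel (an assumption-level toy, not the AdvBN layer itself, whose perturbations are found adversarially during training):

```python
# Toy sketch: shift the mean and rescale the std of a feature vector,
# i.e. a perturbation in statistics-space rather than pixel-space.

def perturb_feature_stats(feats, delta_mean, delta_scale):
    """Renormalize `feats` so its mean moves by `delta_mean` and its
    std is scaled by (1 + `delta_scale`)."""
    n = len(feats)
    mean = sum(feats) / n
    var = sum((f - mean) ** 2 for f in feats) / n
    std = var ** 0.5 or 1.0  # guard against constant inputs
    new_mean = mean + delta_mean
    new_std = std * (1.0 + delta_scale)
    return [(f - mean) / std * new_std + new_mean for f in feats]
```

Because style is largely carried by these channel statistics, training against worst-case `delta_mean`/`delta_scale` values is what encourages robustness to style shifts.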
arXiv Detail & Related papers (2020-09-18T17:52:34Z)
- Adversarial Semantic Data Augmentation for Human Pose Estimation [96.75411357541438]
We propose Semantic Data Augmentation (SDA), a method that augments images by pasting segmented body parts with various semantic granularity.
We also propose Adversarial Semantic Data Augmentation (ASDA), which exploits a generative network to dynamically predict tailored pasting configurations.
State-of-the-art results are achieved on challenging benchmarks.
arXiv Detail & Related papers (2020-08-03T07:56:04Z)
- Enhanced Residual Networks for Context-based Image Outpainting [0.0]
Deep models struggle to understand context and extrapolation through retained information.
Current models use generative adversarial networks to generate results which lack localized image feature consistency and appear fake.
We propose two methods to improve this issue: the use of a local and global discriminator, and the addition of residual blocks within the encoding section of the network.
arXiv Detail & Related papers (2020-05-14T05:14:26Z)
- Self-Supervised Linear Motion Deblurring [112.75317069916579]
Deep convolutional neural networks are state-of-the-art for image deblurring.
We present a differentiable reblur model for self-supervised motion deblurring.
Our experiments demonstrate that self-supervised single-image deblurring is feasible.
arXiv Detail & Related papers (2020-02-10T20:15:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.