Single Stage Virtual Try-on via Deformable Attention Flows
- URL: http://arxiv.org/abs/2207.09161v1
- Date: Tue, 19 Jul 2022 10:01:31 GMT
- Title: Single Stage Virtual Try-on via Deformable Attention Flows
- Authors: Shuai Bai, Huiling Zhou, Zhikang Li, Chang Zhou, Hongxia Yang
- Abstract summary: Virtual try-on aims to generate a photo-realistic fitting result given an in-shop garment and a reference person image.
We develop a novel Deformable Attention Flow (DAFlow) which applies the deformable attention scheme to multi-flow estimation.
Our proposed method achieves state-of-the-art performance both qualitatively and quantitatively.
- Score: 51.70606454288168
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Virtual try-on aims to generate a photo-realistic fitting result given an
in-shop garment and a reference person image. Existing methods usually build up
multi-stage frameworks to deal with clothes warping and body blending
respectively, or rely heavily on intermediate parser-based labels which may be
noisy or even inaccurate. To solve the above challenges, we propose a
single-stage try-on framework by developing a novel Deformable Attention Flow
(DAFlow), which applies the deformable attention scheme to multi-flow
estimation. Guided only by pose keypoints, self- and cross-deformable
attention flows are estimated for the reference person and the garment images,
respectively. By sampling multiple flow fields, feature-level and pixel-level
information from different semantic areas is simultaneously extracted and
merged through the attention mechanism. This enables clothes warping and body
synthesis at the same time, leading to photo-realistic results in an
end-to-end manner. Extensive experiments on two
try-on datasets demonstrate that our proposed method achieves state-of-the-art
performance both qualitatively and quantitatively. Furthermore, additional
experiments on the other two image editing tasks illustrate the versatility of
our method for multi-view synthesis and image animation.
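The multi-flow sampling-and-merging idea in the abstract can be illustrated with a minimal NumPy sketch: warp a feature map with several flow fields, then merge the warped samples with softmax attention weights. This is a hypothetical toy illustration, not the authors' DAFlow implementation; all function and variable names are assumptions.

```python
import numpy as np

def bilinear_sample(feat, flow):
    """Sample feature map feat (H, W, C) at positions displaced by flow (H, W, 2)."""
    H, W, _ = feat.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    # Target sampling coordinates (y-offset in channel 0, x-offset in channel 1),
    # clamped to the image border.
    y = np.clip(ys + flow[..., 0], 0, H - 1)
    x = np.clip(xs + flow[..., 1], 0, W - 1)
    y0, x0 = np.floor(y).astype(int), np.floor(x).astype(int)
    y1, x1 = np.minimum(y0 + 1, H - 1), np.minimum(x0 + 1, W - 1)
    wy, wx = (y - y0)[..., None], (x - x0)[..., None]
    return ((1 - wy) * (1 - wx) * feat[y0, x0]
            + (1 - wy) * wx * feat[y0, x1]
            + wy * (1 - wx) * feat[y1, x0]
            + wy * wx * feat[y1, x1])

def multi_flow_attention_merge(feat, flows, logits):
    """Warp feat with K flow fields and merge the K samples via softmax attention.

    feat:   (H, W, C) source features (e.g. the garment branch)
    flows:  (K, H, W, 2) per-head offset fields
    logits: (K, H, W) unnormalized attention scores per flow head
    """
    samples = np.stack([bilinear_sample(feat, f) for f in flows])  # (K, H, W, C)
    attn = np.exp(logits - logits.max(axis=0, keepdims=True))
    attn = attn / attn.sum(axis=0, keepdims=True)                  # softmax over K heads
    return (attn[..., None] * samples).sum(axis=0)                 # (H, W, C)

# Toy usage: two flow heads over a 4x4, 3-channel feature map.
rng = np.random.default_rng(0)
feat = rng.standard_normal((4, 4, 3))
flows = np.zeros((2, 4, 4, 2))          # zero offsets -> identity warp
logits = rng.standard_normal((2, 4, 4))
out = multi_flow_attention_merge(feat, flows, logits)
print(out.shape)  # (4, 4, 3)
```

With zero offsets every head samples the identity warp, so the attention-weighted merge returns the input features unchanged; in the actual method the flows and attention scores would be predicted by a network for each semantic area.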
Related papers
- Improving Diffusion Models for Authentic Virtual Try-on in the Wild [53.96244595495942]
This paper considers image-based virtual try-on, which renders an image of a person wearing a curated garment.
We propose a novel diffusion model that improves garment fidelity and generates authentic virtual try-on images.
We present a customization method using a pair of person-garment images, which significantly improves fidelity and authenticity.
arXiv Detail & Related papers (2024-03-08T08:12:18Z)
- Layered Rendering Diffusion Model for Zero-Shot Guided Image Synthesis [60.260724486834164]
This paper introduces innovative solutions to enhance spatial controllability in diffusion models reliant on text queries.
We present two key innovations: Vision Guidance and the Layered Rendering Diffusion framework.
We apply our method to three practical applications: bounding box-to-image, semantic mask-to-image and image editing.
arXiv Detail & Related papers (2023-11-30T10:36:19Z)
- Cross-Image Attention for Zero-Shot Appearance Transfer [68.43651329067393]
We introduce a cross-image attention mechanism that implicitly establishes semantic correspondences across images.
We harness three mechanisms that either manipulate the noisy latent codes or the model's internal representations throughout the denoising process.
Experiments show that our method is effective across a wide range of object categories and is robust to variations in shape, size, and viewpoint.
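The cross-image attention idea above can be sketched as ordinary attention in which queries come from one image's features and keys/values come from another's, so each target location gathers appearance from semantically similar source locations. A minimal NumPy sketch, not the paper's implementation; all names are hypothetical.

```python
import numpy as np

def cross_image_attention(q_feat, kv_feat):
    """Attend from one image's features to another's.

    q_feat:  (Nq, d) flattened features of the structure/target image
    kv_feat: (Nk, d) flattened features of the appearance/source image
    Returns (Nq, d): source appearance reassembled under the target's layout.
    """
    d = q_feat.shape[1]
    scores = q_feat @ kv_feat.T / np.sqrt(d)       # (Nq, Nk) similarities
    scores -= scores.max(axis=1, keepdims=True)    # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)        # softmax over source locations
    return attn @ kv_feat                          # weighted mix of source features

# Toy usage: 6 target locations attending over 8 source locations, d = 4.
rng = np.random.default_rng(1)
out = cross_image_attention(rng.standard_normal((6, 4)),
                            rng.standard_normal((8, 4)))
print(out.shape)  # (6, 4)
```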
arXiv Detail & Related papers (2023-11-06T18:33:24Z)
- Multi-scale Target-Aware Framework for Constrained Image Splicing Detection and Localization [11.803255600587308]
We propose a multi-scale target-aware framework to couple feature extraction and correlation matching in a unified pipeline.
Our approach can effectively promote the collaborative learning of related patches, and perform mutual promotion of feature learning and correlation matching.
Our experiments demonstrate that our model, which uses a unified pipeline, outperforms state-of-the-art methods on several benchmark datasets.
arXiv Detail & Related papers (2023-08-18T07:38:30Z)
- Bi-level Dynamic Learning for Jointly Multi-modality Image Fusion and Beyond [50.556961575275345]
We build an image fusion module to fuse complementary characteristics and cascade dual task-related modules.
We develop an efficient first-order approximation to compute corresponding gradients and present dynamic weighted aggregation to balance the gradients for fusion learning.
arXiv Detail & Related papers (2023-05-11T10:55:34Z)
- Learning to search for and detect objects in foveal images using deep learning [3.655021726150368]
This study employs a fixation prediction model that emulates human objective-guided attention of searching for a given class in an image.
The foveated pictures at each fixation point are then classified to determine whether the target is present or absent in the scene.
We present a novel dual task model capable of performing fixation prediction and detection simultaneously, allowing knowledge transfer between the two tasks.
arXiv Detail & Related papers (2023-04-12T09:50:25Z)
- ZFlow: Gated Appearance Flow-based Virtual Try-on with 3D Priors [13.977100716044104]
Image-based virtual try-on involves synthesizing convincing images of a model wearing a particular garment.
Recent methods involve a two-stage process: i) warping the garment to align with the model, and ii) blending the warped garment with the model image.
The lack of geometric information about the model or the garment often results in improper rendering of granular details.
We propose ZFlow, an end-to-end framework, which seeks to alleviate these concerns.
arXiv Detail & Related papers (2021-09-14T22:41:14Z)
- TSIT: A Simple and Versatile Framework for Image-to-Image Translation [103.92203013154403]
We introduce a simple and versatile framework for image-to-image translation.
We provide a carefully designed two-stream generative model with newly proposed feature transformations.
This allows multi-scale semantic structure information and style representation to be effectively captured and fused by the network.
A systematic study compares the proposed method with several state-of-the-art task-specific baselines, verifying its effectiveness in both perceptual quality and quantitative evaluations.
arXiv Detail & Related papers (2020-07-23T15:34:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.