Single Stage Virtual Try-on via Deformable Attention Flows
- URL: http://arxiv.org/abs/2207.09161v1
- Date: Tue, 19 Jul 2022 10:01:31 GMT
- Title: Single Stage Virtual Try-on via Deformable Attention Flows
- Authors: Shuai Bai, Huiling Zhou, Zhikang Li, Chang Zhou, Hongxia Yang
- Abstract summary: Virtual try-on aims to generate a photo-realistic fitting result given an in-shop garment and a reference person image.
We develop a novel Deformable Attention Flow (DAFlow) which applies the deformable attention scheme to multi-flow estimation.
Our proposed method achieves state-of-the-art performance both qualitatively and quantitatively.
- Score: 51.70606454288168
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Virtual try-on aims to generate a photo-realistic fitting result given an
in-shop garment and a reference person image. Existing methods usually build up
multi-stage frameworks to deal with clothes warping and body blending
respectively, or rely heavily on intermediate parser-based labels which may be
noisy or even inaccurate. To solve the above challenges, we propose a
single-stage try-on framework by developing a novel Deformable Attention Flow
(DAFlow), which applies the deformable attention scheme to multi-flow
estimation. Guided only by pose keypoints, self- and cross-deformable
attention flows are estimated for the reference person and the garment images,
respectively. By sampling multiple flow fields, feature-level and pixel-level
information from different semantic areas is simultaneously extracted and
merged through the attention mechanism. This enables clothes warping and body
synthesis at the same time, leading to photo-realistic results in an
end-to-end manner. Extensive experiments on two
try-on datasets demonstrate that our proposed method achieves state-of-the-art
performance both qualitatively and quantitatively. Furthermore, additional
experiments on the other two image editing tasks illustrate the versatility of
our method for multi-view synthesis and image animation.
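The multi-flow sampling-and-merging idea in the abstract can be illustrated with a minimal NumPy sketch: warp a feature map with several flow fields, then merge the warped samples with softmax attention weights. This is a hypothetical toy illustration, not the authors' DAFlow implementation; all function and variable names are assumptions.

```python
import numpy as np

def bilinear_sample(feat, flow):
    """Sample feature map feat (H, W, C) at positions displaced by flow (H, W, 2)."""
    H, W, _ = feat.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    # Target sampling coordinates (y-offset in channel 0, x-offset in channel 1),
    # clamped to the image border.
    y = np.clip(ys + flow[..., 0], 0, H - 1)
    x = np.clip(xs + flow[..., 1], 0, W - 1)
    y0, x0 = np.floor(y).astype(int), np.floor(x).astype(int)
    y1, x1 = np.minimum(y0 + 1, H - 1), np.minimum(x0 + 1, W - 1)
    wy, wx = (y - y0)[..., None], (x - x0)[..., None]
    return ((1 - wy) * (1 - wx) * feat[y0, x0]
            + (1 - wy) * wx * feat[y0, x1]
            + wy * (1 - wx) * feat[y1, x0]
            + wy * wx * feat[y1, x1])

def multi_flow_attention_merge(feat, flows, logits):
    """Warp feat with K flow fields and merge the K samples via softmax attention.

    feat:   (H, W, C) source features (e.g. the garment branch)
    flows:  (K, H, W, 2) per-head offset fields
    logits: (K, H, W) unnormalized attention scores per flow head
    """
    samples = np.stack([bilinear_sample(feat, f) for f in flows])  # (K, H, W, C)
    attn = np.exp(logits - logits.max(axis=0, keepdims=True))
    attn = attn / attn.sum(axis=0, keepdims=True)                  # softmax over K heads
    return (attn[..., None] * samples).sum(axis=0)                 # (H, W, C)

# Toy usage: two flow heads over a 4x4, 3-channel feature map.
rng = np.random.default_rng(0)
feat = rng.standard_normal((4, 4, 3))
flows = np.zeros((2, 4, 4, 2))          # zero offsets -> identity warp
logits = rng.standard_normal((2, 4, 4))
out = multi_flow_attention_merge(feat, flows, logits)
print(out.shape)  # (4, 4, 3)
```

With zero offsets every head samples the identity warp, so the attention-weighted merge returns the input features unchanged; in the actual method the flows and attention scores would be predicted by a network for each semantic area.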
Related papers
- Improving Diffusion Models for Authentic Virtual Try-on in the Wild [53.96244595495942]
This paper considers image-based virtual try-on, which renders an image of a person wearing a curated garment.
We propose a novel diffusion model that improves garment fidelity and generates authentic virtual try-on images.
We present a customization method using a pair of person-garment images, which significantly improves fidelity and authenticity.
arXiv Detail & Related papers (2024-03-08T08:12:18Z)
- Layered Rendering Diffusion Model for Zero-Shot Guided Image Synthesis [60.260724486834164]
This paper introduces innovative solutions to enhance spatial controllability in diffusion models reliant on text queries.
We present two key innovations: Vision Guidance and the Layered Rendering Diffusion framework.
We apply our method to three practical applications: bounding box-to-image, semantic mask-to-image and image editing.
arXiv Detail & Related papers (2023-11-30T10:36:19Z)
- Cross-Image Attention for Zero-Shot Appearance Transfer [68.43651329067393]
We introduce a cross-image attention mechanism that implicitly establishes semantic correspondences across images.
We harness three mechanisms that either manipulate the noisy latent codes or the model's internal representations throughout the denoising process.
Experiments show that our method is effective across a wide range of object categories and is robust to variations in shape, size, and viewpoint.
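The cross-image attention idea above can be sketched as ordinary attention in which queries come from one image's features and keys/values come from another's, so each target location gathers appearance from semantically similar source locations. A minimal NumPy sketch, not the paper's implementation; all names are hypothetical.

```python
import numpy as np

def cross_image_attention(q_feat, kv_feat):
    """Attend from one image's features to another's.

    q_feat:  (Nq, d) flattened features of the structure/target image
    kv_feat: (Nk, d) flattened features of the appearance/source image
    Returns (Nq, d): source appearance reassembled under the target's layout.
    """
    d = q_feat.shape[1]
    scores = q_feat @ kv_feat.T / np.sqrt(d)       # (Nq, Nk) similarities
    scores -= scores.max(axis=1, keepdims=True)    # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)        # softmax over source locations
    return attn @ kv_feat                          # weighted mix of source features

# Toy usage: 6 target locations attending over 8 source locations, d = 4.
rng = np.random.default_rng(1)
out = cross_image_attention(rng.standard_normal((6, 4)),
                            rng.standard_normal((8, 4)))
print(out.shape)  # (6, 4)
```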
arXiv Detail & Related papers (2023-11-06T18:33:24Z)
- Multi-scale Target-Aware Framework for Constrained Image Splicing Detection and Localization [11.803255600587308]
We propose a multi-scale target-aware framework to couple feature extraction and correlation matching in a unified pipeline.
Our approach can effectively promote the collaborative learning of related patches, and perform mutual promotion of feature learning and correlation matching.
Our experiments demonstrate that our model, which uses a unified pipeline, outperforms state-of-the-art methods on several benchmark datasets.
arXiv Detail & Related papers (2023-08-18T07:38:30Z)
- Bi-level Dynamic Learning for Jointly Multi-modality Image Fusion and Beyond [50.556961575275345]
We build an image fusion module to fuse complementary characteristics and cascade dual task-related modules.
We develop an efficient first-order approximation to compute corresponding gradients and present dynamic weighted aggregation to balance the gradients for fusion learning.
arXiv Detail & Related papers (2023-05-11T10:55:34Z)
- Learning to search for and detect objects in foveal images using deep learning [3.655021726150368]
This study employs a fixation prediction model that emulates human objective-guided attention of searching for a given class in an image.
The foveated pictures at each fixation point are then classified to determine whether the target is present or absent in the scene.
We present a novel dual task model capable of performing fixation prediction and detection simultaneously, allowing knowledge transfer between the two tasks.
arXiv Detail & Related papers (2023-04-12T09:50:25Z)
- ZFlow: Gated Appearance Flow-based Virtual Try-on with 3D Priors [13.977100716044104]
Image-based virtual try-on involves synthesizing convincing images of a model wearing a particular garment.
Recent methods involve a two-stage process: i) warping the garment to align with the model, and ii) blending the warped garment with the model image.
The lack of geometric information about the model or the garment often results in improper rendering of granular details.
We propose ZFlow, an end-to-end framework, which seeks to alleviate these concerns.
arXiv Detail & Related papers (2021-09-14T22:41:14Z)
- TSIT: A Simple and Versatile Framework for Image-to-Image Translation [103.92203013154403]
We introduce a simple and versatile framework for image-to-image translation.
We provide a carefully designed two-stream generative model with newly proposed feature transformations.
This allows multi-scale semantic structure information and style representation to be effectively captured and fused by the network.
A systematic study compares the proposed method with several state-of-the-art task-specific baselines, verifying its effectiveness in both perceptual quality and quantitative evaluations.
arXiv Detail & Related papers (2020-07-23T15:34:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.