FashionTex: Controllable Virtual Try-on with Text and Texture
- URL: http://arxiv.org/abs/2305.04451v1
- Date: Mon, 8 May 2023 04:10:36 GMT
- Title: FashionTex: Controllable Virtual Try-on with Text and Texture
- Authors: Anran Lin, Nanxuan Zhao, Shuliang Ning, Yuda Qiu, Baoyuan Wang,
Xiaoguang Han
- Abstract summary: We propose a multi-modal interactive setting by combining the advantages of both text and texture for multi-level fashion manipulation.
The FashionTex framework can semantically control cloth types and local texture patterns without annotated pairwise training data.
- Score: 29.7855591607239
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Virtual try-on attracts increasing research attention as a promising way to enhance the user experience of online clothing shopping. Though existing methods can generate impressive results, users need to provide a well-designed reference image containing the target fashion clothes, and such an image often does not exist. To support user-friendly fashion customization in full-body portraits, we propose a multi-modal interactive setting that combines the advantages of both text and texture for multi-level fashion manipulation. With the carefully designed fashion editing module and loss functions, the FashionTex framework can semantically control cloth types and local texture patterns without annotated pairwise training data. We further introduce an ID recovery module to maintain the identity of the input portrait. Extensive experiments demonstrate the effectiveness of the proposed pipeline.
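To make the setting concrete, here is a minimal, hypothetical sketch of how the two modalities might drive an edit: a text embedding (for cloth type) and a texture-patch embedding (for local pattern) are fused into an offset on a pretrained generator's latent code. All module names and dimensions are illustrative assumptions, not the paper's code.

```python
import torch
import torch.nn as nn

class MultiModalFashionEditor(nn.Module):
    """Hypothetical sketch: fuse a text embedding (cloth type) and a
    texture-patch embedding (local pattern) into an offset applied to a
    pretrained generator's latent code."""
    def __init__(self, latent_dim=512, text_dim=512, tex_dim=256):
        super().__init__()
        # Small CNN standing in for the texture-patch encoder.
        self.texture_encoder = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, tex_dim, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Mapper turning the fused condition into a latent offset.
        self.mapper = nn.Sequential(
            nn.Linear(latent_dim + text_dim + tex_dim, 512), nn.ReLU(),
            nn.Linear(512, latent_dim),
        )

    def forward(self, w, text_emb, texture_patch):
        tex_emb = self.texture_encoder(texture_patch)
        cond = torch.cat([w, text_emb, tex_emb], dim=-1)
        # Edited latent; a frozen generator would decode it to an image.
        return w + self.mapper(cond)

editor = MultiModalFashionEditor()
w = torch.randn(1, 512)            # inverted latent of the portrait
text_emb = torch.randn(1, 512)     # e.g. CLIP embedding of "long-sleeve shirt"
patch = torch.randn(1, 3, 64, 64)  # user-provided texture swatch
print(editor(w, text_emb, patch).shape)  # torch.Size([1, 512])
```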
Related papers
- AnyDesign: Versatile Area Fashion Editing via Mask-Free Diffusion [25.61572702219732]
Fashion image editing aims to modify a person's appearance based on a given instruction.
Current methods require auxiliary tools like segmenters and keypoint extractors, lacking a flexible and unified framework.
We propose AnyDesign, a diffusion-based method that enables mask-free editing on versatile areas.
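The summary gives no architectural detail; as rough intuition only, mask-free editing can be read as running a standard conditioned denoising loop with the condition applied globally rather than inside a user-supplied mask. A generic DDIM-style sketch under that assumption (the denoiser and all names are placeholders):

```python
import torch

@torch.no_grad()
def mask_free_edit(denoiser, z_t, cond, timesteps, alphas_cumprod):
    """Generic deterministic (DDIM-style) denoising loop. The conditioning
    is applied globally at every step, so no user-drawn mask restricts
    where the edit may happen."""
    z = z_t
    for i in range(len(timesteps) - 1):
        t, t_prev = timesteps[i], timesteps[i + 1]
        a_t, a_prev = alphas_cumprod[t], alphas_cumprod[t_prev]
        eps = denoiser(z, t, cond)                      # predicted noise
        z0 = (z - (1 - a_t).sqrt() * eps) / a_t.sqrt()  # predicted clean latent
        z = a_prev.sqrt() * z0 + (1 - a_prev).sqrt() * eps
    return z

# Toy run with a dummy denoiser; a real system would use a conditioned UNet.
denoiser = lambda z, t, cond: torch.zeros_like(z)
alphas_cumprod = torch.linspace(0.999, 0.01, 1000)
timesteps = torch.linspace(999, 0, 50).long()
out = mask_free_edit(denoiser, torch.randn(1, 4, 32, 32), None,
                     timesteps, alphas_cumprod)
print(out.shape)  # torch.Size([1, 4, 32, 32])
```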
arXiv Detail & Related papers (2024-08-21T12:04:32Z)
- IMAGDressing-v1: Customizable Virtual Dressing [58.44155202253754]
IMAGDressing-v1 tackles a virtual dressing task: generating freely editable human images with fixed garments and optional conditions.
It incorporates a garment UNet that captures semantic features from CLIP and texture features from a VAE.
We present a hybrid attention module, including a frozen self-attention and a trainable cross-attention, to integrate garment features from the garment UNet into a frozen denoising UNet.
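A minimal sketch of what such a hybrid block could look like, assuming the self-attention weights come frozen from the pretrained denoising UNet while a new cross-attention over garment tokens is trained (layer names are illustrative, not the released code):

```python
import torch
import torch.nn as nn

class HybridAttentionBlock(nn.Module):
    """Frozen self-attention (from the pretrained denoiser) plus a
    trainable cross-attention that attends to garment-UNet features."""
    def __init__(self, dim=320, heads=8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        for p in self.self_attn.parameters():   # keep pretrained weights fixed
            p.requires_grad = False
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x, garment_tokens):
        h = self.norm1(x)
        x = x + self.self_attn(h, h, h, need_weights=False)[0]
        h = self.norm2(x)
        x = x + self.cross_attn(h, garment_tokens, garment_tokens,
                                need_weights=False)[0]
        return x

block = HybridAttentionBlock()
x = torch.randn(1, 64, 320)   # denoiser feature tokens
g = torch.randn(1, 77, 320)   # garment features from the garment UNet
print(block(x, g).shape)      # torch.Size([1, 64, 320])
```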
arXiv Detail & Related papers (2024-07-17T16:26:30Z)
- MMTryon: Multi-Modal Multi-Reference Control for High-Quality Fashion Generation [70.83668869857665]
MMTryon is a multi-modal, multi-reference virtual try-on framework.
It can generate high-quality compositional try-on results by taking a text instruction and multiple garment images as inputs.
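One plausible way to expose a text instruction plus several garment references to a single generator is to concatenate all condition tokens into one sequence for cross-attention; a hedged sketch, not the paper's actual architecture:

```python
import torch
import torch.nn as nn

def build_condition_sequence(text_tokens, garment_token_list):
    """Concatenate text tokens with tokens from several garment encoders
    so one cross-attention layer can attend to every reference at once."""
    return torch.cat([text_tokens, *garment_token_list], dim=1)

text = torch.randn(1, 77, 768)                           # instruction tokens
garments = [torch.randn(1, 64, 768) for _ in range(3)]   # top, pants, shoes
cond = build_condition_sequence(text, garments)

attn = nn.MultiheadAttention(768, 8, batch_first=True)
x = torch.randn(1, 256, 768)                             # image feature tokens
out, _ = attn(x, cond, cond)
print(out.shape)  # torch.Size([1, 256, 768])
```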
arXiv Detail & Related papers (2024-05-01T11:04:22Z)
- Wear-Any-Way: Manipulable Virtual Try-on via Sparse Correspondence Alignment [8.335876030647118]
Wear-Any-Way is a customizable solution for virtual try-on.
We first construct a strong pipeline for standard virtual try-on, supporting single/multiple garment try-on and model-to-model settings.
We propose sparse correspondence alignment which involves point-based control to guide the generation for specific locations.
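Reading the summary, user-clicked point pairs steer where garment regions land; one illustrative way to inject such pairs is to add a shared learned embedding at the matched feature locations on both sides. Everything below is an assumption for intuition, not the paper's mechanism:

```python
import torch
import torch.nn as nn

class PointControl(nn.Module):
    """Hypothetical point-based control: add a shared learned embedding at
    matched (person, garment) feature locations so later attention layers
    can align the two points."""
    def __init__(self, dim=256, max_points=16):
        super().__init__()
        self.point_emb = nn.Embedding(max_points, dim)

    def forward(self, person_feat, garment_feat, pairs):
        # person_feat, garment_feat: (B, C, H, W)
        # pairs: list of ((py, px), (gy, gx)) coordinates in feature space
        person_feat = person_feat.clone()    # avoid mutating caller tensors
        garment_feat = garment_feat.clone()
        for i, ((py, px), (gy, gx)) in enumerate(pairs):
            e = self.point_emb.weight[i]
            person_feat[:, :, py, px] += e
            garment_feat[:, :, gy, gx] += e
        return person_feat, garment_feat

ctrl = PointControl()
p, g = ctrl(torch.randn(1, 256, 32, 32), torch.randn(1, 256, 32, 32),
            [((10, 12), (5, 6))])   # "this garment point should land here"
```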
arXiv Detail & Related papers (2024-03-19T17:59:52Z)
- PICTURE: PhotorealistIC virtual Try-on from UnconstRained dEsigns [25.209863457090506]
We propose a novel task, virtual try-on from unconstrained designs (ucVTON), to enable synthesis of personalized composite clothing on input human images.
Unlike prior art constrained by specific input types, our method allows flexible specification of style (text or image) and texture (full garment, cropped sections, or texture patches) conditions.
arXiv Detail & Related papers (2023-12-07T18:53:18Z)
- Single Stage Warped Cloth Learning and Semantic-Contextual Attention Feature Fusion for Virtual TryOn [5.790630195329777]
Image-based virtual try-on aims to fit an in-shop garment onto a clothed person image.
Garment warping, which aligns the target garment with the corresponding body parts in the person image, is a crucial step in achieving this goal.
We propose a novel single-stage framework that learns garment warping implicitly, without explicit multi-stage learning.
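Whatever the learning scheme, garment warping itself is commonly realized as predicting a dense flow field and resampling the garment with it; a minimal sketch using grid_sample (the flow would come from a learned predictor, omitted here):

```python
import torch
import torch.nn.functional as F

def warp_garment(garment, flow):
    """Resample the garment image with a predicted dense flow.
    garment: (B, 3, H, W); flow: (B, 2, H, W) offsets in [-1, 1] units."""
    B, _, H, W = garment.shape
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, H), torch.linspace(-1, 1, W), indexing="ij")
    base = torch.stack([xs, ys], dim=-1).unsqueeze(0).expand(B, H, W, 2)
    grid = base + flow.permute(0, 2, 3, 1)   # displaced sampling grid
    return F.grid_sample(garment, grid, align_corners=True)

garment = torch.rand(1, 3, 256, 192)
flow = torch.zeros(1, 2, 256, 192)           # zero flow = identity warp
warped = warp_garment(garment, flow)
assert torch.allclose(warped, garment, atol=1e-5)
```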
arXiv Detail & Related papers (2023-10-08T06:05:01Z)
- FaD-VLP: Fashion Vision-and-Language Pre-training towards Unified Retrieval and Captioning [66.38951790650887]
Multimodal tasks in the fashion domain have significant potential for e-commerce.
We propose a novel fashion-specific pre-training framework based on weakly-supervised triplets constructed from fashion image-text pairs.
We show that these triplet-based tasks are an effective addition to standard multimodal pre-training tasks.
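As one way to realize a triplet-based objective over weakly-supervised (anchor, positive, negative) embeddings, here is a standard margin loss in a shared image-text space; the paper's actual triplet construction is richer than this sketch:

```python
import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Pull anchor toward positive and push it from negative in a shared
    image-text embedding space, using cosine distance."""
    a = F.normalize(anchor, dim=-1)
    p = F.normalize(positive, dim=-1)
    n = F.normalize(negative, dim=-1)
    d_ap = 1 - (a * p).sum(-1)
    d_an = 1 - (a * n).sum(-1)
    return F.relu(d_ap - d_an + margin).mean()

# e.g. anchor = text embedding, positive = paired image, negative = mismatch
loss = triplet_loss(torch.randn(8, 512), torch.randn(8, 512),
                    torch.randn(8, 512))
print(float(loss))
```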
arXiv Detail & Related papers (2022-10-26T21:01:19Z)
- TediGAN: Text-Guided Diverse Face Image Generation and Manipulation [52.83401421019309]
TediGAN is a framework for multi-modal image generation and manipulation with textual descriptions.
A StyleGAN inversion module maps real images to the latent space of a well-trained StyleGAN.
Visual-linguistic similarity learning handles text-image matching by mapping images and text into a common embedding space (sketched below).
Instance-level optimization preserves identity during manipulation.
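The visual-linguistic similarity component amounts to projecting both modalities into one space and scoring matches there; a minimal sketch with illustrative projection heads:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VisualLinguisticSimilarity(nn.Module):
    """Project image and text features into a common space and score
    text-image matching with cosine similarity."""
    def __init__(self, img_dim=2048, txt_dim=768, embed_dim=512):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, embed_dim)
        self.txt_proj = nn.Linear(txt_dim, embed_dim)

    def forward(self, img_feat, txt_feat):
        v = F.normalize(self.img_proj(img_feat), dim=-1)
        t = F.normalize(self.txt_proj(txt_feat), dim=-1)
        return (v * t).sum(-1)   # higher score = better match

sim = VisualLinguisticSimilarity()
score = sim(torch.randn(4, 2048), torch.randn(4, 768))
print(score.shape)  # torch.Size([4])
```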
arXiv Detail & Related papers (2020-12-06T16:20:19Z)
- Region-adaptive Texture Enhancement for Detailed Person Image Synthesis [86.69934638569815]
RATE-Net is a novel framework for synthesizing person images with sharp texture details.
The proposed framework leverages an additional texture enhancing module to extract appearance information from the source image.
Experiments on the DeepFashion benchmark dataset demonstrate the superiority of our framework over existing networks.
arXiv Detail & Related papers (2020-05-26T02:33:21Z)
- Personalized Fashion Recommendation from Personal Social Media Data: An Item-to-Set Metric Learning Approach [71.63618051547144]
We study the problem of personalized fashion recommendation from social media data.
We present an item-to-set metric learning framework that learns to compute the similarity between a set of a user's historical fashion items and a new fashion item (sketched below).
To validate the effectiveness of our approach, we collect a real-world social media dataset.
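An item-to-set similarity can be approximated by softly weighting per-item similarities so the closest historical items dominate; a hedged sketch (the paper's metric adds importance weighting and neighborhood selection beyond this):

```python
import torch
import torch.nn.functional as F

def item_to_set_similarity(item, item_set, temperature=0.1):
    """Soft-nearest-neighbor similarity between one embedding and a set:
    set items closest to the query dominate the aggregated score."""
    item = F.normalize(item, dim=-1)            # (D,)
    item_set = F.normalize(item_set, dim=-1)    # (N, D)
    sims = item_set @ item                      # per-item cosine similarity
    weights = torch.softmax(sims / temperature, dim=0)
    return (weights * sims).sum()

history = torch.randn(20, 128)   # a user's past fashion items
candidate = torch.randn(128)     # new item to score
print(float(item_to_set_similarity(candidate, history)))
```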
arXiv Detail & Related papers (2020-05-25T23:24:24Z)
- FashionBERT: Text and Image Matching with Adaptive Loss for Cross-modal Retrieval [31.822218310945036]
FashionBERT learns high-level representations of text and images.
It achieves significant performance improvements over baseline and state-of-the-art approaches.
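FashionBERT's key input choice is feeding image patches, rather than detected object regions, alongside text tokens into one BERT-style encoder; a minimal illustrative sketch with a matching head (dimensions and names are assumptions, not the paper's code):

```python
import torch
import torch.nn as nn

class PatchTextMatcher(nn.Module):
    """Embed image patches and text tokens into one sequence, encode with
    a transformer, and score text-image matching from the first token."""
    def __init__(self, vocab=30522, dim=256, patch=32):
        super().__init__()
        self.tok = nn.Embedding(vocab, dim)
        self.patch_proj = nn.Linear(3 * patch * patch, dim)
        layer = nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.match = nn.Linear(dim, 1)
        self.patch = patch

    def forward(self, token_ids, image):
        B, _, H, W = image.shape
        p = self.patch
        # Split the image into non-overlapping p x p patches.
        patches = image.unfold(2, p, p).unfold(3, p, p)
        patches = patches.permute(0, 2, 3, 1, 4, 5).reshape(B, -1, 3 * p * p)
        seq = torch.cat([self.tok(token_ids), self.patch_proj(patches)], dim=1)
        h = self.encoder(seq)
        return self.match(h[:, 0])   # matching logit

m = PatchTextMatcher()
logit = m(torch.randint(0, 30522, (2, 16)), torch.rand(2, 3, 64, 64))
print(logit.shape)  # torch.Size([2, 1])
```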
arXiv Detail & Related papers (2020-05-20T00:41:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.