AnyDesign: Versatile Area Fashion Editing via Mask-Free Diffusion
- URL: http://arxiv.org/abs/2408.11553v4
- Date: Thu, 17 Oct 2024 08:58:47 GMT
- Title: AnyDesign: Versatile Area Fashion Editing via Mask-Free Diffusion
- Authors: Yunfang Niu, Lingxiang Wu, Dong Yi, Jie Peng, Ning Jiang, Haiying Wu, Jinqiao Wang
- Abstract summary: Fashion image editing aims to modify a person's appearance based on a given instruction.
Current methods require auxiliary tools like segmenters and keypoint extractors, lacking a flexible and unified framework.
We propose AnyDesign, a diffusion-based method that enables mask-free editing on versatile areas.
- Score: 25.61572702219732
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Fashion image editing aims to modify a person's appearance based on a given instruction. Existing methods require auxiliary tools like segmenters and keypoint extractors, lacking a flexible and unified framework. Moreover, these methods are limited in the variety of clothing types they can handle, as most datasets focus on people in clean backgrounds and only include generic garments such as tops, pants, and dresses. These limitations restrict their applicability in real-world scenarios. In this paper, we first extend an existing dataset for human generation to include a wider range of apparel and more complex backgrounds. This extended dataset features people wearing diverse items such as tops, pants, dresses, skirts, headwear, scarves, shoes, socks, and bags. Additionally, we propose AnyDesign, a diffusion-based method that enables mask-free editing on versatile areas. Users can simply input a human image along with a corresponding prompt in either text or image format. Our approach incorporates Fashion DiT, equipped with a Fashion-Guidance Attention (FGA) module designed to fuse explicit apparel types and CLIP-encoded apparel features. Both qualitative and quantitative experiments demonstrate that our method delivers high-quality fashion editing and outperforms contemporary text-guided fashion editing methods.
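As a concrete illustration, below is a minimal PyTorch sketch of how a Fashion-Guidance Attention block could fuse an explicit apparel-type embedding with CLIP-encoded apparel features inside a DiT-style transformer. All module names, shapes, and the category count are assumptions for illustration only, not the paper's actual implementation.

```python
# Hypothetical sketch of a Fashion-Guidance Attention (FGA) block: image/latent
# tokens cross-attend to a context built from an apparel-type embedding plus
# CLIP-encoded apparel features. Names and dimensions are assumed, not from the paper.
import torch
import torch.nn as nn

class FashionGuidanceAttention(nn.Module):
    def __init__(self, dim: int, clip_dim: int = 768, num_types: int = 9, heads: int = 8):
        super().__init__()
        # One learned embedding per apparel category (tops, pants, dresses, ...).
        self.type_embed = nn.Embedding(num_types, dim)
        # Project CLIP apparel features into the transformer token dimension.
        self.clip_proj = nn.Linear(clip_dim, dim)
        # Image tokens attend to the fused apparel context (cross-attention).
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x, clip_feats, apparel_type):
        # x:            (B, N, dim)      image/latent tokens from the DiT backbone
        # clip_feats:   (B, M, clip_dim) CLIP-encoded apparel prompt features
        # apparel_type: (B,)             integer id of the garment category to edit
        ctx = self.clip_proj(clip_feats)                       # (B, M, dim)
        type_tok = self.type_embed(apparel_type).unsqueeze(1)  # (B, 1, dim)
        ctx = torch.cat([type_tok, ctx], dim=1)                # fuse type + features
        out, _ = self.attn(query=self.norm(x), key=ctx, value=ctx)
        return x + out  # residual connection back into the backbone
```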
Related papers
- MMTryon: Multi-Modal Multi-Reference Control for High-Quality Fashion Generation [70.83668869857665]
MMTryon is a multi-modal, multi-reference virtual try-on framework.
It can generate high-quality compositional try-on results by taking a text instruction and multiple garment images as inputs.
arXiv Detail & Related papers (2024-05-01T11:04:22Z)
- Lost Your Style? Navigating with Semantic-Level Approach for Text-to-Outfit Retrieval [2.07180164747172]
We introduce a new approach to fashion recommendation: a text-to-outfit retrieval task that generates a complete outfit set based solely on textual descriptions.
Our model is devised at three semantic levels (item, style, and outfit), where each level progressively aggregates data to form a coherent outfit recommendation.
On the Maryland Polyvore and Polyvore Outfit datasets, our approach significantly outperformed state-of-the-art models on the text-to-outfit retrieval task.
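A rough sketch, under assumed dimensions and pooling choices, of the progressive item-to-style-to-outfit aggregation described above; this is not the paper's actual design.

```python
# Hypothetical three-level aggregation: per-item features are encoded, pooled
# into a style representation, then refined into a single outfit embedding
# that can be matched against a text embedding (e.g., by cosine similarity).
import torch
import torch.nn as nn

class OutfitAggregator(nn.Module):
    def __init__(self, dim: int = 512):
        super().__init__()
        self.item_enc = nn.Linear(dim, dim)    # item level: per-garment features
        self.style_enc = nn.Linear(dim, dim)   # style level: pooled garment group
        self.outfit_enc = nn.Linear(dim, dim)  # outfit level: full-set embedding

    def forward(self, item_feats: torch.Tensor) -> torch.Tensor:
        # item_feats: (num_items, dim) visual features of the candidate garments
        items = self.item_enc(item_feats).relu()
        style = self.style_enc(items.mean(dim=0, keepdim=True)).relu()
        return self.outfit_enc(style).squeeze(0)  # (dim,) outfit embedding
```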
arXiv Detail & Related papers (2023-11-03T07:23:21Z)
- Fashion Matrix: Editing Photos by Just Talking [66.83502497764698]
We develop a hierarchical AI system called Fashion Matrix dedicated to editing photos by just talking.
Fashion Matrix employs Large Language Models (LLMs) as its foundational support and engages in iterative interactions with users.
Visual Foundation Models are leveraged to generate edited images from text prompts and masks, thereby facilitating the automation of fashion editing processes.
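The summary suggests an LLM-orchestrated edit loop; a high-level sketch follows, where parse_request, segment_region, and inpaint_with_prompt are hypothetical placeholders rather than Fashion Matrix's actual API.

```python
# Hypothetical talk-to-edit loop: an LLM turns free-form speech into a
# structured edit plan, a segmentation model localizes the region, and a
# text-to-image inpainting model regenerates only the masked area.
def edit_by_talking(image, user_utterance, llm, segmenter, inpainter):
    # 1. Parse speech into a plan, e.g. {"region": "dress", "prompt": "a red floral dress"}.
    plan = llm.parse_request(user_utterance)
    # 2. A visual foundation model produces a mask for the region to edit.
    mask = segmenter.segment_region(image, plan["region"])
    # 3. The inpainting model applies the edit inside the mask.
    edited = inpainter.inpaint_with_prompt(image, mask, plan["prompt"])
    return edited  # shown to the user, who can iterate with a new utterance
```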
arXiv Detail & Related papers (2023-07-25T04:06:25Z)
- FashionTex: Controllable Virtual Try-on with Text and Texture [29.7855591607239]
We propose a multi-modal interactive setting by combining the advantages of both text and texture for multi-level fashion manipulation.
The FashionTex framework can semantically control clothing types and local texture patterns without annotated pairwise training data.
arXiv Detail & Related papers (2023-05-08T04:10:36Z)
- Multimodal Garment Designer: Human-Centric Latent Diffusion Models for Fashion Image Editing [40.70752781891058]
We propose the task of multimodal-conditioned fashion image editing, guiding the generation of human-centric fashion images.
We tackle this problem by proposing a new architecture based on latent diffusion models.
Given the lack of existing datasets suitable for the task, we also extend two existing fashion datasets.
arXiv Detail & Related papers (2023-04-04T18:03:04Z)
- FICE: Text-Conditioned Fashion Image Editing With Guided GAN Inversion [16.583537785874604]
We propose a novel text-conditioned editing model, called FICE, capable of handling a wide variety of diverse text descriptions.
FICE generates highly realistic fashion images and leads to stronger editing performance than existing competing approaches.
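For intuition, here is a generic text-guided GAN-inversion loop of the broad kind the summary points to; the latent size, loss weighting, and the generator and clip_score callables are all assumptions, and FICE's actual objective is more involved.

```python
# Generic text-guided GAN inversion: optimize a latent code so the generated
# image stays close to the input while a CLIP-style score pulls it toward the
# text description. `generator` and `clip_score` are hypothetical callables.
import torch
import torch.nn.functional as F

def invert_and_edit(image, text_feat, generator, clip_score, steps=200, lr=0.05):
    latent = torch.randn(1, 512, requires_grad=True)  # assumed latent size
    opt = torch.optim.Adam([latent], lr=lr)
    for _ in range(steps):
        fake = generator(latent)
        rec_loss = F.mse_loss(fake, image)        # stay close to the input photo
        txt_loss = -clip_score(fake, text_feat)   # match the text description
        loss = rec_loss + 0.5 * txt_loss          # assumed weighting
        opt.zero_grad()
        loss.backward()
        opt.step()
    return generator(latent).detach()
```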
arXiv Detail & Related papers (2023-01-05T15:33:23Z)
- FaD-VLP: Fashion Vision-and-Language Pre-training towards Unified Retrieval and Captioning [66.38951790650887]
Multimodal tasks in the fashion domain have significant potential for e-commerce.
We propose a novel fashion-specific pre-training framework based on weakly-supervised triplets constructed from fashion image-text pairs.
We show the triplet-based tasks are an effective addition to standard multimodal pre-training tasks.
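A minimal sketch of constructing weakly-supervised triplets from fashion image-text pairs; the pairing heuristic here (shared product category) is an assumed stand-in for the paper's construction.

```python
# Hypothetical weak-supervision triplet builder: two items from the same
# category form a weak (reference image, modifying text, target image) triplet,
# using only existing image-text pairs and no manual triplet annotation.
from itertools import combinations

def build_triplets(pairs):
    # pairs: list of dicts like {"image": ..., "text": ..., "category": ...}
    triplets = []
    for a, b in combinations(pairs, 2):
        if a["category"] == b["category"]:
            triplets.append((a["image"], b["text"], b["image"]))
    return triplets
```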
arXiv Detail & Related papers (2022-10-26T21:01:19Z)
- Arbitrary Virtual Try-On Network: Characteristics Preservation and Trade-off between Body and Clothing [85.74977256940855]
We propose an Arbitrary Virtual Try-On Network (AVTON) for all types of clothes.
AVTON can synthesize realistic try-on images by preserving and trading off characteristics of the target clothes and the reference person.
Our approach achieves better performance than state-of-the-art virtual try-on methods.
arXiv Detail & Related papers (2021-11-24T08:59:56Z)
- SMPLicit: Topology-aware Generative Model for Clothed People [65.84665248796615]
We introduce SMPLicit, a novel generative model to jointly represent body pose, shape and clothing geometry.
In the experimental section, we demonstrate that SMPLicit can be readily used for fitting 3D scans and for 3D reconstruction from images of dressed people.
arXiv Detail & Related papers (2021-03-11T18:57:03Z)
- Learning Diverse Fashion Collocation by Neural Graph Filtering [78.9188246136867]
We propose a novel fashion collocation framework, Neural Graph Filtering, that models a flexible set of fashion items via a graph neural network.
By applying symmetric operations on the edge vectors, this framework allows varying numbers of inputs/outputs and is invariant to their ordering.
We evaluate the proposed approach on three popular benchmarks, the Polyvore dataset, the Polyvore-D dataset, and our reorganized Amazon Fashion dataset.
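To make the invariance claim concrete, here is a small sketch in which a shared MLP filters pairwise edge vectors and a symmetric sum-pool makes the output independent of item order and set size; the shapes and the edge definition (feature difference) are assumptions, not the paper's exact formulation.

```python
# Hypothetical symmetric edge filtering: build all pairwise edge vectors from a
# variable-sized item set, apply a shared MLP to each edge, and pool with a
# symmetric operation (sum), so permuting or resizing the input set only
# changes the result through content, never through ordering.
import torch
import torch.nn as nn

class SymmetricEdgeFilter(nn.Module):
    def __init__(self, dim: int = 128):
        super().__init__()
        self.edge_mlp = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, items: torch.Tensor) -> torch.Tensor:
        # items: (n, dim) features of a variable-sized set of fashion items
        edges = items.unsqueeze(0) - items.unsqueeze(1)  # (n, n, dim) pairwise edges
        filtered = self.edge_mlp(edges)                  # shared weights per edge
        return filtered.sum(dim=(0, 1))                  # symmetric pooling -> (dim,)
```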
arXiv Detail & Related papers (2020-03-11T16:17:08Z)