TailorGAN: Making User-Defined Fashion Designs
- URL: http://arxiv.org/abs/2001.06427v2
- Date: Mon, 20 Jan 2020 03:33:33 GMT
- Title: TailorGAN: Making User-Defined Fashion Designs
- Authors: Lele Chen, Justin Tian, Guo Li, Cheng-Haw Wu, Erh-Kan King, Kuan-Ting Chen, Shao-Hang Hsieh, Chenliang Xu
- Abstract summary: We propose a novel self-supervised model to synthesize garment images with disentangled attributes without paired data.
Our method consists of a reconstruction learning step and an adversarial learning step.
Experiments on our new GarmentSet dataset and real-world samples demonstrate that our method synthesizes much better results than state-of-the-art methods.
- Score: 28.805686791183618
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Attribute editing has become an important and emerging topic of computer
vision. In this paper, we consider the following task: given a reference garment image A
and another image B with a target attribute (collar/sleeve), generate a
photo-realistic image which combines the texture from reference A and the new
attribute from reference B. The highly convoluted attributes and the lack of
paired data are the main challenges to the task. To overcome those limitations,
we propose a novel self-supervised model to synthesize garment images with
disentangled attributes (e.g., collar and sleeves) without paired data. Our
method consists of a reconstruction learning step and an adversarial learning
step. The model learns texture and location information through reconstruction
learning; adversarial learning then generalizes this capability to
single-attribute manipulation. Meanwhile, we compose a
new dataset, named GarmentSet, with annotation of landmarks of collars and
sleeves on clean garment images. Extensive experiments on this dataset and
real-world samples demonstrate that our method can synthesize much better
results than the state-of-the-art methods in both quantitative and qualitative
comparisons.
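To make the two-step training concrete, below is a minimal PyTorch-style sketch of the idea described in the abstract: a reconstruction step in which both encoders see the same image (so no paired data is needed), followed by an adversarial step that combines the texture from image A with the attribute from image B. All module architectures, names, and losses here are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn

# Illustrative stand-ins for the paper's networks; the real architectures differ.
texture_enc = nn.Sequential(nn.Conv2d(3, 64, 4, 2, 1), nn.ReLU())  # texture branch
attr_enc    = nn.Sequential(nn.Conv2d(3, 64, 4, 2, 1), nn.ReLU())  # attribute (collar/sleeve) branch
decoder     = nn.Sequential(nn.ConvTranspose2d(128, 3, 4, 2, 1), nn.Tanh())
disc        = nn.Sequential(nn.Conv2d(3, 1, 4, 2, 1))              # patch-style discriminator

l1, bce = nn.L1Loss(), nn.BCEWithLogitsLoss()

def reconstruction_step(img):
    """Step 1: self-supervised reconstruction. Both branches encode the same
    image, so the decoder must learn texture and attribute location without
    any paired training data."""
    feats = torch.cat([texture_enc(img), attr_enc(img)], dim=1)
    return l1(decoder(feats), img)

def adversarial_step(img_a, img_b):
    """Step 2: swap in the attribute from a different image. No ground truth
    exists for the composite, so a discriminator supplies the realism signal."""
    feats = torch.cat([texture_enc(img_a), attr_enc(img_b)], dim=1)
    fake = decoder(feats)
    logits = disc(fake)
    return fake, bce(logits, torch.ones_like(logits))  # generator loss

a = torch.randn(1, 3, 256, 256)  # reference garment A (texture source)
b = torch.randn(1, 3, 256, 256)  # garment B with the target collar/sleeve
loss = reconstruction_step(a) + adversarial_step(a, b)[1]
loss.backward()
```

In this reading, the reconstruction step anchors texture and landmark location, while the adversarial step provides the only learning signal for attribute transfer, since no paired before/after images exist.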
Related papers
- Learning Action and Reasoning-Centric Image Editing from Videos and Simulations [45.637947364341436]
The AURORA dataset is a collection of high-quality training data, human-annotated and curated from videos and simulation engines.
We evaluate an AURORA-finetuned model on a new expert-curated benchmark covering 8 diverse editing tasks.
Our model significantly outperforms previous editing models as judged by human raters.
arXiv Detail & Related papers (2024-07-03T19:36:33Z)
- Evaluating Data Attribution for Text-to-Image Models [62.844382063780365]
We evaluate attribution through "customization" methods, which tune an existing large-scale model toward a given exemplar object or style.
Our key insight is that this allows us to efficiently create synthetic images that are computationally influenced by the exemplar by construction.
By taking into account the inherent uncertainty of the problem, we can assign soft attribution scores over a set of training images.
arXiv Detail & Related papers (2023-06-15T17:59:51Z)
- Learning Transferable Pedestrian Representation from Multimodal Information Supervision [174.5150760804929]
VAL-PAT is a novel framework that learns transferable representations to enhance various pedestrian analysis tasks with multimodal information.
We first perform pre-training on LUPerson-TA dataset, where each image contains text and attribute annotations.
We then transfer the learned representations to various downstream tasks, including person reID, person attribute recognition and text-based person search.
arXiv Detail & Related papers (2023-04-12T01:20:58Z)
- TexPose: Neural Texture Learning for Self-Supervised 6D Object Pose Estimation [55.94900327396771]
We introduce neural texture learning for 6D object pose estimation from synthetic data.
We learn to predict realistic texture of objects from real image collections.
We learn pose estimation from pixel-perfect synthetic data.
arXiv Detail & Related papers (2022-12-25T13:36:32Z)
- ClipCrop: Conditioned Cropping Driven by Vision-Language Model [90.95403416150724]
We take advantage of vision-language models as a foundation for creating robust and user-intentional cropping algorithms.
We develop a method to perform cropping with a text or image query that reflects the user's intention as guidance.
Our pipeline design allows the model to learn text-conditioned aesthetic cropping with a small dataset.
arXiv Detail & Related papers (2022-11-21T14:27:07Z)
- Self-Distilled StyleGAN: Towards Generation from Internet Photos [47.28014076401117]
We show how StyleGAN can be adapted to work on raw uncurated images collected from the Internet.
We propose a StyleGAN-based self-distillation approach, which consists of two main components.
The presented technique enables the generation of high-quality images, while minimizing the loss in diversity of the data.
arXiv Detail & Related papers (2022-02-24T17:16:47Z)
- InvGAN: Invertible GANs [88.58338626299837]
InvGAN, short for Invertible GAN, successfully embeds real images to the latent space of a high quality generative model.
This allows us to perform image inpainting, merging, and online data augmentation.
arXiv Detail & Related papers (2021-12-08T21:39:00Z)
- Learning Intrinsic Images for Clothing [10.21096394185778]
In this paper, we focus on intrinsic image decomposition for clothing images.
A more interpretable edge-aware metric and an annotation scheme are designed for the testing set.
We show that our proposed model significantly reduces texture-copying artifacts while retaining surprisingly tiny details.
arXiv Detail & Related papers (2021-11-16T14:43:12Z)
- Learning Co-segmentation by Segment Swapping for Retrieval and Discovery [67.6609943904996]
The goal of this work is to efficiently identify visually similar patterns from a pair of images.
We generate synthetic training pairs by selecting object segments in an image and copy-pasting them into another image.
We show our approach provides clear improvements for artwork details retrieval on the Brueghel dataset.
arXiv Detail & Related papers (2021-10-29T16:51:16Z)
- RTIC: Residual Learning for Text and Image Composition using Graph Convolutional Network [19.017377597937617]
We study the compositional learning of images and texts for image retrieval.
We introduce a novel method that combines the graph convolutional network (GCN) with existing composition methods.
arXiv Detail & Related papers (2021-04-07T09:41:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.