VICTOR: Visual Incompatibility Detection with Transformers and
Fashion-specific contrastive pre-training
- URL: http://arxiv.org/abs/2207.13458v1
- Date: Wed, 27 Jul 2022 11:18:55 GMT
- Authors: Stefanos-Iordanis Papadopoulos, Christos Koutlis, Symeon Papadopoulos,
Ioannis Kompatsiaris
- Abstract summary: Visual InCompatibility TransfORmer (VICTOR) is optimized for two tasks: 1) overall compatibility as regression and 2) the detection of mismatching items.
We build upon the Polyvore outfit benchmark to generate partially mismatching outfits, creating a new dataset termed Polyvore-MISFITs.
A series of ablation and comparative analyses shows that the proposed architecture can compete with and even surpass the current state of the art on Polyvore datasets.
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: For fashion outfits to be considered aesthetically pleasing, the
garments that constitute them need to be compatible in terms of visual
aspects such as style, category and color. With the advent and omnipresence
of computer vision deep learning models, increased interest has also emerged
in the task of visual compatibility detection, with the aim of developing
quality fashion outfit recommendation systems. Previous works have defined
visual compatibility as a binary classification task, with the items in an
outfit considered either fully compatible or fully incompatible. However,
this is not applicable to Outfit Maker applications, where users create
their own outfits and need to know which specific items may be incompatible
with the rest of the outfit. To address this, we propose the Visual
InCompatibility TransfORmer (VICTOR), which is optimized for two tasks: 1)
overall compatibility as regression and 2) the detection of mismatching
items. Unlike previous works that rely either on feature extraction from
ImageNet-pretrained models or on end-to-end fine-tuning, we utilize
fashion-specific contrastive language-image pre-training to fine-tune
computer vision neural networks on fashion imagery. Moreover, we build upon
the Polyvore outfit benchmark to generate partially mismatching outfits,
creating a new dataset termed Polyvore-MISFITs, which is used to train
VICTOR. A series of ablation and comparative analyses shows that the
proposed architecture can compete with and even surpass the current state of
the art on Polyvore datasets while reducing instance-wise floating-point
operations (FLOPs) by 88%, striking a balance between high performance and
efficiency.
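
To make the two-task formulation concrete, here is a minimal PyTorch sketch of a VICTOR-style model: a transformer encoder over pre-extracted item embeddings with a regression head for overall compatibility and a per-item head for mismatch detection, plus a toy MISFIT-style outfit corruption. All names, sizes and the corruption rule are illustrative assumptions, not the authors' released code.

    # Hypothetical VICTOR-style multi-task sketch (not the authors' code).
    # Assumes each outfit arrives as a sequence of pre-extracted item embeddings.
    import torch
    import torch.nn as nn

    class VictorSketch(nn.Module):
        def __init__(self, dim=512, heads=8, layers=4):
            super().__init__()
            enc = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
            self.encoder = nn.TransformerEncoder(enc, num_layers=layers)
            self.cls = nn.Parameter(torch.zeros(1, 1, dim))   # outfit-level token
            self.compat_head = nn.Linear(dim, 1)              # task 1: regression
            self.misfit_head = nn.Linear(dim, 1)              # task 2: per-item logits

        def forward(self, items):                             # items: (batch, n, dim)
            cls = self.cls.expand(items.size(0), -1, -1)
            h = self.encoder(torch.cat([cls, items], dim=1))
            compat = self.compat_head(h[:, 0]).squeeze(-1)    # one score per outfit
            misfit = self.misfit_head(h[:, 1:]).squeeze(-1)   # one logit per item
            return compat, misfit

    def make_misfit(outfit, pool, k=1):
        # Toy MISFIT-style corruption: swap k items for random pool items.
        out, idx = outfit.clone(), torch.randperm(outfit.size(0))[:k]
        out[idx] = pool[torch.randint(len(pool), (k,))]
        return out, idx

    model = VictorSketch()
    compat, misfit = model(torch.randn(2, 5, 512))            # 2 outfits of 5 items
    loss = nn.functional.mse_loss(compat, torch.rand(2)) + \
           nn.functional.binary_cross_entropy_with_logits(misfit, torch.zeros(2, 5))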
Related papers
- IMAGDressing-v1: Customizable Virtual Dressing (arXiv, 2024-07-17)
IMAGDressing-v1 addresses the virtual dressing task of generating freely editable human images with fixed garments and optional conditions.
IMAGDressing-v1 incorporates a garment UNet that captures semantic features from CLIP and texture features from a VAE.
We present a hybrid attention module, including a frozen self-attention and a trainable cross-attention, to integrate garment features from the garment UNet into a frozen denoising UNet.
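A minimal sketch of the hybrid attention idea described above, with a frozen self-attention branch and a trainable cross-attention branch injecting garment features; the dimensions, naming and additive fusion are assumptions rather than the released implementation.

    # Hybrid attention sketch: frozen self-attention + trainable cross-attention.
    import torch
    import torch.nn as nn

    class HybridAttention(nn.Module):
        def __init__(self, dim=320, heads=8):
            super().__init__()
            self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
            for p in self.self_attn.parameters():
                p.requires_grad = False                      # frozen branch
            self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

        def forward(self, x, garment):
            # x: denoising-UNet tokens; garment: garment-UNet feature tokens.
            h, _ = self.self_attn(x, x, x)
            g, _ = self.cross_attn(x, garment, garment)      # trainable injection
            return x + h + g                                 # assumed additive fusion

    block = HybridAttention()
    out = block(torch.randn(1, 64, 320), torch.randn(1, 77, 320))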
- Fashion Recommendation: Outfit Compatibility using GNN (arXiv, 2024-04-28)
We follow two existing approaches that employ graphs to represent outfits.
Both the Node-wise Graph Neural Network (NGNN) and the Hypergraph Neural Network score a set of items according to their compatibility as an outfit.
We recreate the analysis on a subset of the data and compare the two models on two tasks: Fill in the Blank (FITB), finding an item that completes an outfit, and Compatibility Prediction, estimating the compatibility of items grouped as an outfit.
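A hypothetical FITB evaluation loop that works with any outfit scorer (e.g. an NGNN- or HGNN-style model); the data layout and helper names are illustrative, not the paper's code.

    # FITB: score each candidate appended to the partial outfit, pick the best.
    import torch

    def fitb_accuracy(score_fn, questions):
        # questions: list of (partial_outfit (n, d), candidates [(d,)], answer index)
        correct = 0
        for outfit, candidates, answer in questions:
            scores = [score_fn(torch.cat([outfit, c.unsqueeze(0)])) for c in candidates]
            correct += int(max(range(len(scores)), key=scores.__getitem__) == answer)
        return correct / len(questions)

    # Toy check with a random scorer; a real score_fn would be a trained model.
    score_fn = lambda items: items.sum().item()
    qs = [(torch.randn(3, 8), [torch.randn(8) for _ in range(4)], 0)]
    print(fitb_accuracy(score_fn, qs))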
- MV-VTON: Multi-View Virtual Try-On with Diffusion Models (arXiv, 2024-04-26)
The goal of image-based virtual try-on is to generate an image of the target person naturally wearing the given clothing.
Existing methods focus solely on frontal try-on with frontal clothing.
We introduce Multi-View Virtual Try-ON (MV-VTON), which aims to reconstruct the dressing results from multiple views using the given clothes.
- Transformer-based Graph Neural Networks for Outfit Generation (arXiv, 2023-04-17)
We propose TGNN, a transformer-based architecture that exploits multi-headed self-attention to capture relations between clothing items in a graph, serving as the message-passing step of a convolutional graph neural network.
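One way to read "self-attention as message passing" is the sketch below, where an adjacency mask restricts attention to graph neighbours; the masking scheme and sizes are assumptions.

    # Attention-based message passing over an outfit graph (illustrative).
    import torch
    import torch.nn as nn

    class AttnMessagePassing(nn.Module):
        def __init__(self, dim=128, heads=4):
            super().__init__()
            self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

        def forward(self, nodes, adj):
            # nodes: (batch, n, dim); adj: (batch, n, n), 1 where an edge exists.
            mask = (adj == 0).repeat_interleave(self.attn.num_heads, dim=0)
            out, _ = self.attn(nodes, nodes, nodes, attn_mask=mask)
            return nodes + out                     # residual message-passing step

    mp = AttnMessagePassing()
    h = mp(torch.randn(2, 5, 128), torch.ones(2, 5, 5))   # fully connected outfit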
- FaD-VLP: Fashion Vision-and-Language Pre-training towards Unified Retrieval and Captioning (arXiv, 2022-10-26)
Multimodal tasks in the fashion domain have significant potential for e-commerce.
We propose a novel fashion-specific pre-training framework based on weakly-supervised triplets constructed from fashion image-text pairs.
We show that the triplet-based tasks are an effective addition to standard multimodal pre-training tasks.
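The triplet idea can be illustrated with a generic margin loss that treats the paired caption as a positive and another caption in the batch as a weak negative; this is a sketch of triplet supervision built from image-text pairs, not the exact FaD-VLP objective.

    # Weakly-supervised triplet loss over matched image-text embedding pairs.
    import torch
    import torch.nn.functional as F

    def weak_triplet_loss(img_emb, txt_emb, margin=0.2):
        # img_emb, txt_emb: (batch, dim); row i of each side is a matched pair.
        img, txt = F.normalize(img_emb, dim=-1), F.normalize(txt_emb, dim=-1)
        pos = (img * txt).sum(-1)                   # cosine sim of matched pairs
        neg = (img * txt.roll(1, dims=0)).sum(-1)   # shifted rows as weak negatives
        return F.relu(margin - pos + neg).mean()

    loss = weak_triplet_loss(torch.randn(8, 256), torch.randn(8, 256))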
- Learning Fashion Compatibility from In-the-wild Images (arXiv, 2022-06-13)
We propose to learn representations for compatibility prediction from in-the-wild street fashion images through self-supervised learning.
Our pretext task is formulated such that the representations of different items worn by the same person are closer compared to those worn by other people.
We conduct experiments on two popular fashion compatibility benchmarks - Polyvore and Polyvore-Disjoint outfits.
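One way such a pretext task could be written is an InfoNCE-style loss that treats item crops sharing a wearer as positives; the details below are assumptions, not the paper's implementation.

    # Same-wearer contrastive pretext loss (illustrative).
    import torch
    import torch.nn.functional as F

    def same_wearer_nce(emb, person_id, temperature=0.1):
        # emb: (n_items, dim); person_id: (n_items,) wearer index per item crop.
        z = F.normalize(emb, dim=-1)
        sim = z @ z.t() / temperature
        sim.fill_diagonal_(float('-inf'))           # an item is not its own positive
        pos = person_id.unsqueeze(0) == person_id.unsqueeze(1)
        pos.fill_diagonal_(False)
        return -(sim.log_softmax(dim=-1)[pos]).mean()

    loss = same_wearer_nce(torch.randn(6, 128), torch.tensor([0, 0, 1, 1, 2, 2]))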
- OutfitTransformer: Learning Outfit Representations for Fashion Recommendation (arXiv, 2022-04-11)
We present a framework, OutfitTransformer, that learns effective outfit-level representations encoding the compatibility relationships between all items in the entire outfit.
For compatibility prediction, we design an outfit token to capture a global outfit representation and train the framework using a classification loss.
For complementary item retrieval, we design a target item token that additionally takes the target item specification into consideration.
The generated target item embedding is then used to retrieve compatible items that match the rest of the outfit.
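A sketch of the outfit-token / target-item-token mechanism: one learned token summarizes the outfit for compatibility classification, while a token built from the target specification yields a retrieval query. How the specification is encoded here is an assumption.

    # Outfit token for classification, target-item token for retrieval (sketch).
    import torch
    import torch.nn as nn

    class OutfitEncoder(nn.Module):
        def __init__(self, dim=256, heads=4, layers=2):
            super().__init__()
            layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
            self.enc = nn.TransformerEncoder(layer, num_layers=layers)
            self.outfit_token = nn.Parameter(torch.zeros(1, 1, dim))
            self.spec_proj = nn.Linear(dim, dim)     # assumed spec encoder
            self.compat = nn.Linear(dim, 1)

        def forward(self, items, spec=None):         # items: (batch, n, dim)
            tok = (self.spec_proj(spec).unsqueeze(1) if spec is not None
                   else self.outfit_token.expand(items.size(0), -1, -1))
            h = self.enc(torch.cat([tok, items], dim=1))[:, 0]
            # With a spec, h is a query embedding for retrieving the missing item;
            # without one, h feeds the compatibility classifier.
            return h if spec is not None else self.compat(h)

    m = OutfitEncoder()
    logit = m(torch.randn(2, 4, 256))                       # compatibility
    query = m(torch.randn(2, 4, 256), torch.randn(2, 256))  # retrieval embedding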
- Arbitrary Virtual Try-On Network: Characteristics Preservation and Trade-off between Body and Clothing (arXiv, 2021-11-24)
We propose an Arbitrary Virtual Try-On Network (AVTON) for all types of clothes.
AVTON can synthesize realistic try-on images by preserving and trading off characteristics of the target clothes and the reference person.
Our approach achieves better performance than state-of-the-art virtual try-on methods.
- Cloth Interactive Transformer for Virtual Try-On (arXiv, 2021-04-12)
We propose a novel two-stage cloth interactive transformer (CIT) method for the virtual try-on task.
In the first stage, we design a CIT matching block, aiming to precisely capture the long-range correlations between the cloth-agnostic person information and the in-shop cloth information.
In the second stage, we put forth a CIT reasoning block for establishing global mutual interactive dependencies among person representation, the warped clothing item, and the corresponding warped cloth mask.
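A minimal cross-attention block in the spirit of the first-stage matching step, letting person tokens attend to in-shop cloth tokens; shapes and naming are assumptions.

    # Cross-modal matching block (illustrative, not the CIT implementation).
    import torch
    import torch.nn as nn

    class MatchingBlock(nn.Module):
        def __init__(self, dim=192, heads=6):
            super().__init__()
            self.cross = nn.MultiheadAttention(dim, heads, batch_first=True)
            self.norm = nn.LayerNorm(dim)

        def forward(self, person, cloth):
            # person: cloth-agnostic person tokens; cloth: in-shop cloth tokens.
            attended, _ = self.cross(person, cloth, cloth)
            return self.norm(person + attended)      # long-range person-cloth fusion

    mb = MatchingBlock()
    fused = mb(torch.randn(1, 196, 192), torch.randn(1, 196, 192))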
- Fashion Recommendation and Compatibility Prediction Using Relational Network (arXiv, 2020-05-13)
We use a Relation Network (RN) to build new compatibility learning models.
FashionRN learns the compatibility of an entire outfit, with an arbitrary number of items, in an arbitrary order.
We evaluate our model on a large dataset of 49,740 outfits that we collected from the Polyvore website.
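A generic relation-network scorer in the style of Santoro et al. makes the "arbitrary number, arbitrary order" property concrete: a shared MLP is summed over all item pairs, so the score is permutation-invariant and size-agnostic. Layer sizes are illustrative, not FashionRN's.

    # Relation-network outfit scorer (generic sketch).
    import torch
    import torch.nn as nn

    class RelationScorer(nn.Module):
        def __init__(self, dim=64, hidden=128):
            super().__init__()
            self.g = nn.Sequential(nn.Linear(2 * dim, hidden), nn.ReLU(),
                                   nn.Linear(hidden, hidden), nn.ReLU())
            self.f = nn.Linear(hidden, 1)

        def forward(self, items):                    # items: (n_items, dim)
            i, j = torch.triu_indices(items.size(0), items.size(0), offset=1)
            pairs = torch.cat([items[i], items[j]], dim=-1)   # all unordered pairs
            return self.f(self.g(pairs).sum(0))      # sum => order-invariant

    scorer = RelationScorer()
    score = scorer(torch.randn(5, 64))               # works for any n_items >= 2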
- Learning Diverse Fashion Collocation by Neural Graph Filtering (arXiv, 2020-03-11)
We propose a novel fashion collocation framework, Neural Graph Filtering, that models a flexible set of fashion items via a graph neural network.
By applying symmetric operations on the edge vectors, this framework allows varying numbers of inputs/outputs and is invariant to their ordering.
We evaluate the proposed approach on three popular benchmarks, the Polyvore dataset, the Polyvore-D dataset, and our reorganized Amazon Fashion dataset.
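The symmetric edge-vector idea can be sketched as follows: edge features from absolute item differences (symmetric in i and j) and a mean readout make the output independent of input order and count. The specific operations are assumptions, not the paper's filters.

    # Symmetric edge filtering with a permutation-invariant readout (sketch).
    import torch
    import torch.nn as nn

    class EdgeFilter(nn.Module):
        def __init__(self, dim=64):
            super().__init__()
            self.mlp = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(),
                                     nn.Linear(dim, dim))

        def forward(self, items):                    # items: (n, dim)
            edges = items.unsqueeze(0) - items.unsqueeze(1)   # (n, n, dim)
            filtered = self.mlp(edges.abs())         # |x_i - x_j| symmetric in i, j
            return filtered.mean(dim=(0, 1))         # order-invariant readout

    f = EdgeFilter()
    readout = f(torch.randn(4, 64))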
This list is automatically generated from the titles and abstracts of the papers on this site.