OutfitTransformer: Learning Outfit Representations for Fashion Recommendation
- URL: http://arxiv.org/abs/2204.04812v1
- Date: Mon, 11 Apr 2022 00:55:40 GMT
- Title: OutfitTransformer: Learning Outfit Representations for Fashion Recommendation
- Authors: Rohan Sarkar, Navaneeth Bodla, Mariya Vasileva, Yen-Liang Lin, Anurag Beniwal, Alan Lu, Gerard Medioni
- Abstract summary: We present a framework, OutfitTransformer, that learns effective outfit-level representations encoding the compatibility relationships between all items in the entire outfit.
For compatibility prediction, we design an outfit token to capture a global outfit representation and train the framework using a classification loss.
For complementary item retrieval, we design a target item token that additionally takes the target item specification into consideration.
The generated target item embedding is then used to retrieve compatible items that match the rest of the outfit.
- Score: 6.890771095769622
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Learning an effective outfit-level representation is critical for predicting
the compatibility of items in an outfit, and retrieving complementary items for
a partial outfit. We present a framework, OutfitTransformer, that uses the
proposed task-specific tokens and leverages the self-attention mechanism to
learn effective outfit-level representations encoding the compatibility
relationships between all items in the entire outfit for addressing both
compatibility prediction and complementary item retrieval tasks. For
compatibility prediction, we design an outfit token to capture a global outfit
representation and train the framework using a classification loss. For
complementary item retrieval, we design a target item token that additionally
takes the target item specification (in the form of a category or text
description) into consideration. We train our framework using a proposed
set-wise outfit ranking loss to generate a target item embedding given an
outfit and a target item specification as inputs. The generated target item
embedding is then used to retrieve compatible items that match the rest of the
outfit. Additionally, we adopt a pre-training approach and a curriculum
learning strategy to improve retrieval performance. Since our framework learns
at the outfit level, it allows us to learn a single embedding capturing
higher-order relations among multiple items in the outfit more effectively than
pairwise methods. Experiments demonstrate that our approach outperforms
state-of-the-art methods on compatibility prediction, fill-in-the-blank, and
complementary item retrieval tasks. We further validate the quality of our
retrieval results with a user study.
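The two task-specific tokens and the ranking objective described above lend themselves to a compact sketch. The PyTorch code below is a minimal illustration of one plausible reading of the abstract, not the authors' implementation: the class name, feature dimensions, the way the target specification is fused into its token, and the exact form of the set-wise ranking loss are all assumptions.

```python
# Minimal sketch of the OutfitTransformer idea as described in the abstract.
# All names, sizes, and the exact loss formulation are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class OutfitTransformerSketch(nn.Module):
    def __init__(self, dim=256, heads=4, layers=3):
        super().__init__()
        # Learnable task-specific tokens, used like a [CLS] token (assumption).
        self.outfit_token = nn.Parameter(torch.randn(1, 1, dim))
        self.target_token = nn.Parameter(torch.randn(1, 1, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=layers)
        self.compat_head = nn.Linear(dim, 1)  # binary compatibility classifier

    def predict_compatibility(self, items):
        """items: (batch, n_items, dim) precomputed item features."""
        b = items.size(0)
        seq = torch.cat([self.outfit_token.expand(b, 1, -1), items], dim=1)
        out = self.encoder(seq)
        # The outfit token's output serves as the global outfit representation.
        return torch.sigmoid(self.compat_head(out[:, 0])).squeeze(-1)

    def embed_target(self, items, spec):
        """spec: (batch, dim) embedding of the target category/text description."""
        b = items.size(0)
        # Fuse the target item specification into the target token (assumption).
        tgt = self.target_token.expand(b, 1, -1) + spec.unsqueeze(1)
        out = self.encoder(torch.cat([tgt, items], dim=1))
        return F.normalize(out[:, 0], dim=-1)  # query vector for retrieval

def set_wise_ranking_loss(query, pos, negs, margin=0.2):
    """One plausible form of a set-wise outfit ranking loss (assumption):
    rank the compatible item above every candidate in a negative set."""
    pos_sim = (query * pos).sum(-1)                # (batch,)
    neg_sim = (query.unsqueeze(1) * negs).sum(-1)  # (batch, n_neg)
    return F.relu(margin - pos_sim.unsqueeze(1) + neg_sim).mean()
```

At retrieval time, the embedding returned by `embed_target` would be matched against a gallery of candidate item embeddings via nearest-neighbour search, mirroring the retrieval step the abstract describes.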
Related papers
- GarmentAligner: Text-to-Garment Generation via Retrieval-augmented Multi-level Corrections [63.82168065819053]
GarmentAligner is a text-to-garment diffusion model trained with retrieval-augmented multi-level corrections.
To achieve semantic alignment at the component level, we introduce an automatic component extraction pipeline.
To exploit component relationships within the garment images, we construct retrieval subsets for each garment.
arXiv Detail & Related papers (2024-08-22T12:50:45Z) - Clothes-Changing Person Re-Identification with Feasibility-Aware Intermediary Matching [86.04494755636613]
Current clothes-changing person re-identification (re-id) approaches usually perform retrieval based on clothes-irrelevant features.
We propose a Feasibility-Aware Intermediary Matching (FAIM) framework to additionally utilize clothes-relevant features for retrieval.
Our method outperforms state-of-the-art methods on several widely-used clothes-changing re-id benchmarks.
arXiv Detail & Related papers (2024-04-15T06:58:09Z) - Knowledge-Aware Prompt Tuning for Generalizable Vision-Language Models [64.24227572048075]
We propose a Knowledge-Aware Prompt Tuning (KAPT) framework for vision-language models.
Our approach takes inspiration from human intelligence in which external knowledge is usually incorporated into recognizing novel categories of objects.
arXiv Detail & Related papers (2023-08-22T04:24:45Z) - Transformer-based Graph Neural Networks for Outfit Generation [22.86041284499166]
We propose TGNN, a transformer-based architecture that exploits multi-headed self-attention to capture relations between clothing items in a graph as a message-passing step in Convolutional Graph Neural Networks.
arXiv Detail & Related papers (2023-04-17T09:18:45Z) - VICTOR: Visual Incompatibility Detection with Transformers and
Fashion-specific contrastive pre-training [18.753508811614644]
Visual InCompatibility TransfORmer (VICTOR) is optimized for two tasks: 1) overall compatibility as regression and 2) the detection of mismatching items.
We build upon the Polyvore outfit benchmark to generate partially mismatching outfits, creating a new dataset termed Polyvore-MISFITs.
A series of ablation and comparative analyses show that the proposed architecture can compete with and even surpass the current state of the art on Polyvore datasets.
arXiv Detail & Related papers (2022-07-27T11:18:55Z) - Learning Fashion Compatibility from In-the-wild Images [6.591937706757015]
We propose to learn representations for compatibility prediction from in-the-wild street fashion images through self-supervised learning.
Our pretext task is formulated such that the representations of different items worn by the same person are closer than those of items worn by other people.
We conduct experiments on two popular fashion compatibility benchmarks - Polyvore and Polyvore-Disjoint outfits.
arXiv Detail & Related papers (2022-06-13T09:05:25Z) - Fashionformer: A simple, Effective and Unified Baseline for Human
Fashion Segmentation and Recognition [80.74495836502919]
In this work, we focus on joint human fashion segmentation and attribute recognition.
We introduce the object query for segmentation and the attribute query for attribute prediction.
For the attribute stream, we design a novel Multi-Layer Rendering module to explore more fine-grained features.
arXiv Detail & Related papers (2022-04-10T11:11:10Z) - Semi-Supervised Visual Representation Learning for Fashion Compatibility [17.893627646979038]
We propose a semi-supervised learning approach to create pseudo-positive and pseudo-negative outfits on the fly during training.
For each labeled outfit in a training batch, we obtain a pseudo-outfit by matching each item in the labeled outfit with unlabeled items.
We conduct extensive experiments on Polyvore, Polyvore-D and our newly created large-scale Fashion Outfits datasets.
arXiv Detail & Related papers (2021-09-16T15:35:38Z) - Dynamic Semantic Matching and Aggregation Network for Few-shot Intent
Detection [69.2370349274216]
Few-shot Intent Detection is challenging due to the scarcity of available annotated utterances.
Semantic components are distilled from utterances via multi-head self-attention.
Our method provides a comprehensive matching measure to enhance representations of both labeled and unlabeled instances.
arXiv Detail & Related papers (2020-10-06T05:16:38Z) - Fashion Recommendation and Compatibility Prediction Using Relational
Network [18.13692056232815]
We develop a Relation Network (RN) to build new compatibility learning models.
FashionRN learns the compatibility of an entire outfit, with an arbitrary number of items, in an arbitrary order.
We evaluate our model using a large dataset of 49,740 outfits that we collected from the Polyvore website.
arXiv Detail & Related papers (2020-05-13T21:00:54Z) - Improving Few-shot Learning by Spatially-aware Matching and
CrossTransformer [116.46533207849619]
We study the impact of scale and location mismatch in the few-shot learning scenario.
We propose a novel Spatially-aware Matching scheme to effectively perform matching across multiple scales and locations.
arXiv Detail & Related papers (2020-01-06T14:10:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.