Self-supervised Visual Attribute Learning for Fashion Compatibility
- URL: http://arxiv.org/abs/2008.00348v2
- Date: Thu, 12 Aug 2021 01:22:33 GMT
- Title: Self-supervised Visual Attribute Learning for Fashion Compatibility
- Authors: Donghyun Kim, Kuniaki Saito, Samarth Mishra, Stan Sclaroff, Kate Saenko, Bryan A Plummer
- Abstract summary: We present an SSL framework that enables us to learn color and texture-aware features without requiring any labels during training.
Our approach consists of three self-supervised tasks designed to capture different concepts that are neglected in prior work.
We show that our approach can be used for transfer learning, demonstrating that we can train on one dataset while achieving high performance on a different dataset.
- Score: 71.73414832639698
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many self-supervised learning (SSL) methods have been successful in learning
semantically meaningful visual representations by solving pretext tasks.
However, prior work in SSL focuses on tasks like object recognition or
detection, which aim to learn object shapes and assume that the features should
be invariant to concepts like colors and textures. Thus, these SSL methods
perform poorly on downstream tasks where these concepts provide critical
information. In this paper, we present an SSL framework that enables us to
learn color and texture-aware features without requiring any labels during
training. Our approach consists of three self-supervised tasks, each designed to
capture a concept neglected in prior work, from which we can select
depending on the needs of our downstream tasks. Our tasks include learning
to predict color histograms and discriminate shapeless local patches and
textures from each instance. We evaluate our approach on fashion compatibility
using Polyvore Outfits and In-Shop Clothing Retrieval using Deepfashion,
improving upon prior SSL methods by 9.5-16%, and even outperforming some
supervised approaches on Polyvore Outfits despite using no labels. We also show
that our approach can be used for transfer learning, demonstrating that we can
train on one dataset while achieving high performance on a different dataset.
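The color-histogram pretext task mentioned in the abstract can be illustrated with a minimal sketch of how a label-free target might be computed from an image's own pixels. The abstract does not specify the histogram format, so the joint RGB binning, the `bins` parameter, and the helper names `color_histogram` and `l1_distance` are all assumptions made for illustration, not the authors' implementation.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each channel in 0..255


def color_histogram(pixels: Sequence[Pixel], bins: int = 4) -> List[float]:
    """Return a normalized joint RGB histogram with bins**3 entries.

    Hypothetical helper: the binning scheme is an assumption, not taken
    from the paper.
    """
    if bins < 1 or not pixels:
        raise ValueError("need at least one pixel and at least one bin")
    counts = [0] * (bins ** 3)
    for r, g, b in pixels:
        # Map each 0..255 channel value to a bin index in 0..bins-1.
        ri, gi, bi = (c * bins // 256 for c in (r, g, b))
        counts[(ri * bins + gi) * bins + bi] += 1
    total = len(pixels)
    return [c / total for c in counts]


def l1_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Illustrative distance between a predicted and a target histogram."""
    return sum(abs(a - b) for a, b in zip(p, q))


# Three red pixels and one blue pixel, with two bins per channel.
pixels = [(255, 0, 0)] * 3 + [(0, 0, 255)]
target = color_histogram(pixels, bins=2)
```

Because the target is derived from the pixels alone, a network trained to predict it needs no human labels, which is the property the abstract emphasizes. The distance function only stands in for whatever training loss is actually used.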
Related papers
- Heuristic Vision Pre-Training with Self-Supervised and Supervised Multi-Task Learning [0.0]
We propose a novel pre-training framework by adopting both self-supervised and supervised visual pre-text tasks in a multi-task manner.
Results show that our pre-trained models can deliver results on par with or better than state-of-the-art (SOTA) results on multiple visual tasks.
arXiv Detail & Related papers (2023-10-11T14:06:04Z)
- Mixture of Self-Supervised Learning [2.191505742658975]
Self-supervised learning works by training a model on a pretext task before applying it to a specific task.
Previous studies have only used one type of transformation as a pretext task.
This raises the question of what happens when more than one pretext task is used, with a gating network combining all pretext tasks.
arXiv Detail & Related papers (2023-07-27T14:38:32Z)
- Augmentation-aware Self-supervised Learning with Conditioned Projector [6.720605329045581]
Self-supervised learning (SSL) is a powerful technique for learning from unlabeled data.
We propose to foster sensitivity to characteristics in the representation space by modifying the projector network.
Our approach, coined Conditional Augmentation-aware Self-supervised Learning (CASSLE), is directly applicable to typical joint-embedding SSL methods.
arXiv Detail & Related papers (2023-05-31T12:24:06Z)
- A Survey on Self-supervised Learning: Algorithms, Applications, and Future Trends [82.64268080902742]
Self-supervised learning (SSL) aims to learn discriminative features from unlabeled data without relying on human-annotated labels.
SSL has garnered significant attention recently, leading to the development of numerous related algorithms.
This paper presents a review of diverse SSL methods, encompassing algorithmic aspects, application domains, three key trends, and open research questions.
arXiv Detail & Related papers (2023-01-13T14:41:05Z)
- Confidence-Aware Paced-Curriculum Learning by Label Smoothing for Surgical Scene Understanding [33.62888947753327]
We design a curriculum by label smoothing (P-CBLS) using paced learning with uniform label smoothing (ULS) for classification tasks and fuse uniform and spatially varying label smoothing (SVLS) for semantic segmentation tasks in a curriculum manner.
We set a bigger smoothing value at the beginning of training and gradually decreased it to zero to control the model learning utility from lower to higher.
The proposed techniques are validated on four robotic surgery datasets.
arXiv Detail & Related papers (2022-12-22T07:19:15Z)
- Understanding and Improving the Role of Projection Head in Self-Supervised Learning [77.59320917894043]
Self-supervised learning (SSL) aims to produce useful feature representations without access to human-labeled data annotations.
Current contrastive learning approaches append a parametrized projection head to the end of some backbone network to optimize the InfoNCE objective.
This raises a fundamental question: Why is a learnable projection head required if we are to discard it after training?
arXiv Detail & Related papers (2022-12-22T05:42:54Z)
- UniVIP: A Unified Framework for Self-Supervised Visual Pre-training [50.87603616476038]
We propose a novel self-supervised framework to learn versatile visual representations on either single-centric-object or non-iconic datasets.
Extensive experiments show that UniVIP pre-trained on non-iconic COCO achieves state-of-the-art transfer performance.
Our method can also exploit single-centric-object datasets such as ImageNet and outperforms BYOL by 2.5% with the same pre-training epochs in linear probing.
arXiv Detail & Related papers (2022-03-14T10:04:04Z)
- A Tale of Color Variants: Representation and Self-Supervised Learning in Fashion E-Commerce [2.3449131636069898]
We propose a generic framework that leverages deep visual representation learning at its heart to address this problem for our fashion e-commerce platform.
Our framework could be trained with supervisory signals in the form of manually obtained triplets.
Interestingly, however, we observed that this crucial problem in fashion e-commerce could also be solved by simple color-jitter-based image augmentation.
arXiv Detail & Related papers (2021-12-06T10:24:54Z)
- Rectifying the Shortcut Learning of Background: Shared Object Concentration for Few-Shot Image Recognition [101.59989523028264]
Few-Shot image classification aims to utilize pretrained knowledge learned from a large-scale dataset to tackle a series of downstream classification tasks.
We propose COSOC, a novel Few-Shot Learning framework, to automatically figure out foreground objects at both the pretraining and evaluation stages.
arXiv Detail & Related papers (2021-07-16T07:46:41Z)
- Learning Invariant Representations for Reinforcement Learning without Reconstruction [98.33235415273562]
We study how representation learning can accelerate reinforcement learning from rich observations, such as images, without relying either on domain knowledge or pixel-reconstruction.
Bisimulation metrics quantify behavioral similarity between states in continuous MDPs.
We demonstrate the effectiveness of our method at disregarding task-irrelevant information using modified visual MuJoCo tasks.
arXiv Detail & Related papers (2020-06-18T17:59:35Z)