Semi-supervised Learning with a Teacher-student Network for Generalized
Attribute Prediction
- URL: http://arxiv.org/abs/2007.06769v1
- Date: Tue, 14 Jul 2020 02:06:24 GMT
- Authors: Minchul Shin
- Abstract summary: This paper presents a study on semi-supervised learning to solve the visual attribute prediction problem.
Our method achieves competitive performance on various benchmarks for fashion attribute prediction.
- Score: 7.462336024223667
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents a study on semi-supervised learning to solve the visual
attribute prediction problem. In many applications of vision algorithms, the
precise recognition of visual attributes of objects is important but still
challenging. This is because defining a class hierarchy of attributes is
ambiguous, so training data inevitably suffer from class imbalance and label
sparsity, leading to a lack of effective annotations. An intuitive solution is
to find a method to effectively learn image representations by utilizing
unlabeled images. With that in mind, we propose a multi-teacher-single-student
(MTSS) approach inspired by the multi-task learning and the distillation of
semi-supervised learning. Our MTSS learns task-specific domain experts called
teacher networks using the label embedding technique and learns a unified model
called a student network by forcing a model to mimic the distributions learned
by domain experts. Our experiments demonstrate that our method not only
achieves competitive performance on various benchmarks for fashion attribute
prediction, but also improves robustness and cross-domain adaptability for
unseen domains.
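The core of the student's training signal, as described in the abstract, is forcing one unified model to mimic the output distributions of several task-specific teacher networks. A minimal sketch of such a multi-teacher distillation loss is below; this is an illustration under assumptions (temperature-softened softmax and an averaged KL divergence, as in standard knowledge distillation), not the paper's exact formulation, and the function names are hypothetical.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    # Temperature-softened softmax over the last axis.
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def mtss_distillation_loss(student_logits, teacher_logits_list, temperature=2.0):
    """Average KL(teacher || student) over all teacher networks.

    student_logits: (batch, classes) logits from the single student.
    teacher_logits_list: one (batch, classes) logit array per teacher.
    """
    p_student = softmax(student_logits, temperature)
    per_teacher = []
    for t_logits in teacher_logits_list:
        p_teacher = softmax(t_logits, temperature)
        # KL divergence per example, then averaged over the batch.
        kl = np.sum(
            p_teacher * (np.log(p_teacher + 1e-12) - np.log(p_student + 1e-12)),
            axis=-1,
        )
        per_teacher.append(kl.mean())
    return float(np.mean(per_teacher))
```

When the student's distribution matches every teacher exactly the loss is zero, and it grows as the student diverges from the teachers' soft labels; in the paper's setting each teacher would be a domain expert trained with label embeddings on its own attribute task.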
Related papers
- A Probabilistic Model Behind Self-Supervised Learning [53.64989127914936]
In self-supervised learning (SSL), representations are learned via an auxiliary task without annotated labels.
We present a generative latent variable model for self-supervised learning.
We show that several families of discriminative SSL, including contrastive methods, induce a comparable distribution over representations.
arXiv Detail & Related papers (2024-02-02T13:31:17Z)
- Heuristic Vision Pre-Training with Self-Supervised and Supervised Multi-Task Learning [0.0]
We propose a novel pre-training framework by adopting both self-supervised and supervised visual pre-text tasks in a multi-task manner.
Results show that our pre-trained models can deliver results on par with or better than state-of-the-art (SOTA) results on multiple visual tasks.
arXiv Detail & Related papers (2023-10-11T14:06:04Z)
- Mixture of Self-Supervised Learning [2.191505742658975]
Self-supervised learning works by training a model on a pretext task before applying it to a specific downstream task.
Previous studies have only used one type of transformation as a pretext task.
This raises the question of how performance is affected when more than one pretext task is used, with a gating network combining all pretext tasks.
arXiv Detail & Related papers (2023-07-27T14:38:32Z)
- Learning Transferable Pedestrian Representation from Multimodal Information Supervision [174.5150760804929]
VAL-PAT is a novel framework that learns transferable representations to enhance various pedestrian analysis tasks with multimodal information.
We first perform pre-training on LUPerson-TA dataset, where each image contains text and attribute annotations.
We then transfer the learned representations to various downstream tasks, including person reID, person attribute recognition and text-based person search.
arXiv Detail & Related papers (2023-04-12T01:20:58Z)
- DUET: Cross-modal Semantic Grounding for Contrastive Zero-shot Learning [37.48292304239107]
We present a transformer-based end-to-end ZSL method named DUET.
We develop a cross-modal semantic grounding network to investigate the model's capability of disentangling semantic attributes from the images.
We find that DUET can achieve state-of-the-art performance, that its components are effective, and that its predictions are interpretable.
arXiv Detail & Related papers (2022-07-04T11:12:12Z)
- Attribute Prototype Network for Any-Shot Learning [113.50220968583353]
We argue that an image representation with integrated attribute localization ability would be beneficial for any-shot (i.e., zero-shot and few-shot) image classification tasks.
We propose a novel representation learning framework that jointly learns global and local features using only class-level attributes.
arXiv Detail & Related papers (2022-04-04T02:25:40Z)
- Distribution Alignment: A Unified Framework for Long-tail Visual Recognition [52.36728157779307]
We propose a unified distribution alignment strategy for long-tail visual recognition.
We then introduce a generalized re-weight method in the two-stage learning to balance the class prior.
Our approach achieves the state-of-the-art results across all four recognition tasks with a simple and unified framework.
arXiv Detail & Related papers (2021-03-30T14:09:53Z)
- Can Semantic Labels Assist Self-Supervised Visual Representation Learning? [194.1681088693248]
We present a new algorithm named Supervised Contrastive Adjustment in Neighborhood (SCAN)
In a series of downstream tasks, SCAN achieves superior performance compared to previous fully-supervised and self-supervised methods.
Our study reveals that semantic labels are useful in assisting self-supervised methods, opening a new direction for the community.
arXiv Detail & Related papers (2020-11-17T13:25:00Z)
- Region Comparison Network for Interpretable Few-shot Image Classification [97.97902360117368]
Few-shot image classification has been proposed to effectively use only a limited number of labeled examples to train models for new classes.
We propose a metric learning based method named Region Comparison Network (RCN), which is able to reveal how few-shot learning works.
We also present a new way to generalize the interpretability from the level of tasks to categories.
arXiv Detail & Related papers (2020-09-08T07:29:05Z)
- Unsupervised Domain Attention Adaptation Network for Caricature Attribute Recognition [23.95731281719786]
Caricature attributes provide distinctive facial features to help research in Psychology and Neuroscience.
Unlike the facial photo attribute datasets that have a quantity of annotated images, the annotations of caricature attributes are rare.
We propose a caricature attribute dataset, namely WebCariA, to facilitate research in attribute learning of caricatures.
arXiv Detail & Related papers (2020-07-18T06:38:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.