Unsupervised Multi-label Dataset Generation from Web Data
- URL: http://arxiv.org/abs/2005.05623v1
- Date: Tue, 12 May 2020 08:57:59 GMT
- Title: Unsupervised Multi-label Dataset Generation from Web Data
- Authors: Carlos Roig, David Varas, Issey Masuda, Juan Carlos Riveiro, Elisenda Bou-Balust
- Abstract summary: This paper presents a system for generating multi-label datasets from web data in an unsupervised manner.
The generation of a single-label dataset uses an unsupervised noise reduction phase (clustering and selection of clusters using anchors), which yields 85% correctly labeled images.
An unsupervised label augmentation process is then performed to assign new labels to the images in the dataset using the class activation maps and the uncertainty associated with each class.
- Score: 2.267916014951237
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents a system for generating multi-label datasets
from web data in an unsupervised manner. To achieve this objective, this work
comprises two main contributions, namely: a) the generation of a low-noise
unsupervised single-label dataset from web-data, and b) the augmentation of
labels in such dataset (from single label to multi label). The generation of a
single-label dataset uses an unsupervised noise reduction phase (clustering and
selection of clusters using anchors), which yields 85% correctly labeled
images. An unsupervised label augmentation process is then performed to assign
new labels to the images in the dataset using the class activation maps and the
uncertainty associated with each class. This process is applied to the dataset
generated in this paper and to a public dataset (Places365), adding 9.5% and
27% extra labels to each dataset respectively, therefore demonstrating that
the presented system can robustly enrich the initial dataset.
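The two stages named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how clusters are compared to anchors or how class activation maps and uncertainty are combined, so the cosine-similarity rule, the per-class score and uncertainty dictionaries, the threshold values, and the function names `select_clusters` and `augment_labels` are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def select_clusters(embeddings, cluster_ids, anchor, threshold=0.8):
    """Stage 1 (assumed rule): keep images whose cluster centroid is
    close to a class anchor embedding; other clusters count as noise."""
    clusters = defaultdict(list)
    for idx, cid in enumerate(cluster_ids):
        clusters[cid].append(idx)
    kept = []
    for members in clusters.values():
        c = centroid([embeddings[i] for i in members])
        if cosine(c, anchor) >= threshold:
            kept.extend(members)
    return sorted(kept)


def augment_labels(labels, cam_scores, class_uncertainty,
                   min_score=0.5, max_uncertainty=0.2):
    """Stage 2 (assumed rule): add a class to an image when its
    activation-derived score is high and the class uncertainty is low."""
    augmented = {}
    for image_id, current in labels.items():
        extra = {
            cls
            for cls, score in cam_scores.get(image_id, {}).items()
            if score >= min_score
            and class_uncertainty.get(cls, 1.0) <= max_uncertainty
        }
        augmented[image_id] = set(current) | extra
    return augmented


# Toy data: two clusters of 2-D embeddings and one anchor for the class.
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
cluster_ids = [0, 0, 1, 1]
anchor = [1.0, 0.0]
clean = select_clusters(embeddings, cluster_ids, anchor)

# Toy single-label data with hypothetical per-class scores and uncertainty.
labels = {0: {"beach"}}
cam_scores = {0: {"beach": 0.9, "sea": 0.7, "boat": 0.6}}
class_uncertainty = {"beach": 0.05, "sea": 0.1, "boat": 0.4}
multi = augment_labels(labels, cam_scores, class_uncertainty)

print(clean)
print({k: sorted(v) for k, v in multi.items()})
```

In this toy run the cluster far from the anchor is discarded, and only the extra class with both a high score and low uncertainty is added to the image. Both thresholds are placeholders that would need tuning on real data.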
Related papers
- Generalized Category Discovery with Clustering Assignment Consistency [56.92546133591019]
Generalized category discovery (GCD) is a recently proposed open-world task.
We propose a co-training-based framework that encourages clustering consistency.
Our method achieves state-of-the-art performance on three generic benchmarks and three fine-grained visual recognition datasets.
arXiv Detail & Related papers (2023-10-30T00:32:47Z)
- TransPOS: Transformers for Consolidating Different POS Tagset Datasets [0.8399688944263843]
This paper considers two datasets that label part-of-speech (POS) tags under different tagging schemes.
It proposes a novel supervised architecture employing Transformers to tackle the problem of consolidating two completely disjoint datasets.
arXiv Detail & Related papers (2022-09-24T08:43:53Z)
- OSSGAN: Open-Set Semi-Supervised Image Generation [26.67298827670573]
We introduce a challenging training scheme of conditional GANs, called open-set semi-supervised image generation.
OSSGAN provides decision clues to the discriminator on the basis of whether an unlabeled image belongs to one or none of the classes of interest.
The results of experiments on Tiny ImageNet and ImageNet show notable improvements over supervised BigGAN and semi-supervised methods.
arXiv Detail & Related papers (2022-04-29T17:26:09Z)
- Learning Semantic Segmentation from Multiple Datasets with Label Shifts [101.24334184653355]
This paper proposes UniSeg, an effective approach to automatically train models across multiple datasets with differing label spaces.
Specifically, we propose two losses that account for conflicting and co-occurring labels to achieve better generalization performance in unseen domains.
arXiv Detail & Related papers (2022-02-28T18:55:19Z)
- GuidedMix-Net: Semi-supervised Semantic Segmentation by Using Labeled Images as Reference [90.5402652758316]
We propose a novel method for semi-supervised semantic segmentation named GuidedMix-Net.
It uses labeled information to guide the learning of unlabeled instances.
It achieves competitive segmentation accuracy and significantly improves the mIoU by +7% compared to previous approaches.
arXiv Detail & Related papers (2021-12-28T06:48:03Z)
- GuidedMix-Net: Learning to Improve Pseudo Masks Using Labeled Images as Reference [153.354332374204]
We propose a novel method for semi-supervised semantic segmentation named GuidedMix-Net.
We first introduce a feature alignment objective between labeled and unlabeled data to capture potentially similar image pairs.
MITrans is shown to be a powerful knowledge module for progressively refining the features of unlabeled data.
Along with supervised learning for labeled data, the prediction of unlabeled data is jointly learned with the generated pseudo masks.
arXiv Detail & Related papers (2021-06-29T02:48:45Z)
- Adversarial Knowledge Transfer from Unlabeled Data [62.97253639100014]
We present a novel Adversarial Knowledge Transfer framework for transferring knowledge from internet-scale unlabeled data to improve the performance of a classifier.
An important novel aspect of our method is that the unlabeled source data can be of different classes from those of the labeled target data, and there is no need to define a separate pretext task.
arXiv Detail & Related papers (2020-08-13T08:04:27Z)
- openXDATA: A Tool for Multi-Target Data Generation and Missing Label Completion [23.14045574165086]
A common problem in machine learning is to deal with datasets with disjoint label spaces and missing labels.
In this work, we introduce the openXdata tool that completes the missing labels in partially labelled or unlabelled datasets.
We show the ability to estimate both categories and continuous labels for all of the datasets, at rates that approached the ground truth values.
arXiv Detail & Related papers (2020-07-27T22:05:53Z)
- Multi-Level Generative Models for Partial Label Learning with Non-random Label Noise [47.01917619550429]
We propose a novel multi-level generative model for partial label learning (MGPLL).
It learns both a label-level adversarial generator and a feature-level adversarial generator under a bi-directional mapping framework.
The proposed approach demonstrates state-of-the-art performance for partial label learning.
arXiv Detail & Related papers (2020-05-11T20:13:19Z)
- Automatically Discovering and Learning New Visual Categories with Ranking Statistics [145.89790963544314]
We tackle the problem of discovering novel classes in an image collection given labelled examples of other classes.
We learn a general-purpose clustering model and use the latter to identify the new classes in the unlabelled data.
We evaluate our approach on standard classification benchmarks and outperform current methods for novel category discovery by a significant margin.
arXiv Detail & Related papers (2020-02-13T18:53:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.