Related papers: CLImage: Human-Annotated Datasets for Complementary-Label Learning

URL: http://arxiv.org/abs/2305.08295v3
Date: Sat, 22 Jun 2024 08:53:38 GMT
Title: CLImage: Human-Annotated Datasets for Complementary-Label Learning
Authors: Hsiu-Hsuan Wang, Tan-Ha Mai, Nai-Xuan Ye, Wei-I Lin, Hsuan-Tien Lin
Abstract summary: We develop a protocol to collect complementary labels from human annotators. These datasets represent the very first real-world CLL datasets. We discover that the biased nature of human-annotated complementary labels and the difficulty of validating with only complementary labels are outstanding barriers to practical CLL.
Score: 8.335164415521838
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Complementary-label learning (CLL) is a weakly-supervised learning paradigm that aims to train a multi-class classifier using only complementary labels, which indicate classes to which an instance does not belong. Despite numerous algorithmic proposals for CLL, their practical applicability remains unverified for two reasons. Firstly, these algorithms often rely on assumptions about the generation of complementary labels, and it is not clear how far the assumptions are from reality. Secondly, their evaluation has been limited to synthetic datasets. To gain insights into the real-world performance of CLL algorithms, we developed a protocol to collect complementary labels from human annotators. Our efforts resulted in the creation of four datasets: CLCIFAR10, CLCIFAR20, CLMicroImageNet10, and CLMicroImageNet20, derived from the well-known classification datasets CIFAR10, CIFAR100, and TinyImageNet200. These datasets represent the very first real-world CLL datasets. Through extensive benchmark experiments, we discovered a notable decrease in performance when transitioning from synthetic datasets to real-world datasets. We investigated the key factors contributing to the decrease with a thorough dataset-level ablation study. Our analyses highlight annotation noise as the most influential factor in the real-world datasets. In addition, we discover that the biased nature of human-annotated complementary labels and the difficulty of validating with only complementary labels are two outstanding barriers to practical CLL. These findings suggest that the community focus more research efforts on developing CLL algorithms and validation schemes that are robust to noisy and biased complementary-label distributions.
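
Illustration: the abstract contrasts human-annotated complementary labels with synthetic ones. As a rough sketch of the synthetic setting, the code below draws one complementary label per instance uniformly at random from the classes other than the true one. The uniform generation assumption is a common choice in the CLL literature and is not stated in this abstract; the function names and the toy label list are hypothetical and are not taken from the paper or from any toolkit.

```python
import random


def uniform_complementary_label(true_label, num_classes, rng):
    """Return a class index drawn uniformly from all classes except true_label."""
    if num_classes < 2:
        raise ValueError("need at least two classes to form a complementary label")
    # Draw from the K-1 remaining slots, then shift past the true label.
    draw = rng.randrange(num_classes - 1)
    return draw if draw < true_label else draw + 1


def make_complementary_labels(true_labels, num_classes, seed=0):
    """Generate one synthetic complementary label per instance (uniform assumption)."""
    rng = random.Random(seed)
    return [uniform_complementary_label(y, num_classes, rng) for y in true_labels]


# Toy example with 10 classes (hypothetical labels, for illustration only).
true_labels = [0, 3, 9, 3, 7]
comp_labels = make_complementary_labels(true_labels, num_classes=10)
```

Each entry of comp_labels names a class the corresponding instance does not belong to. Human annotators are not bound to this uniform pattern, which is the kind of gap between assumption and reality the abstract describes.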

Related papers

Realistic Evaluation of Deep Partial-Label Learning Algorithms [94.79036193414058]
Partial-label learning (PLL) is a weakly supervised learning problem in which each example is associated with multiple candidate labels and only one is the true label. In recent years, many deep algorithms have been developed to improve model performance. Some early developed algorithms are often underestimated and can outperform many later algorithms with complicated designs.
arXiv Detail & Related papers (2025-02-14T14:22:16Z)
libcll: an Extendable Python Toolkit for Complementary-Label Learning [8.335164415521838]
Complementary-label learning (CLL) is a weakly supervised learning paradigm for multiclass classification. libcll is a Python toolkit for CLL research. libcll provides a universal interface that supports a wide range of generation assumptions.
arXiv Detail & Related papers (2024-11-19T06:56:24Z)
Enhancing Label Sharing Efficiency in Complementary-Label Learning with Label Augmentation [92.4959898591397]
We analyze the implicit sharing of complementary labels on nearby instances during training. We propose a novel technique that enhances the sharing efficiency via complementary-label augmentation. Our results confirm that complementary-label augmentation can systematically improve empirical performance over state-of-the-art CLL models.
arXiv Detail & Related papers (2023-05-15T04:43:14Z)
Complementary Labels Learning with Augmented Classes [22.460256396941528]
Complementary Labels Learning (CLL) arises in many real-world tasks such as private questions classification and online learning. We propose a novel problem setting called Complementary Labels Learning with Augmented Classes (CLLAC). By using unlabeled data, we propose an unbiased estimator of classification risk for CLLAC, which is guaranteed to be provably consistent.
arXiv Detail & Related papers (2022-11-19T13:55:27Z)
Class-Aware Contrastive Semi-Supervised Learning [51.205844705156046]
We propose a general method named Class-aware Contrastive Semi-Supervised Learning (CCSSL) to improve pseudo-label quality and enhance the model's robustness in the real-world setting. Our proposed CCSSL has significant performance improvements over the state-of-the-art SSL methods on the standard datasets CIFAR100 and STL10.
arXiv Detail & Related papers (2022-03-04T12:18:23Z)
The CLEAR Benchmark: Continual LEArning on Real-World Imagery [77.98377088698984]
Continual learning (CL) is widely regarded as a crucial challenge for lifelong AI. We introduce CLEAR, the first continual image classification benchmark dataset with a natural temporal evolution of visual concepts. We find that a simple unsupervised pre-training step can already boost state-of-the-art CL algorithms.
arXiv Detail & Related papers (2022-01-17T09:09:09Z)
Learning with Noisy Labels Revisited: A Study Using Real-World Human Annotations [54.400167806154535]
Existing research on learning with noisy labels mainly focuses on synthetic label noise. This work presents two new benchmark datasets (CIFAR-10N, CIFAR-100N). We show that real-world noisy labels follow an instance-dependent pattern rather than the classically adopted class-dependent ones.
arXiv Detail & Related papers (2021-10-22T22:42:11Z)
SCARF: Self-Supervised Contrastive Learning using Random Feature Corruption [72.35532598131176]
We propose SCARF, a technique for contrastive learning, where views are formed by corrupting a random subset of features. We show that SCARF complements existing strategies and outperforms alternatives like autoencoders.
arXiv Detail & Related papers (2021-06-29T08:08:33Z)
ORDisCo: Effective and Efficient Usage of Incremental Unlabeled Data for Semi-supervised Continual Learning [52.831894583501395]
Continual learning assumes the incoming data are fully labeled, which might not be applicable in real applications. We propose deep Online Replay with Discriminator Consistency (ORDisCo) to interdependently learn a classifier with a conditional generative adversarial network (GAN). We show ORDisCo achieves significant performance improvement on various semi-supervised learning benchmark datasets for SSCL.
arXiv Detail & Related papers (2021-01-02T09:04:14Z)

This list is automatically generated from the titles and abstracts of the papers in this site.
