Collaborative Intelligence Orchestration: Inconsistency-Based Fusion of
Semi-Supervised Learning and Active Learning
- URL: http://arxiv.org/abs/2206.03288v1
- Date: Tue, 7 Jun 2022 13:28:43 GMT
- Title: Collaborative Intelligence Orchestration: Inconsistency-Based Fusion of
Semi-Supervised Learning and Active Learning
- Authors: Jiannan Guo, Yangyang Kang, Yu Duan, Xiaozhong Liu, Siliang Tang,
Wenqiao Zhang, Kun Kuang, Changlong Sun, Fei Wu
- Abstract summary: Active learning (AL) and semi-supervised learning (SSL) are two effective, but often isolated, means to alleviate the data-hungry problem.
We propose an innovative Inconsistency-based virtual aDvErsarial Active Learning (IDEAL) algorithm to further investigate SSL-AL's potential superiority.
Two real-world case studies visualize the practical industrial value of applying and deploying the proposed data sampling algorithm.
- Score: 60.26659373318915
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Annotating decent amounts of data to satisfy sophisticated learning
models can be cost-prohibitive for many real-world applications. Active
learning (AL) and semi-supervised learning (SSL) are two effective, but often
isolated, means to alleviate the data-hungry problem. Some recent studies
explored the potential of combining AL and SSL to better probe the unlabeled
data. However, almost all these contemporary SSL-AL works use a simple
combination strategy, ignoring SSL and AL's inherent relation. Further, other
methods suffer from high computational costs when dealing with large-scale,
high-dimensional datasets. Motivated by the industry practice of labeling data,
we propose an innovative Inconsistency-based virtual aDvErsarial Active
Learning (IDEAL) algorithm to further investigate SSL-AL's potential
superiority and achieve mutual enhancement of AL and SSL, i.e., SSL propagates
label information to unlabeled samples and provides smoothed embeddings for AL,
while AL excludes samples with inconsistent predictions and considerable
uncertainty for SSL. We estimate unlabeled samples' inconsistency by
augmentation strategies of different granularities, including fine-grained
continuous perturbation exploration and coarse-grained data transformations.
Extensive experiments, in both text and image domains, validate the
effectiveness of the proposed algorithm, comparing it against state-of-the-art
baselines. Two real-world case studies visualize the practical industrial value
of applying and deploying the proposed data sampling algorithm.
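To make the selection mechanism concrete, the sketch below scores unlabeled samples by the divergence between the model's prediction on the clean input and its predictions under two perturbation granularities, as the abstract describes. It is a minimal illustration rather than the authors' released implementation: a random continuous perturbation stands in for the virtual adversarial direction (which would require an extra gradient step), and coarse_augment is a hypothetical user-supplied transformation.

import torch
import torch.nn.functional as F

def inconsistency_scores(model, x_unlabeled, coarse_augment, eps=0.05):
    # Score each unlabeled sample by prediction inconsistency under
    # perturbations of two granularities. Sketch only: random noise
    # stands in for the virtual adversarial direction, and
    # coarse_augment is an assumed user-supplied transformation.
    model.eval()
    with torch.no_grad():
        p_clean = F.softmax(model(x_unlabeled), dim=1)

        # Fine-grained: small continuous perturbation of the input.
        noise = eps * F.normalize(torch.randn_like(x_unlabeled), dim=-1)
        logp_fine = F.log_softmax(model(x_unlabeled + noise), dim=1)

        # Coarse-grained: a discrete transformation (e.g., crop or flip).
        logp_coarse = F.log_softmax(model(coarse_augment(x_unlabeled)), dim=1)

        # Per-sample KL(p_clean || p_perturbed), summed over classes.
        kl_fine = F.kl_div(logp_fine, p_clean, reduction="none").sum(dim=1)
        kl_coarse = F.kl_div(logp_coarse, p_clean, reduction="none").sum(dim=1)
    return kl_fine + kl_coarse

def split_for_al_and_ssl(scores, budget):
    # Most inconsistent samples go to human annotation (AL);
    # the remainder are kept for SSL pseudo-labeling.
    order = torch.argsort(scores, descending=True)
    return order[:budget], order[budget:]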
Related papers
- Can semi-supervised learning use all the data effectively? A lower bound
perspective [58.71657561857055]
We show that semi-supervised learning algorithms can leverage unlabeled data to improve over the labeled sample complexity of supervised learning algorithms.
Our work suggests that, while proving performance gains for SSL algorithms is possible, it requires careful tracking of constants.
arXiv Detail & Related papers (2023-11-30T13:48:50Z)
- How To Overcome Confirmation Bias in Semi-Supervised Image Classification By Active Learning [2.1805442504863506]
We present three data challenges common in real-world applications: between-class imbalance, within-class imbalance, and between-class similarity.
We find that random sampling does not mitigate confirmation bias and, in some cases, leads to worse performance than supervised learning.
Our results provide insights into the potential of combining active and semi-supervised learning in the presence of common real-world challenges.
arXiv Detail & Related papers (2023-08-16T08:52:49Z)
- Exploration and Exploitation of Unlabeled Data for Open-Set Semi-Supervised Learning [130.56124475528475]
We address a complex but practical scenario in semi-supervised learning (SSL) named open-set SSL, where unlabeled data contain both in-distribution (ID) and out-of-distribution (OOD) samples.
Our proposed method achieves state-of-the-art results on several challenging benchmarks, and improves upon existing SSL methods even when ID samples are totally absent from the unlabeled data.
arXiv Detail & Related papers (2023-06-30T14:25:35Z)
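Open-set SSL, as in the entry above, requires separating likely in-distribution (ID) from out-of-distribution (OOD) unlabeled samples. The snippet below is a generic max-softmax-confidence heuristic for that split, offered only as an illustration of the setting; it is not the exploration-and-exploitation mechanism proposed in the paper.

import torch
import torch.nn.functional as F

def partition_unlabeled(model, x_unlabeled, threshold=0.7):
    # Split unlabeled data into likely-ID and likely-OOD subsets via
    # max-softmax confidence. A generic open-set heuristic, not the
    # paper's proposed mechanism.
    with torch.no_grad():
        conf = F.softmax(model(x_unlabeled), dim=1).max(dim=1).values
    id_mask = conf >= threshold
    return x_unlabeled[id_mask], x_unlabeled[~id_mask]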
- Active Semi-Supervised Learning by Exploring Per-Sample Uncertainty and Consistency [30.94964727745347]
We propose a method called Active Semi-supervised Learning (ASSL) to improve the accuracy of models at a lower cost.
ASSL involves more dynamic model updates than Active Learning (AL) due to the use of unlabeled data.
ASSL achieved about 5.3 times higher computational efficiency than Semi-supervised Learning (SSL) while achieving the same performance.
arXiv Detail & Related papers (2023-03-15T22:58:23Z)
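The per-sample uncertainty and consistency signals named in the entry above can be combined roughly as follows. This is a sketch under assumed definitions (predictive entropy for uncertainty, disagreement between two augmented views for consistency), not the ASSL authors' exact criterion; augment is a hypothetical stochastic augmentation function.

import torch
import torch.nn.functional as F

def assl_style_scores(model, x_unlabeled, augment):
    # Combine predictive entropy (uncertainty) with cross-view
    # disagreement (inconsistency). Sketch only; augment is an assumed
    # stochastic augmentation function.
    with torch.no_grad():
        p = F.softmax(model(x_unlabeled), dim=1)
        q = F.softmax(model(augment(x_unlabeled)), dim=1)

    entropy = -(p * p.clamp_min(1e-12).log()).sum(dim=1)  # uncertainty
    disagreement = (p - q).abs().sum(dim=1)               # inconsistency
    return entropy + disagreement  # higher = stronger AL candidate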
- OpenLDN: Learning to Discover Novel Classes for Open-World Semi-Supervised Learning [110.40285771431687]
Semi-supervised learning (SSL) is one of the dominant approaches to address the annotation bottleneck of supervised learning.
Recent SSL methods can effectively leverage a large repository of unlabeled data to improve performance while relying on a small set of labeled data.
This work introduces OpenLDN, which utilizes a pairwise similarity loss to discover novel classes.
arXiv Detail & Related papers (2022-07-05T18:51:05Z)
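A pairwise similarity loss of the kind mentioned above is typically built by converting embedding similarities into soft pairwise targets. The version below is a generic sketch (binary cross-entropy between predicted pair agreement and cosine-similarity-derived targets) and is not necessarily OpenLDN's exact formulation.

import torch
import torch.nn.functional as F

def pairwise_similarity_loss(logits, embeddings, tau=0.5):
    # Pairs whose embeddings are similar are pushed to agree in predicted
    # class distribution. Generic sketch, not OpenLDN's exact loss.
    probs = F.softmax(logits, dim=1)
    agreement = probs @ probs.t()            # P(same class) for each pair
    z = F.normalize(embeddings, dim=1)
    targets = (z @ z.t() > tau).float()      # pseudo pair labels from cosine sim
    return F.binary_cross_entropy(agreement.clamp(1e-6, 1 - 1e-6), targets)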
- Unlabeled Data Help: Minimax Analysis and Adversarial Robustness [21.79888306754263]
Self-supervised learning (SSL) approaches successfully demonstrate the great potential of supplementing learning algorithms with additional unlabeled data.
It is still unclear whether existing SSL algorithms can fully utilize the information of both labeled and unlabeled data.
This paper gives an affirmative answer for the reconstruction-based SSL algorithm of Lee et al. (2020) under several statistical models.
arXiv Detail & Related papers (2022-02-14T19:24:43Z)
- Self-supervised Learning is More Robust to Dataset Imbalance [65.84339596595383]
We investigate self-supervised learning under dataset imbalance.
Off-the-shelf self-supervised representations are already more robust to class imbalance than supervised representations.
We devise a re-weighted regularization technique that consistently improves the SSL representation quality on imbalanced datasets.
arXiv Detail & Related papers (2021-10-11T06:29:56Z)
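One generic way to realize the re-weighted regularization idea above is to upweight the regularization term for examples in sparse regions of representation space, so that rare classes are not underfit. The density estimate and weighting rule below are illustrative assumptions, not the paper's exact technique.

import torch

def reweighted_regularizer(embeddings, reg_per_example, k=10):
    # Weight each example's regularization term by inverse local density,
    # measured as distance to its k-th nearest neighbor. Illustrative
    # assumption, not the paper's exact reweighting rule.
    d = torch.cdist(embeddings, embeddings)                 # pairwise distances
    knn_dist = d.topk(k + 1, largest=False).values[:, -1]   # k-th neighbor (self excluded at dist 0)
    weights = knn_dist / knn_dist.mean()                    # sparse regions weigh more
    return (weights * reg_per_example).mean()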
- Trash to Treasure: Harvesting OOD Data with Cross-Modal Matching for Open-Set Semi-Supervised Learning [101.28281124670647]
Open-set semi-supervised learning (open-set SSL) investigates a challenging but practical scenario where out-of-distribution (OOD) samples are contained in the unlabeled data.
We propose a novel training mechanism that could effectively exploit the presence of OOD data for enhanced feature learning.
Our approach substantially lifts the performance on open-set SSL and outperforms the state-of-the-art by a large margin.
arXiv Detail & Related papers (2021-08-12T09:14:44Z)
- Relieving the Plateau: Active Semi-Supervised Learning for a Better Landscape [2.3046646540823916]
Semi-supervised learning (SSL) leverages unlabeled data that are more accessible than their labeled counterparts.
Active learning (AL) selects unlabeled instances to be annotated by a human-in-the-loop in hopes of better performance with less labeled data.
We propose convergence rate control (CRC), an AL algorithm that selects unlabeled data whose inclusion in the labeled set improves the conditioning of the training problem.
arXiv Detail & Related papers (2021-04-08T06:03:59Z)
- Matching Distributions via Optimal Transport for Semi-Supervised Learning [31.533832244923843]
Semi-Supervised Learning (SSL) approaches have been an influential framework for the use of unlabeled data.
We propose a new approach that adopts an Optimal Transport (OT) technique serving as a metric of similarity between discrete empirical probability measures.
We evaluated the proposed method against state-of-the-art SSL algorithms on standard datasets to demonstrate its superiority and effectiveness.
arXiv Detail & Related papers (2020-12-04T11:15:14Z)
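The OT-as-similarity idea in the last entry can be made concrete with entropically regularized optimal transport between two discrete empirical measures, computed by Sinkhorn iterations. This is a standard textbook sketch, not the paper's specific pipeline.

import torch

def sinkhorn_distance(x, y, eps=0.1, n_iters=100):
    # Entropic OT cost between two empirical measures with uniform
    # weights, via Sinkhorn iterations. Standard textbook sketch.
    cost = torch.cdist(x, y) ** 2                    # squared Euclidean cost
    K = torch.exp(-cost / eps)                       # Gibbs kernel
    a = torch.full((x.shape[0],), 1.0 / x.shape[0])  # uniform source weights
    b = torch.full((y.shape[0],), 1.0 / y.shape[0])  # uniform target weights
    u = torch.ones_like(a)
    for _ in range(n_iters):                         # alternating scaling updates
        u = a / (K @ (b / (K.t() @ u)))
    v = b / (K.t() @ u)
    plan = u[:, None] * K * v[None, :]               # transport plan
    return (plan * cost).sum()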