Collaborative Intelligence Orchestration: Inconsistency-Based Fusion of
Semi-Supervised Learning and Active Learning
- URL: http://arxiv.org/abs/2206.03288v1
- Date: Tue, 7 Jun 2022 13:28:43 GMT
- Title: Collaborative Intelligence Orchestration: Inconsistency-Based Fusion of
Semi-Supervised Learning and Active Learning
- Authors: Jiannan Guo, Yangyang Kang, Yu Duan, Xiaozhong Liu, Siliang Tang,
Wenqiao Zhang, Kun Kuang, Changlong Sun, Fei Wu
- Abstract summary: Active learning (AL) and semi-supervised learning (SSL) are two effective, but often isolated, means to alleviate the data-hungry problem.
We propose an innovative Inconsistency-based virtual aDvErsarial Active Learning (IDEAL) algorithm to further investigate SSL-AL's potential superiority.
Two real-world case studies visualize the practical industrial value of applying and deploying the proposed data sampling algorithm.
- Score: 60.26659373318915
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Annotating decent amounts of data to satisfy sophisticated learning
models can be cost-prohibitive for many real-world applications. Active
learning (AL) and semi-supervised learning (SSL) are two effective, but often
isolated, means to alleviate the data-hungry problem. Some recent studies
explored the potential of combining AL and SSL to better probe the unlabeled
data. However, almost all these contemporary SSL-AL works use a simple
combination strategy, ignoring SSL and AL's inherent relation. Further, other
methods suffer from high computational costs when dealing with large-scale,
high-dimensional datasets. Motivated by the industry practice of labeling data,
we propose an innovative Inconsistency-based virtual aDvErsarial Active
Learning (IDEAL) algorithm to further investigate SSL-AL's potential
superiority and achieve mutual enhancement of AL and SSL, i.e., SSL propagates
label information to unlabeled samples and provides smoothed embeddings for AL,
while AL excludes samples with inconsistent predictions and considerable
uncertainty for SSL. We estimate unlabeled samples' inconsistency by
augmentation strategies of different granularities, including fine-grained
continuous perturbation exploration and coarse-grained data transformations.
Extensive experiments, in both text and image domains, validate the
effectiveness of the proposed algorithm, comparing it against state-of-the-art
baselines. Two real-world case studies visualize the practical industrial value
of applying and deploying the proposed data sampling algorithm.
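The core sampling idea described in the abstract, scoring unlabeled samples by how much the model's predictions disagree across augmented views, then sending the most inconsistent ones for annotation, can be sketched as follows. This is an illustrative reading, not the paper's exact formulation: the function names are hypothetical, and KL divergence is assumed here as the inconsistency measure.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between rows of categorical distributions."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return np.sum(p * np.log(p / q), axis=-1)

def inconsistency_scores(probs_clean, probs_augmented):
    """Average prediction inconsistency across augmented views.

    probs_clean:     (n_samples, n_classes) predictions on original inputs
    probs_augmented: (n_views, n_samples, n_classes) predictions on
                     augmented inputs (e.g. perturbed or transformed copies)
    """
    per_view = [kl_divergence(probs_clean, view) for view in probs_augmented]
    return np.mean(per_view, axis=0)

def select_for_labeling(probs_clean, probs_augmented, budget):
    """Pick the `budget` most inconsistent unlabeled samples for annotation."""
    scores = inconsistency_scores(probs_clean, probs_augmented)
    return np.argsort(scores)[::-1][:budget]
```

A sample whose prediction is stable under both fine-grained perturbations and coarse-grained transformations scores near zero and is left to SSL's pseudo-labeling, while a sample whose prediction flips across views scores high and is routed to the human annotator.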
Related papers
- Breaking the SSL-AL Barrier: A Synergistic Semi-Supervised Active Learning Framework for 3D Object Detection [34.049483237480615]
We propose a Synergistic Semi-Supervised Active Learning framework, dubbed S-SSAL.
We show that S-SSAL can achieve performance comparable to models trained on the full dataset.
arXiv Detail & Related papers (2025-01-26T08:43:59Z)
- SeMi: When Imbalanced Semi-Supervised Learning Meets Mining Hard Examples [54.760757107700755]
Semi-Supervised Learning (SSL) can leverage abundant unlabeled data to boost model performance.
The class-imbalanced data distribution in real-world scenarios poses great challenges to SSL, resulting in performance degradation.
We propose a method that enhances the performance of Imbalanced Semi-Supervised Learning by Mining Hard Examples (SeMi)
arXiv Detail & Related papers (2025-01-10T14:35:16Z)
- Co-Learning: Towards Semi-Supervised Object Detection with Road-side Cameras [1.5495593104596401]
Semi-supervised learning (SSL) can train object detectors using labeled and unlabeled data.
SSL faces several challenges, including pseudo-target inconsistencies, disharmony between classification and regression tasks, and efficient use of abundant unlabeled data.
We develop a teacher-student-based SSL framework, Co-Learning, which employs mutual learning and annotation-alignment strategies to adeptly navigate these complexities.
arXiv Detail & Related papers (2024-11-28T13:42:55Z)
- Can semi-supervised learning use all the data effectively? A lower bound perspective [58.71657561857055]
We show that semi-supervised learning algorithms can leverage unlabeled data to improve over the labeled sample complexity of supervised learning algorithms.
Our work suggests that, while proving performance gains for SSL algorithms is possible, it requires careful tracking of constants.
arXiv Detail & Related papers (2023-11-30T13:48:50Z)
- How To Overcome Confirmation Bias in Semi-Supervised Image Classification By Active Learning [2.1805442504863506]
We present three data challenges common in real-world applications: between-class imbalance, within-class imbalance, and between-class similarity.
We find that random sampling does not mitigate confirmation bias and, in some cases, leads to worse performance than supervised learning.
Our results provide insights into the potential of combining active and semi-supervised learning in the presence of common real-world challenges.
arXiv Detail & Related papers (2023-08-16T08:52:49Z)
- Exploration and Exploitation of Unlabeled Data for Open-Set Semi-Supervised Learning [130.56124475528475]
We address a complex but practical scenario in semi-supervised learning (SSL) named open-set SSL, where unlabeled data contain both in-distribution (ID) and out-of-distribution (OOD) samples.
Our proposed method achieves state-of-the-art in several challenging benchmarks, and improves upon existing SSL methods even when ID samples are totally absent in unlabeled data.
arXiv Detail & Related papers (2023-06-30T14:25:35Z)
- Active Semi-Supervised Learning by Exploring Per-Sample Uncertainty and Consistency [30.94964727745347]
We propose a method called Active Semi-supervised Learning (ASSL) to improve accuracy of models at a lower cost.
ASSL involves more dynamic model updates than Active Learning (AL) due to the use of unlabeled data.
ASSL achieved about 5.3 times higher computational efficiency than Semi-supervised Learning (SSL) while achieving the same performance.
arXiv Detail & Related papers (2023-03-15T22:58:23Z)
- OpenLDN: Learning to Discover Novel Classes for Open-World Semi-Supervised Learning [110.40285771431687]
Semi-supervised learning (SSL) is one of the dominant approaches to address the annotation bottleneck of supervised learning.
Recent SSL methods can effectively leverage a large repository of unlabeled data to improve performance while relying on a small set of labeled data.
This work introduces OpenLDN that utilizes a pairwise similarity loss to discover novel classes.
arXiv Detail & Related papers (2022-07-05T18:51:05Z)
- Trash to Treasure: Harvesting OOD Data with Cross-Modal Matching for Open-Set Semi-Supervised Learning [101.28281124670647]
Open-set semi-supervised learning (open-set SSL) investigates a challenging but practical scenario where out-of-distribution (OOD) samples are contained in the unlabeled data.
We propose a novel training mechanism that could effectively exploit the presence of OOD data for enhanced feature learning.
Our approach substantially lifts the performance on open-set SSL and outperforms the state-of-the-art by a large margin.
arXiv Detail & Related papers (2021-08-12T09:14:44Z)
- Relieving the Plateau: Active Semi-Supervised Learning for a Better Landscape [2.3046646540823916]
Semi-supervised learning (SSL) leverages unlabeled data that are more accessible than their labeled counterparts.
Active learning (AL) selects unlabeled instances to be annotated by a human-in-the-loop in hopes of better performance with less labeled data.
We propose convergence rate control (CRC), an AL algorithm that selects unlabeled data to improve the problem conditioning upon inclusion to the labeled set.
arXiv Detail & Related papers (2021-04-08T06:03:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.