A Survey on Semi-Supervised Learning for Delayed Partially Labelled Data
Streams
- URL: http://arxiv.org/abs/2106.09170v1
- Date: Wed, 16 Jun 2021 23:14:20 GMT
- Title: A Survey on Semi-Supervised Learning for Delayed Partially Labelled Data
Streams
- Authors: Heitor Murilo Gomes, Maciej Grzenda, Rodrigo Mello, Jesse Read, Minh
Huong Le Nguyen, Albert Bifet
- Abstract summary: This survey pays special attention to methods that leverage unlabelled data in a semi-supervised setting.
We discuss the delayed labelling issue, which impacts both fully supervised and semi-supervised methods.
- Score: 10.370629574634092
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Unlabelled data appear in many domains and are particularly relevant to
streaming applications, where even though data is abundant, labelled data is
rare. To address the learning problems associated with such data, one can
ignore the unlabelled data and focus only on the labelled data (supervised
learning); use the labelled data and attempt to leverage the unlabelled data
(semi-supervised learning); or assume some labels will be available on request
(active learning). The first approach is the simplest, yet the amount of
labelled data available will limit the predictive performance. The second
relies on finding and exploiting the underlying characteristics of the data
distribution. The third depends on an external agent to provide the required
labels in a timely fashion. This survey pays special attention to methods that
leverage unlabelled data in a semi-supervised setting. We also discuss the
delayed labelling issue, which impacts both fully supervised and
semi-supervised methods. We propose a unified problem setting, discuss the
learning guarantees and existing methods, explain the differences between
related problem settings. Finally, we review the current benchmarking practices
and propose adaptations to enhance them.
Related papers
- Deep Active Learning with Manifold-preserving Trajectory Sampling [2.0717982775472206]
Active learning (AL) is for optimizing the selection of unlabeled data for annotation (labeling)
Existing deep AL methods arguably suffer from bias incurred by clabeled data, which takes a much lower percentage than unlabeled data in AL context.
We propose a novel method, namely Manifold-Preserving Trajectory Sampling (MPTS), aiming to enforce the feature space learned from labeled data to represent a more accurate manifold.
arXiv Detail & Related papers (2024-10-21T03:04:09Z) - FlatMatch: Bridging Labeled Data and Unlabeled Data with Cross-Sharpness
for Semi-Supervised Learning [73.13448439554497]
Semi-Supervised Learning (SSL) has been an effective way to leverage abundant unlabeled data with extremely scarce labeled data.
Most SSL methods are commonly based on instance-wise consistency between different data transformations.
We propose FlatMatch which minimizes a cross-sharpness measure to ensure consistent learning performance between the two datasets.
arXiv Detail & Related papers (2023-10-25T06:57:59Z) - Soft Curriculum for Learning Conditional GANs with Noisy-Labeled and
Uncurated Unlabeled Data [70.25049762295193]
We introduce a novel conditional image generation framework that accepts noisy-labeled and uncurated data during training.
We propose soft curriculum learning, which assigns instance-wise weights for adversarial training while assigning new labels for unlabeled data.
Our experiments show that our approach outperforms existing semi-supervised and label-noise robust methods in terms of both quantitative and qualitative performance.
arXiv Detail & Related papers (2023-07-17T08:31:59Z) - Are labels informative in semi-supervised learning? -- Estimating and
leveraging the missing-data mechanism [4.675583319625962]
Semi-supervised learning is a powerful technique for leveraging unlabeled data to improve machine learning models.
It can be affected by the presence of informative'' labels, which occur when some classes are more likely to be labeled than others.
We propose a novel approach to address this issue by estimating the missing-data mechanism and using inverse propensity weighting to debias any SSL algorithm.
arXiv Detail & Related papers (2023-02-15T09:18:46Z) - Dist-PU: Positive-Unlabeled Learning from a Label Distribution
Perspective [89.5370481649529]
We propose a label distribution perspective for PU learning in this paper.
Motivated by this, we propose to pursue the label distribution consistency between predicted and ground-truth label distributions.
Experiments on three benchmark datasets validate the effectiveness of the proposed method.
arXiv Detail & Related papers (2022-12-06T07:38:29Z) - Data Consistency for Weakly Supervised Learning [15.365232702938677]
Training machine learning models involves using large amounts of human-annotated data.
We propose a novel weak supervision algorithm that processes noisy labels, i.e., weak signals.
We show that it significantly outperforms state-of-the-art weak supervision methods on both text and image classification tasks.
arXiv Detail & Related papers (2022-02-08T16:48:19Z) - OpenCoS: Contrastive Semi-supervised Learning for Handling Open-set
Unlabeled Data [65.19205979542305]
Unlabeled data may include out-of-class samples in practice.
OpenCoS is a method for handling this realistic semi-supervised learning scenario.
arXiv Detail & Related papers (2021-06-29T06:10:05Z) - Instance Correction for Learning with Open-set Noisy Labels [145.06552420999986]
We use the sample selection approach to handle open-set noisy labels.
The discarded data are seen to be mislabeled and do not participate in training.
We modify the instances of discarded data to make predictions for the discarded data consistent with given labels.
arXiv Detail & Related papers (2021-06-01T13:05:55Z) - A Novel Perspective for Positive-Unlabeled Learning via Noisy Labels [49.990938653249415]
This research presents a methodology that assigns initial pseudo-labels to unlabeled data which is used as noisy-labeled data, and trains a deep neural network using the noisy-labeled data.
Experimental results demonstrate that the proposed method significantly outperforms the state-of-the-art methods on several benchmark datasets.
arXiv Detail & Related papers (2021-03-08T11:46:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.