Revisiting Self-Training with Regularized Pseudo-Labeling for Tabular Data
- URL: http://arxiv.org/abs/2302.14013v2
- Date: Tue, 28 Feb 2023 03:16:38 GMT
- Title: Revisiting Self-Training with Regularized Pseudo-Labeling for Tabular Data
- Authors: Minwook Kim, Juseong Kim, Jose Bento, Giltae Song
- Abstract summary: We revisit self-training, which can be applied to any kind of algorithm including gradient boosting decision trees.
We propose a novel pseudo-labeling approach that regularizes the confidence scores based on the likelihoods of the pseudo-labels.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent progress in semi- and self-supervised learning has challenged the
long-held belief that machine learning needs an enormous amount of labeled data
and that unlabeled data is irrelevant. Although these methods have succeeded on
various data types, no dominant semi- or self-supervised learning method
generalizes to tabular data (i.e., most existing methods require specific
tabular datasets and architectures). In this paper, we revisit self-training,
which can be applied to any kind of algorithm including the most widely used
architecture, gradient boosting decision trees, and introduce curriculum
pseudo-labeling (a state-of-the-art pseudo-labeling technique in the image
domain) to the tabular domain. Furthermore, existing pseudo-labeling techniques
do not guarantee the cluster assumption when computing confidence scores for
pseudo-labels generated from unlabeled data. To overcome this issue, we propose
a novel pseudo-labeling approach that regularizes the confidence scores based
on the likelihoods of the pseudo-labels, so that more reliable pseudo-labels
lying in high-density regions can be obtained. We exhaustively validate the
superiority of our approaches using various models and tabular datasets.
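To make this concrete, the following is a minimal, illustrative sketch of self-training with curriculum pseudo-labeling and likelihood-regularized confidence scores for tabular data. It is not the authors' released implementation: the scikit-learn gradient boosting classifier, the Gaussian mixture density estimate standing in for the likelihood term, the function names, and the admission schedule are all assumptions made for illustration, and it expects NumPy arrays as input.

import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.mixture import GaussianMixture

def regularized_confidence(model, density_model, X_unlabeled):
    # Raw confidence: maximum predicted class probability per sample.
    proba = model.predict_proba(X_unlabeled)
    confidence = proba.max(axis=1)
    # Likelihood term from a density estimate over the input features
    # (cluster assumption: prefer pseudo-labels in high-density regions).
    log_density = density_model.score_samples(X_unlabeled)
    likelihood = np.exp(log_density - log_density.max())  # rescaled to (0, 1]
    return confidence * likelihood

def curriculum_self_training(X_l, y_l, X_u, n_rounds=5):
    # Density model fitted on all available inputs, labeled and unlabeled.
    density_model = GaussianMixture(n_components=8, random_state=0)
    density_model.fit(np.vstack([X_l, X_u]))
    model = GradientBoostingClassifier().fit(X_l, y_l)
    for t in range(1, n_rounds + 1):
        scores = regularized_confidence(model, density_model, X_u)
        pseudo_labels = model.predict(X_u)
        # Curriculum: admit a growing fraction of the most reliable pseudo-labels.
        k = max(1, int(len(X_u) * t / n_rounds))
        keep = np.argsort(scores)[-k:]
        X_aug = np.vstack([X_l, X_u[keep]])
        y_aug = np.concatenate([y_l, pseudo_labels[keep]])
        model = GradientBoostingClassifier().fit(X_aug, y_aug)
    return model

A typical call would be curriculum_self_training(X_labeled, y_labeled, X_unlabeled). In practice the density model and the admission schedule would be tuned per dataset, and the paper's actual regularization of the confidence scores may take a different functional form.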
Related papers
- You can't handle the (dirty) truth: Data-centric insights improve pseudo-labeling [60.27812493442062]
We show the importance of investigating labeled data quality to improve any pseudo-labeling method.
Specifically, we introduce a novel data characterization and selection framework called DIPS to extend pseudo-labeling.
We demonstrate the applicability and impact of DIPS for various pseudo-labeling methods across an extensive range of real-world datasets.
arXiv Detail & Related papers (2024-06-19T17:58:40Z) - Soft Curriculum for Learning Conditional GANs with Noisy-Labeled and Uncurated Unlabeled Data [70.25049762295193]
We introduce a novel conditional image generation framework that accepts noisy-labeled and uncurated data during training.
We propose soft curriculum learning, which assigns instance-wise weights for adversarial training while assigning new labels for unlabeled data.
Our experiments show that our approach outperforms existing semi-supervised and label-noise robust methods in terms of both quantitative and qualitative performance.
arXiv Detail & Related papers (2023-07-17T08:31:59Z) - Doubly Robust Self-Training [46.168395767948965]
We introduce doubly robust self-training, a novel semi-supervised algorithm.
We demonstrate the superiority of the doubly robust loss over the standard self-training baseline.
arXiv Detail & Related papers (2023-06-01T00:57:16Z) - Why pseudo label based algorithm is effective? --from the perspective of pseudo labeled data [1.8402019107354282]
We give a theoretical analysis of why pseudo-label-based semi-supervised learning is effective in this paper.
Our analysis shows that, firstly, when the amount of unlabeled data tends to infinity, a pseudo-label-based semi-supervised learning algorithm can obtain a model with the same generalization error upper bound as a model obtained by ordinary supervised training.
More importantly, we prove that when the amount of unlabeled data is large enough, the generalization error upper bound of the model obtained by the pseudo-label-based semi-supervised learning algorithm converges to the optimal upper bound at a linear rate.
arXiv Detail & Related papers (2022-11-18T05:34:37Z) - Debiased Pseudo Labeling in Self-Training [77.83549261035277]
Deep neural networks achieve remarkable performance on a wide range of tasks with the aid of large-scale labeled datasets.
To mitigate the requirement for labeled data, self-training is widely used in both academia and industry by pseudo-labeling readily available unlabeled data.
We propose Debiased, in which the generation and utilization of pseudo labels are decoupled by two independent heads.
arXiv Detail & Related papers (2022-02-15T02:14:33Z) - GuidedMix-Net: Learning to Improve Pseudo Masks Using Labeled Images as Reference [153.354332374204]
We propose a novel method for semi-supervised semantic segmentation named GuidedMix-Net.
We first introduce a feature alignment objective between labeled and unlabeled data to capture potentially similar image pairs.
MITrans is shown to be a powerful knowledge module for further progressively refining the features of unlabeled data.
Along with supervised learning for labeled data, the prediction of unlabeled data is jointly learned with the generated pseudo masks.
arXiv Detail & Related papers (2021-06-29T02:48:45Z) - A Novel Perspective for Positive-Unlabeled Learning via Noisy Labels [49.990938653249415]
This research presents a methodology that assigns initial pseudo-labels to unlabeled data, treats them as noisy labels, and trains a deep neural network on the resulting noisy-labeled data.
Experimental results demonstrate that the proposed method significantly outperforms the state-of-the-art methods on several benchmark datasets.
arXiv Detail & Related papers (2021-03-08T11:46:02Z) - Self-semi-supervised Learning to Learn from Noisy Labeled Data [3.18577806302116]
It is costly to obtain high-quality human-labeled data, leading to the active research area of training models robust to noisy labels.
In this project, we designed methods to more accurately differentiate clean and noisy labels and borrowed the wisdom of self-semi-supervised learning to train on noisy labeled data.
arXiv Detail & Related papers (2020-11-03T02:31:29Z) - Adversarial Knowledge Transfer from Unlabeled Data [62.97253639100014]
We present a novel Adversarial Knowledge Transfer framework for transferring knowledge from internet-scale unlabeled data to improve the performance of a classifier.
An important novel aspect of our method is that the unlabeled source data can be of different classes from those of the labeled target data, and there is no need to define a separate pretext task.
arXiv Detail & Related papers (2020-08-13T08:04:27Z)