Revisiting Self-Training with Regularized Pseudo-Labeling for Tabular Data
- URL: http://arxiv.org/abs/2302.14013v2
- Date: Tue, 28 Feb 2023 03:16:38 GMT
- Title: Revisiting Self-Training with Regularized Pseudo-Labeling for Tabular Data
- Authors: Minwook Kim, Juseong Kim, Jose Bento, Giltae Song
- Abstract summary: We revisit self-training, which can be applied to any kind of algorithm, including gradient-boosted decision trees.
We propose a novel pseudo-labeling approach that regularizes the confidence scores based on the likelihoods of the pseudo-labels.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent progress in semi- and self-supervised learning has challenged the
long-held belief that machine learning requires an enormous amount of labeled
data and that unlabeled data is of little use. Although these methods have been
successful on various kinds of data, no dominant semi- or self-supervised
learning method generalizes to tabular data (i.e., most existing methods
require particular tabular datasets and architectures). In this paper, we
revisit self-training, which can be applied to any kind of algorithm,
including the most widely used architecture, gradient-boosted decision trees,
and introduce curriculum pseudo-labeling (a state-of-the-art pseudo-labeling
technique from the image domain) to the tabular domain. Furthermore, existing
pseudo-labeling techniques do not account for the cluster assumption when
computing confidence scores for pseudo-labels generated from unlabeled data.
To overcome this issue, we propose a novel pseudo-labeling approach that
regularizes the confidence scores based on the likelihoods of the
pseudo-labels, so that more reliable pseudo-labels lying in high-density
regions are obtained. We exhaustively validate the superiority of our
approaches using various models and tabular datasets.
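
As a rough illustration of the idea in the abstract, the sketch below runs a self-training loop in which pseudo-labels are scored by blending classifier confidence with an estimated data-density likelihood, and admitted on a curriculum schedule. It assumes NumPy arrays and scikit-learn estimators; the blending weight `alpha`, the per-class `GaussianMixture` density model, and the linear curriculum are illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.mixture import GaussianMixture


def self_train(X_lab, y_lab, X_unlab, rounds=5, alpha=0.5):
    """Self-training where pseudo-labels are scored by a blend of
    classifier confidence and an estimated density likelihood (a sketch)."""
    X, y = X_lab, y_lab
    for r in range(1, rounds + 1):
        clf = GradientBoostingClassifier().fit(X, y)
        proba = clf.predict_proba(X_unlab)
        pseudo = proba.argmax(axis=1)          # hard pseudo-labels
        conf = proba.max(axis=1)               # raw confidence scores

        # Likelihood term: fit a simple density model per pseudo-class so
        # that points in high-density regions receive higher scores.
        lik = np.zeros(len(X_unlab))
        for c in np.unique(pseudo):
            mask = pseudo == c
            if mask.sum() > 1:
                gmm = GaussianMixture(n_components=1).fit(X_unlab[mask])
                lik[mask] = np.exp(gmm.score_samples(X_unlab[mask]))
        if lik.max() > 0:
            lik = lik / lik.max()              # scale to [0, 1]

        # Regularized confidence: down-weight confident but low-density points.
        score = (1 - alpha) * conf + alpha * lik

        # Curriculum: admit a growing fraction of the best pseudo-labels.
        k = int(len(X_unlab) * r / rounds)
        keep = np.argsort(score)[::-1][:k]
        X = np.vstack([X_lab, X_unlab[keep]])
        y = np.concatenate([y_lab, pseudo[keep]])
    return GradientBoostingClassifier().fit(X, y)
```

With `alpha = 0` this reduces to plain confidence-based self-training; increasing `alpha` shifts selection toward pseudo-labels that respect the cluster assumption, i.e. points in high-density regions.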