Neighborhood-Regularized Self-Training for Learning with Few Labels
- URL: http://arxiv.org/abs/2301.03726v1
- Date: Tue, 10 Jan 2023 00:07:33 GMT
- Title: Neighborhood-Regularized Self-Training for Learning with Few Labels
- Authors: Ran Xu, Yue Yu, Hejie Cui, Xuan Kan, Yanqiao Zhu, Joyce Ho, Chao
Zhang, Carl Yang
- Abstract summary: One drawback of self-training is that it is vulnerable to the label noise from incorrect pseudo labels.
We develop a neighborhood-based sample selection approach to tackle the issue of noisy pseudo labels.
Our proposed data selection strategy reduces the noise of pseudo labels by 36.8% and saves 57.3% of the time when compared with the best baseline.
- Score: 21.7848889781112
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Training deep neural networks (DNNs) with limited supervision has been a
popular research topic as it can significantly alleviate the annotation burden.
Self-training has been successfully applied in semi-supervised learning tasks,
but one drawback of self-training is that it is vulnerable to the label noise
from incorrect pseudo labels. Inspired by the fact that samples with similar
labels tend to share similar representations, we develop a neighborhood-based
sample selection approach to tackle the issue of noisy pseudo labels. We
further stabilize self-training via aggregating the predictions from different
rounds during sample selection. Experiments on eight tasks show that our
proposed method outperforms the strongest self-training baseline by 1.83% and
2.51% on average for text and graph datasets, respectively. Our further
analysis demonstrates that our proposed data selection strategy reduces the
noise of pseudo labels by 36.8% and saves 57.3% of the time when compared with
the best baseline. Our code and appendices will be uploaded to
https://github.com/ritaranx/NeST.
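
The abstract describes two ingredients: selecting pseudo-labeled samples whose predictions agree with their labeled neighbors in representation space, and stabilizing selection by aggregating predictions across self-training rounds. Below is a minimal sketch of how such a selection step could look. The function name, the use of k-nearest neighbors, and the KL-divergence score are illustrative assumptions, not the authors' exact formulation (see the paper and repository above for the actual method).

```python
# Minimal sketch of neighborhood-regularized pseudo-label selection (assumptions noted above).
import numpy as np
from sklearn.neighbors import NearestNeighbors


def select_pseudo_labels(unlab_emb, unlab_probs_per_round, lab_emb, lab_labels,
                         num_classes, k=10, num_select=100):
    """Pick pseudo-labeled samples whose predictions agree with their labeled neighbors.

    unlab_emb:             (N_u, d) embeddings of unlabeled samples
    unlab_probs_per_round: list of (N_u, C) softmax outputs from recent self-training rounds
    lab_emb:               (N_l, d) embeddings of labeled samples
    lab_labels:            (N_l,)   integer labels of labeled samples
    """
    # Stabilize pseudo labels by averaging predictions across rounds.
    probs = np.mean(np.stack(unlab_probs_per_round, axis=0), axis=0)  # (N_u, C)

    # Find each unlabeled sample's k nearest labeled neighbors in embedding space.
    nn = NearestNeighbors(n_neighbors=k).fit(lab_emb)
    _, idx = nn.kneighbors(unlab_emb)  # (N_u, k)

    # Empirical label distribution of the neighborhood: samples with similar labels
    # tend to share similar representations, so this should match the prediction.
    neighbor_dist = np.zeros_like(probs)
    for i, neighbors in enumerate(idx):
        counts = np.bincount(lab_labels[neighbors], minlength=num_classes)
        neighbor_dist[i] = counts / counts.sum()

    # Score each sample by KL(neighborhood || prediction); low divergence = reliable.
    eps = 1e-8
    kl = np.sum(neighbor_dist * (np.log(neighbor_dist + eps) - np.log(probs + eps)), axis=1)

    selected = np.argsort(kl)[:num_select]          # most neighborhood-consistent samples
    pseudo_labels = probs[selected].argmax(axis=1)  # their aggregated pseudo labels
    return selected, pseudo_labels
```

The selected samples would then be added to the training set for the next self-training round; the divergence threshold or `num_select` budget is a design choice left open in this sketch.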
Related papers
- Extracting Clean and Balanced Subset for Noisy Long-tailed Classification [66.47809135771698]
We develop a novel pseudo labeling method using class prototypes from the perspective of distribution matching.
By setting a manually specified probability measure, we can reduce the side effects of noisy and long-tailed data simultaneously.
Our method can extract this class-balanced subset with clean labels, which brings effective performance gains for long-tailed classification with label noise.
arXiv Detail & Related papers (2024-04-10T07:34:37Z)
- All Points Matter: Entropy-Regularized Distribution Alignment for Weakly-supervised 3D Segmentation [67.30502812804271]
Pseudo-labels are widely employed in weakly supervised 3D segmentation tasks where only sparse ground-truth labels are available for learning.
We propose a novel learning strategy to regularize the generated pseudo-labels and effectively narrow the gaps between pseudo-labels and model predictions.
arXiv Detail & Related papers (2023-05-25T08:19:31Z)
- Class adaptive threshold and negative class guided noisy annotation robust Facial Expression Recognition [3.823356975862006]
Noisy annotations are inherently present in datasets because labeling depends on annotator subjectivity, image clarity, etc.
Recent works use sample selection methods to solve this noisy annotation problem in FER.
In our work, we use a dynamic adaptive threshold to separate confident samples from non-confident ones, so that learning is not hampered by the non-confident samples.
arXiv Detail & Related papers (2023-05-03T04:28:49Z)
- LOPS: Learning Order Inspired Pseudo-Label Selection for Weakly Supervised Text Classification [28.37907856670151]
Pseudo-labels are inherently noisy, so selecting the correct ones offers a large potential performance boost.
We propose a novel pseudo-label selection method, LOPS, that takes the learning order of samples into consideration.
LOPS can be viewed as a strong performance-boosting plug-in for most existing weakly supervised text classification methods.
arXiv Detail & Related papers (2022-05-25T06:46:48Z)
- UNICON: Combating Label Noise Through Uniform Selection and Contrastive Learning [89.56465237941013]
We propose UNICON, a simple yet effective sample selection method which is robust to high label noise.
We obtain an 11.4% improvement over the current state of the art on the CIFAR100 dataset with a 90% noise rate.
arXiv Detail & Related papers (2022-03-28T07:36:36Z)
- Debiased Pseudo Labeling in Self-Training [77.83549261035277]
Deep neural networks achieve remarkable performance on a wide range of tasks with the aid of large-scale labeled datasets.
To mitigate the requirement for labeled data, self-training is widely used in both academia and industry by pseudo labeling on readily-available unlabeled data.
We propose Debiased, in which the generation and utilization of pseudo labels are decoupled by two independent heads.
arXiv Detail & Related papers (2022-02-15T02:14:33Z)
- A new weakly supervised approach for ALS point cloud semantic segmentation [1.4620086904601473]
We propose a deep-learning based weakly supervised framework for semantic segmentation of ALS point clouds.
We exploit potential information from unlabeled data subject to incomplete and sparse labels.
Our method achieves an overall accuracy of 83.0% and an average F1 score of 70.0%, improvements of 6.9% and 12.8%, respectively.
arXiv Detail & Related papers (2021-10-04T14:00:23Z)
- Weakly Supervised Pseudo-Label assisted Learning for ALS Point Cloud Semantic Segmentation [1.4620086904601473]
Competitive point cloud results usually rely on a large amount of labeled data.
In this study, we propose a pseudo-labeling strategy to obtain accurate results with limited ground truth.
arXiv Detail & Related papers (2021-05-05T08:07:21Z)
- Are Fewer Labels Possible for Few-shot Learning? [81.89996465197392]
Few-shot learning is challenging due to its very limited data and labels.
Recent studies in big transfer (BiT) show that few-shot learning can greatly benefit from pretraining on large scale labeled dataset in a different domain.
We propose eigen-finetuning to enable fewer-shot learning by leveraging the co-evolution of clustering and eigen-samples during finetuning.
arXiv Detail & Related papers (2020-12-10T18:59:29Z)
- Uncertainty-aware Self-training for Text Classification with Few Labels [54.13279574908808]
We study self-training as one of the earliest semi-supervised learning approaches to reduce the annotation bottleneck.
We propose an approach to improve self-training by incorporating uncertainty estimates of the underlying neural network.
We show that our methods, leveraging only 20-30 labeled samples per class per task for training and validation, perform within 3% of fully supervised pre-trained language models (see the uncertainty-estimation sketch after this list).
arXiv Detail & Related papers (2020-06-27T08:13:58Z)
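
For the uncertainty-aware self-training entry above, one common way to obtain such uncertainty estimates is Monte Carlo dropout. The sketch below illustrates that idea only; it assumes `model` is a PyTorch classifier with dropout layers and `unlabeled_loader` yields input tensors, and the variance-based selection rule is an illustrative stand-in, not that paper's exact criterion.

```python
# Minimal sketch of uncertainty-aware pseudo-label selection via Monte Carlo dropout
# (illustrative assumptions noted above).
import torch


def mc_dropout_select(model, unlabeled_loader, num_passes=10, num_select=100, device="cpu"):
    """Select unlabeled samples whose predictions are stable under dropout noise."""
    model.train()  # keep dropout active at inference time for MC sampling
    all_probs = []
    with torch.no_grad():
        for _ in range(num_passes):
            pass_probs = []
            for batch in unlabeled_loader:          # assumes each batch is an input tensor
                logits = model(batch.to(device))
                pass_probs.append(torch.softmax(logits, dim=-1).cpu())
            all_probs.append(torch.cat(pass_probs))  # (N_u, C) for this stochastic pass

    probs = torch.stack(all_probs)                   # (T, N_u, C)
    mean_probs = probs.mean(dim=0)                   # averaged prediction per sample
    pseudo = mean_probs.argmax(dim=-1)               # tentative pseudo labels

    # Predictive variance of the winning class as a simple uncertainty score.
    var = probs.var(dim=0).gather(1, pseudo.unsqueeze(1)).squeeze(1)

    selected = torch.argsort(var)[:num_select]       # lowest-uncertainty samples
    return selected.numpy(), pseudo[selected].numpy()
```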