Gray Learning from Non-IID Data with Out-of-distribution Samples
- URL: http://arxiv.org/abs/2206.09375v2
- Date: Sat, 4 Nov 2023 07:21:45 GMT
- Title: Gray Learning from Non-IID Data with Out-of-distribution Samples
- Authors: Zhilin Zhao and Longbing Cao and Chang-Dong Wang
- Abstract summary: The integrity of training data, even when annotated by experts, is far from guaranteed.
We introduce a novel approach, termed "Gray Learning" (GL), which leverages both ground-truth and complementary labels.
By grounding our approach in statistical learning theory, we derive bounds for the generalization error, demonstrating that GL achieves tight constraints even in non-IID settings.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The integrity of training data, even when annotated by experts, is far from
guaranteed, especially for non-IID datasets comprising both in- and
out-of-distribution samples. In an ideal scenario, the majority of samples
would be in-distribution, while samples that deviate semantically would be
identified as out-of-distribution and excluded during the annotation process.
However, experts may erroneously classify these out-of-distribution samples as
in-distribution, assigning them labels that are inherently unreliable. This
mixture of unreliable labels and varied data types makes the task of learning
robust neural networks notably challenging. We observe that both in- and
out-of-distribution samples can almost invariably be ruled out from belonging
to certain classes, aside from those corresponding to unreliable ground-truth
labels. This opens the possibility of utilizing reliable complementary labels
that indicate the classes to which a sample does not belong. Guided by this
insight, we introduce a novel approach, termed \textit{Gray Learning} (GL),
which leverages both ground-truth and complementary labels. Crucially, GL
adaptively adjusts the loss weights for these two label types based on
prediction confidence levels. By grounding our approach in statistical learning
theory, we derive bounds for the generalization error, demonstrating that GL
achieves tight constraints even in non-IID settings. Extensive experimental
evaluations reveal that our method significantly outperforms alternative
approaches grounded in robust statistics.
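The abstract describes GL as combining a cross-entropy loss on the (possibly unreliable) ground-truth label with a complementary-label loss, with the two weighted adaptively by prediction confidence. A minimal numpy sketch of that idea follows; the function name, the confidence-based weighting scheme, and the single-complementary-label setup are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def gray_learning_loss(probs, true_label, comp_label):
    """Illustrative Gray-Learning-style loss for a single sample.

    probs      -- predicted class probabilities (1-D array summing to 1)
    true_label -- the (possibly unreliable) ground-truth class index
    comp_label -- a complementary class index the sample does NOT belong to
    """
    p = np.clip(probs, 1e-12, 1.0 - 1e-12)
    # Confidence the model assigns to the given ground-truth label.
    w = p[true_label]
    # Positive term: standard cross-entropy on the ground-truth label.
    ce = -np.log(p[true_label])
    # Negative term: complementary-label loss, pushing probability mass
    # away from the class the sample is known not to belong to.
    nl = -np.log(1.0 - p[comp_label])
    # Adaptive weighting: trust the ground-truth label when the model is
    # confident in it, and lean on the complementary label otherwise.
    return w * ce + (1.0 - w) * nl

# Usage: a confident, correct prediction incurs a smaller loss than an
# uncertain one, and low confidence shifts weight to the reliable
# complementary label.
confident = gray_learning_loss(np.array([0.90, 0.05, 0.05]), 0, 1)
uncertain = gray_learning_loss(np.array([0.34, 0.33, 0.33]), 0, 1)
```

The design intuition, per the abstract, is that an out-of-distribution sample with a wrong expert label tends to receive low confidence, so its unreliable positive term is down-weighted while its still-trustworthy complementary term dominates.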
Related papers
- Self-Knowledge Distillation for Learning Ambiguity [11.755814660833549]
Recent language models often over-confidently predict a single label without considering whether it is correct.
We propose a novel self-knowledge distillation method that enables models to learn label distributions more accurately.
We validate our method on diverse NLU benchmark datasets and the experimental results demonstrate its effectiveness in producing better label distributions.
arXiv Detail & Related papers (2024-06-14T05:11:32Z)
- Virtual Category Learning: A Semi-Supervised Learning Method for Dense Prediction with Extremely Limited Labels [63.16824565919966]
This paper proposes to use confusing samples proactively without label correction.
A Virtual Category (VC) is assigned to each confusing sample in such a way that it can safely contribute to the model optimisation.
Our intriguing findings highlight the usage of VC learning in dense vision tasks.
arXiv Detail & Related papers (2023-12-02T16:23:52Z)
- Learning with Complementary Labels Revisited: The Selected-Completely-at-Random Setting Is More Practical [66.57396042747706]
Complementary-label learning is a weakly supervised learning problem.
We propose a consistent approach that does not rely on the uniform distribution assumption.
We find that complementary-label learning can be expressed as a set of negative-unlabeled binary classification problems.
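The decomposition noted above treats each class as its own binary problem, in which a complementary label supplies a definite negative example. A toy sketch of the supervised negative term is below; the function names and the sigmoid one-vs-rest setup are assumptions, and the paper's consistent estimator would additionally reconstruct the unlabeled-side risk from class priors:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nu_negative_terms(logits, comp_label):
    """One-vs-rest view of a single complementary label.

    A complementary label says "not class comp_label", so the sample is a
    definite negative for that binary problem and unlabeled for the rest.
    Only the supervised (negative) term is computable without class priors.
    """
    p = sigmoid(np.asarray(logits, dtype=float))
    # Negative example for the complementary class: push its score toward 0.
    neg_loss = -np.log(np.clip(1.0 - p[comp_label], 1e-12, 1.0))
    # The remaining classes see this sample as unlabeled; a full
    # negative-unlabeled risk estimator would supply those terms.
    return {comp_label: neg_loss}

losses = nu_negative_terms([2.0, -1.0, 0.5], comp_label=0)
```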
arXiv Detail & Related papers (2023-11-27T02:59:17Z)
- The Decaying Missing-at-Random Framework: Doubly Robust Causal Inference with Partially Labeled Data [10.021381302215062]
In real-world scenarios, data collection limitations often result in partially labeled datasets, leading to difficulties in drawing reliable causal inferences.
Traditional approaches in the semi-supervised (SS) and missing-data literature may not adequately handle these complexities, leading to biased estimates.
This framework tackles missing outcomes in high-dimensional settings and accounts for selection bias.
arXiv Detail & Related papers (2023-05-22T07:37:12Z)
- Adaptive Negative Evidential Deep Learning for Open-set Semi-supervised Learning [69.81438976273866]
Open-set semi-supervised learning (Open-set SSL) considers a more practical scenario, where unlabeled data and test data contain new categories (outliers) not observed in the labeled data (inliers).
We introduce evidential deep learning (EDL) as an outlier detector to quantify different types of uncertainty, and design different uncertainty metrics for self-training and inference.
We propose a novel adaptive negative optimization strategy, making EDL more tailored to the unlabeled dataset containing both inliers and outliers.
arXiv Detail & Related papers (2023-03-21T09:07:15Z)
- Dist-PU: Positive-Unlabeled Learning from a Label Distribution Perspective [89.5370481649529]
We propose a label distribution perspective for PU learning in this paper.
Motivated by this, we propose to pursue the label distribution consistency between predicted and ground-truth label distributions.
Experiments on three benchmark datasets validate the effectiveness of the proposed method.
arXiv Detail & Related papers (2022-12-06T07:38:29Z)
- Exploiting Sample Uncertainty for Domain Adaptive Person Re-Identification [137.9939571408506]
We estimate and exploit the credibility of the assigned pseudo-label of each sample to alleviate the influence of noisy labels.
Our uncertainty-guided optimization brings significant improvement and achieves the state-of-the-art performance on benchmark datasets.
arXiv Detail & Related papers (2020-12-16T04:09:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.