Beyond Hard Labels: Investigating data label distributions
- URL: http://arxiv.org/abs/2207.06224v1
- Date: Wed, 13 Jul 2022 14:25:30 GMT
- Title: Beyond Hard Labels: Investigating data label distributions
- Authors: Vasco Grossmann, Lars Schmarje, Reinhard Koch
- Abstract summary: We compare learning with hard and soft labels on a synthetic and a real-world dataset.
The application of soft labels leads to improved performance and yields a more regular structure of the internal feature space.
- Score: 0.9668407688201357
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: High-quality data is a key aspect of modern machine learning. However, labels
generated by humans suffer from issues like label noise and class ambiguities.
We raise the question of whether hard labels are sufficient to represent the
underlying ground-truth distribution in the presence of this inherent
imprecision. We therefore compare learning with hard and soft
labels quantitatively and qualitatively for a synthetic and a real-world
dataset. We show that the application of soft labels leads to improved
performance and yields a more regular structure of the internal feature space.
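The comparison the abstract describes can be illustrated with a minimal sketch (hypothetical, not the authors' code): cross-entropy accepts any target probability distribution, so a soft label, e.g. reflecting annotator disagreement, plugs in directly where a one-hot hard label would go.

```python
import numpy as np

def cross_entropy(pred_probs, target_probs, eps=1e-12):
    """Cross-entropy between a target distribution and predicted probabilities."""
    return -np.sum(target_probs * np.log(pred_probs + eps))

# Model prediction over three classes (illustrative values).
pred = np.array([0.7, 0.2, 0.1])

# Hard label: class 0 as a one-hot target.
hard = np.array([1.0, 0.0, 0.0])

# Soft label: annotator disagreement spread over classes 0 and 1.
soft = np.array([0.8, 0.2, 0.0])

loss_hard = cross_entropy(pred, hard)  # reduces to -log(0.7)
loss_soft = cross_entropy(pred, soft)  # weighted by the target distribution
```

The soft target penalizes the model for concentrating all mass on the majority class, which is one mechanism behind the more regular feature space reported above.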
Related papers
- You can't handle the (dirty) truth: Data-centric insights improve pseudo-labeling [60.27812493442062]
We show the importance of investigating labeled data quality to improve any pseudo-labeling method.
Specifically, we introduce a novel data characterization and selection framework called DIPS to extend pseudo-labeling.
We demonstrate the applicability and impact of DIPS for various pseudo-labeling methods across an extensive range of real-world datasets.
arXiv Detail & Related papers (2024-06-19T17:58:40Z) - Imprecise Label Learning: A Unified Framework for Learning with Various Imprecise Label Configurations [91.67511167969934]
Imprecise label learning (ILL) is a framework for the unification of learning with various imprecise label configurations.
We demonstrate that ILL can seamlessly adapt to partial label learning, semi-supervised learning, noisy label learning, and, more importantly, a mixture of these settings.
arXiv Detail & Related papers (2023-05-22T04:50:28Z) - Learning From Biased Soft Labels [48.84637168570285]
A study has demonstrated that knowledge distillation and label smoothing can be unified as learning from soft labels.
This paper studies whether biased soft labels are still effective.
arXiv Detail & Related papers (2023-02-16T08:57:48Z) - Multi-label Classification with High-rank and High-order Label Correlations [62.39748565407201]
Previous methods capture the high-order label correlations mainly by transforming the label matrix to a latent label space with low-rank matrix factorization.
We propose a simple yet effective method to depict the high-order label correlations explicitly, and at the same time maintain the high-rank of the label matrix.
Comparative studies over twelve benchmark data sets validate the effectiveness of the proposed algorithm in multi-label classification.
arXiv Detail & Related papers (2022-07-09T05:15:31Z) - An Empirical Investigation of Learning from Biased Toxicity Labels [15.822714574671412]
We study how different training strategies can leverage a small dataset of human-annotated labels and a large but noisy dataset of synthetically generated labels.
We evaluate the accuracy and fairness properties of these approaches, and trade-offs between the two.
arXiv Detail & Related papers (2021-10-04T17:19:57Z) - Harmless label noise and informative soft-labels in supervised classification [1.6752182911522517]
Manual labelling of training examples is common practice in supervised learning.
When the labelling task is of non-trivial difficulty, the supplied labels may not be equal to the ground-truth labels, and label noise is introduced into the training dataset.
In particular, when classification difficulty is the only source of label errors, multiple sets of noisy labels can supply more information for the estimation of a classification rule.
arXiv Detail & Related papers (2021-04-07T02:56:11Z) - Exploiting Context for Robustness to Label Noise in Active Learning [47.341705184013804]
We address the problems of how a system can identify which of the queried labels are wrong and how a multi-class active learning system can be adapted to minimize the negative impact of label noise.
We construct a graphical representation of the unlabeled data to encode these relationships and obtain new beliefs on the graph when noisy labels are available.
This is demonstrated in three different applications: scene classification, activity classification, and document classification.
arXiv Detail & Related papers (2020-10-18T18:59:44Z) - Does label smoothing mitigate label noise? [57.76529645344897]
We show that label smoothing is competitive with loss-correction under label noise.
We show that when distilling models from noisy data, label smoothing of the teacher is beneficial.
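Label smoothing, the technique studied in this entry, replaces the one-hot target with a mixture of the one-hot vector and the uniform distribution. A minimal sketch (illustrative, not the paper's code):

```python
import numpy as np

def smooth_labels(one_hot, alpha=0.1):
    """Mix a one-hot target with the uniform distribution: (1 - alpha) * y + alpha / K."""
    k = one_hot.shape[-1]
    return (1.0 - alpha) * one_hot + alpha / k

y = np.array([0.0, 1.0, 0.0, 0.0])
y_smooth = smooth_labels(y, alpha=0.1)
# y_smooth = [0.025, 0.925, 0.025, 0.025]
```

The smoothed target is itself a soft label, which is why the paper can compare it directly against loss-correction methods under label noise.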
arXiv Detail & Related papers (2020-03-05T18:43:17Z) - Limitations of weak labels for embedding and tagging [0.0]
Many datasets and approaches in ambient sound analysis use weakly labeled data. Weak labels are employed because annotating every data sample with a strong label is too expensive. Yet, their impact on the performance in comparison to strong labels remains unclear.
In this paper, we formulate a supervised learning problem which involves weak labels. We create a dataset that focuses on the difference between strong and weak labels as opposed to other challenges.
arXiv Detail & Related papers (2020-02-05T08:54:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.