Limitations of weak labels for embedding and tagging
- URL: http://arxiv.org/abs/2002.01687v4
- Date: Mon, 7 Dec 2020 13:13:51 GMT
- Title: Limitations of weak labels for embedding and tagging
- Authors: Nicolas Turpault (MULTISPEECH), Romain Serizel (MULTISPEECH), Emmanuel
Vincent (MULTISPEECH)
- Abstract summary: Many datasets and approaches in ambient sound analysis use weakly labeled data. Weak labels are employed because annotating every data sample with a strong label is too expensive. Yet, their impact on performance, in comparison to strong labels, remains unclear.
In this paper, we formulate a supervised learning problem which involves weak labels. We create a dataset that focuses on the difference between strong and weak labels as opposed to other challenges.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many datasets and approaches in ambient sound analysis use weakly labeled
data. Weak labels are employed because annotating every data sample with a
strong label is too expensive. Yet, their impact on performance, in comparison
to strong labels, remains unclear. Indeed, weak labels must often be dealt with
at the same time as other challenges, namely multiple labels per sample,
unbalanced classes and/or overlapping events. In this paper, we formulate a
supervised learning problem which involves weak labels. We create a dataset
that focuses on the difference between strong and weak labels as opposed to
other challenges. We investigate the impact of weak labels when training an
embedding or an end-to-end classifier. Different experimental scenarios are
discussed to provide insights into which applications are most sensitive to
weakly labeled data.
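To make the weak/strong distinction concrete, here is a minimal PyTorch sketch, not the paper's actual model: the frame-level outputs are what strong labels would supervise, and max pooling them over time yields the single clip-level output that a weak tag can supervise. The `WeakLabelTagger` class, the GRU encoder, and the max-pooling choice are all illustrative assumptions.

```python
import torch
import torch.nn as nn

class WeakLabelTagger(nn.Module):
    """Hypothetical tagger: frame-level predictions pooled to clip level."""
    def __init__(self, n_mels=64, n_classes=10):
        super().__init__()
        # Illustrative encoder; any network producing per-frame features works.
        self.encoder = nn.GRU(n_mels, 128, batch_first=True, bidirectional=True)
        self.frame_head = nn.Linear(256, n_classes)

    def forward(self, mel):  # mel: (batch, frames, n_mels)
        frames, _ = self.encoder(mel)
        # Frame-level probabilities: this is what strong labels would supervise.
        frame_probs = torch.sigmoid(self.frame_head(frames))
        # Max pooling over time gives one clip-level probability per class,
        # which is all a weak (clip-level) tag can supervise.
        clip_probs = frame_probs.max(dim=1).values
        return frame_probs, clip_probs

model = WeakLabelTagger()
mel = torch.randn(8, 500, 64)                        # 8 clips of 500 frames
weak_targets = torch.randint(0, 2, (8, 10)).float()  # clip-level tags only
_, clip_probs = model(mel)
loss = nn.functional.binary_cross_entropy(clip_probs, weak_targets)
loss.backward()
```

A strong-label baseline would instead apply the loss directly to `frame_probs` against per-frame targets; the paper studies what training an embedding or an end-to-end classifier loses when only the pooled, clip-level supervision is available.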
Related papers
- Mixed Blessing: Class-Wise Embedding guided Instance-Dependent Partial Label Learning [53.64180787439527]
In partial label learning (PLL), every sample is associated with a candidate label set comprising the ground-truth label and several noisy labels.
For the first time, we create class-wise embeddings for each sample, which allow us to explore the relationship of instance-dependent noisy labels.
To reduce the high label ambiguity, we introduce the concept of class prototypes containing global feature information.
arXiv Detail & Related papers (2024-12-06T13:25:39Z)
- Learning from Concealed Labels [5.235218636685312]
We propose a novel setting to protect the privacy of each instance, namely learning from concealed labels for multi-class classification.
Concealed labels prevent sensitive labels from appearing in the label set during the label collection stage; instead, no label or some randomly sampled insensitive labels are specified as the concealed label set to annotate sensitive data.
arXiv Detail & Related papers (2024-12-03T08:00:19Z)
- You can't handle the (dirty) truth: Data-centric insights improve pseudo-labeling [60.27812493442062]
We show the importance of investigating labeled data quality to improve any pseudo-labeling method.
Specifically, we introduce a novel data characterization and selection framework called DIPS to extend pseudo-labeling.
We demonstrate the applicability and impact of DIPS for various pseudo-labeling methods across an extensive range of real-world datasets.
arXiv Detail & Related papers (2024-06-19T17:58:40Z)
- Don't Waste a Single Annotation: Improving Single-Label Classifiers Through Soft Labels [7.396461226948109]
We address the limitations of the common data annotation and training methods for objective single-label classification tasks.
Our findings indicate that additional annotator information, such as confidence, secondary label and disagreement, can be used to effectively generate soft labels (see the sketch after this list).
arXiv Detail & Related papers (2023-11-09T10:47:39Z)
- Soft Curriculum for Learning Conditional GANs with Noisy-Labeled and Uncurated Unlabeled Data [70.25049762295193]
We introduce a novel conditional image generation framework that accepts noisy-labeled and uncurated data during training.
We propose soft curriculum learning, which assigns instance-wise weights for adversarial training while assigning new labels for unlabeled data.
Our experiments show that our approach outperforms existing semi-supervised and label-noise robust methods in terms of both quantitative and qualitative performance.
arXiv Detail & Related papers (2023-07-17T08:31:59Z)
- Imprecise Label Learning: A Unified Framework for Learning with Various Imprecise Label Configurations [91.67511167969934]
Imprecise label learning (ILL) is a framework that unifies learning with various imprecise label configurations.
We demonstrate that ILL can seamlessly adapt to partial label learning, semi-supervised learning, noisy label learning, and, more importantly, a mixture of these settings.
arXiv Detail & Related papers (2023-05-22T04:50:28Z)
- An Effective Approach for Multi-label Classification with Missing Labels [8.470008570115146]
We propose a pseudo-label based approach to reduce the cost of annotation without bringing additional complexity to the classification networks.
By designing a novel loss function, we are able to relax the requirement that each instance must contain at least one positive label.
We show that our method can handle the imbalance between positive labels and negative labels, while still outperforming existing missing-label learning approaches.
arXiv Detail & Related papers (2022-10-24T23:13:57Z)
- Beyond Hard Labels: Investigating data label distributions [0.9668407688201357]
We compare the performance of learning with hard versus soft labels on a synthetic and a real-world dataset.
The application of soft labels leads to improved performance and yields a more regular structure of the internal feature space.
arXiv Detail & Related papers (2022-07-13T14:25:30Z)
- Label Noise-Resistant Mean Teaching for Weakly Supervised Fake News Detection [93.6222609806278]
We propose a novel label noise-resistant mean teaching approach (LNMT) for weakly supervised fake news detection.
LNMT leverages unlabeled news and feedback comments of users to enlarge the amount of training data.
LNMT establishes a mean teacher framework equipped with label propagation and label reliability estimation.
arXiv Detail & Related papers (2022-06-10T16:01:58Z)
- Data Consistency for Weakly Supervised Learning [15.365232702938677]
Training machine learning models involves using large amounts of human-annotated data.
We propose a novel weak supervision algorithm that processes noisy labels, i.e., weak signals.
We show that it significantly outperforms state-of-the-art weak supervision methods on both text and image classification tasks.
arXiv Detail & Related papers (2022-02-08T16:48:19Z)
- Disentangling Sampling and Labeling Bias for Learning in Large-Output Spaces [64.23172847182109]
We show that different negative sampling schemes implicitly trade-off performance on dominant versus rare labels.
We provide a unified means to explicitly tackle both sampling bias, arising from working with a subset of all labels, and labeling bias, which is inherent to the data due to label imbalance.
arXiv Detail & Related papers (2021-05-12T15:40:13Z)
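Several entries above ("Don't Waste a Single Annotation", "Beyond Hard Labels") train against soft label distributions rather than hard one-hot targets. A minimal sketch, assuming soft labels are obtained by normalizing annotator votes; the aggregation schemes in the papers themselves may differ:

```python
import torch
import torch.nn.functional as F

n_classes = 5
# Hypothetical annotations: three annotators label the same sample.
votes = torch.tensor([2, 2, 4])
soft_label = torch.bincount(votes, minlength=n_classes).float()
soft_label /= soft_label.sum()  # tensor([0.0000, 0.0000, 0.6667, 0.0000, 0.3333])

logits = torch.randn(1, n_classes, requires_grad=True)
# Recent PyTorch versions accept a probability distribution as the
# cross_entropy target in place of a hard class index.
loss = F.cross_entropy(logits, soft_label.unsqueeze(0))
loss.backward()
```

Keeping the full vote distribution preserves the annotator disagreement that a majority-vote hard label would discard.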
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of its content (including all information above) and is not responsible for any consequences arising from its use.