Limitations of weak labels for embedding and tagging
- URL: http://arxiv.org/abs/2002.01687v4
- Date: Mon, 7 Dec 2020 13:13:51 GMT
- Title: Limitations of weak labels for embedding and tagging
- Authors: Nicolas Turpault (MULTISPEECH), Romain Serizel (MULTISPEECH), Emmanuel
Vincent (MULTISPEECH)
- Abstract summary: Many datasets and approaches in ambient sound analysis use weakly labeled data. Weak labels are employed because annotating every data sample with a strong label is too expensive. Yet, their impact on performance, in comparison to strong labels, remains unclear.
In this paper, we formulate a supervised learning problem which involves weak labels. We create a dataset that focuses on the difference between strong and weak labels as opposed to other challenges.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many datasets and approaches in ambient sound analysis use weakly labeled
data. Weak labels are employed because annotating every data sample with a
strong label is too expensive. Yet, their impact on performance, in comparison
to strong labels, remains unclear. Indeed, weak labels must often be dealt with
at the same time as other challenges, namely multiple labels per sample,
unbalanced classes and/or overlapping events. In this paper, we formulate a
supervised learning problem which involves weak labels. We create a dataset
that focuses on the difference between strong and weak labels as opposed to
other challenges. We investigate the impact of weak labels when training an
embedding or an end-to-end classifier. Different experimental scenarios are
discussed to provide insights into which applications are most sensitive to
weakly labeled data.
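To make the weak/strong distinction concrete, here is a minimal PyTorch sketch, not the paper's actual model: the frame-level outputs are what strong labels would supervise, and max pooling them over time yields the single clip-level output that a weak tag can supervise. The `WeakLabelTagger` class, the GRU encoder, and the max-pooling choice are all illustrative assumptions.

```python
import torch
import torch.nn as nn

class WeakLabelTagger(nn.Module):
    """Hypothetical tagger: frame-level predictions pooled to clip level."""
    def __init__(self, n_mels=64, n_classes=10):
        super().__init__()
        # Illustrative encoder; any network producing per-frame features works.
        self.encoder = nn.GRU(n_mels, 128, batch_first=True, bidirectional=True)
        self.frame_head = nn.Linear(256, n_classes)

    def forward(self, mel):  # mel: (batch, frames, n_mels)
        frames, _ = self.encoder(mel)
        # Frame-level probabilities: this is what strong labels would supervise.
        frame_probs = torch.sigmoid(self.frame_head(frames))
        # Max pooling over time gives one clip-level probability per class,
        # which is all a weak (clip-level) tag can supervise.
        clip_probs = frame_probs.max(dim=1).values
        return frame_probs, clip_probs

model = WeakLabelTagger()
mel = torch.randn(8, 500, 64)                        # 8 clips of 500 frames
weak_targets = torch.randint(0, 2, (8, 10)).float()  # clip-level tags only
_, clip_probs = model(mel)
loss = nn.functional.binary_cross_entropy(clip_probs, weak_targets)
loss.backward()
```

A strong-label baseline would instead apply the loss directly to `frame_probs` against per-frame targets; the paper studies what training an embedding or an end-to-end classifier loses when only the pooled, clip-level supervision is available.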
Related papers
- Mixed Blessing: Class-Wise Embedding guided Instance-Dependent Partial Label Learning [53.64180787439527]
In partial label learning (PLL), every sample is associated with a candidate label set comprising the ground-truth label and several noisy labels.
For the first time, we create class-wise embeddings for each sample, which allow us to explore the relationship of instance-dependent noisy labels.
To reduce the high label ambiguity, we introduce the concept of class prototypes containing global feature information.
arXiv Detail & Related papers (2024-12-06T13:25:39Z)
- Learning from Concealed Labels [5.235218636685312]
We propose a novel setting to protect the privacy of each instance, namely learning from concealed labels for multi-class classification.
Concealed labels prevent sensitive labels from appearing in the label set during the label collection stage; instead, no label or some randomly sampled insensitive labels are specified as the concealed label set to annotate sensitive data.
arXiv Detail & Related papers (2024-12-03T08:00:19Z)
- You can't handle the (dirty) truth: Data-centric insights improve pseudo-labeling [60.27812493442062]
We show the importance of investigating labeled data quality to improve any pseudo-labeling method.
Specifically, we introduce a novel data characterization and selection framework called DIPS to extend pseudo-labeling.
We demonstrate the applicability and impact of DIPS for various pseudo-labeling methods across an extensive range of real-world datasets.
arXiv Detail & Related papers (2024-06-19T17:58:40Z)
- Don't Waste a Single Annotation: Improving Single-Label Classifiers Through Soft Labels [7.396461226948109]
We address the limitations of the common data annotation and training methods for objective single-label classification tasks.
Our findings indicate that additional annotator information, such as confidence, secondary label and disagreement, can be used to effectively generate soft labels (see the sketch after this list).
arXiv Detail & Related papers (2023-11-09T10:47:39Z)
- Soft Curriculum for Learning Conditional GANs with Noisy-Labeled and Uncurated Unlabeled Data [70.25049762295193]
We introduce a novel conditional image generation framework that accepts noisy-labeled and uncurated data during training.
We propose soft curriculum learning, which assigns instance-wise weights for adversarial training while assigning new labels for unlabeled data.
Our experiments show that our approach outperforms existing semi-supervised and label-noise robust methods in terms of both quantitative and qualitative performance.
arXiv Detail & Related papers (2023-07-17T08:31:59Z)
- Imprecise Label Learning: A Unified Framework for Learning with Various Imprecise Label Configurations [91.67511167969934]
Imprecise label learning (ILL) is a framework that unifies learning with various imprecise label configurations.
We demonstrate that ILL can seamlessly adapt to partial label learning, semi-supervised learning, noisy label learning, and, more importantly, a mixture of these settings.
arXiv Detail & Related papers (2023-05-22T04:50:28Z)
- An Effective Approach for Multi-label Classification with Missing Labels [8.470008570115146]
We propose a pseudo-label based approach to reduce the cost of annotation without bringing additional complexity to the classification networks.
By designing a novel loss function, we are able to relax the requirement that each instance must contain at least one positive label.
We show that our method can handle the imbalance between positive labels and negative labels, while still outperforming existing missing-label learning approaches.
arXiv Detail & Related papers (2022-10-24T23:13:57Z)
- Beyond Hard Labels: Investigating data label distributions [0.9668407688201357]
We compare the performance of learning with hard versus soft labels on a synthetic and a real-world dataset.
The application of soft labels leads to improved performance and yields a more regular structure of the internal feature space.
arXiv Detail & Related papers (2022-07-13T14:25:30Z)
- Label Noise-Resistant Mean Teaching for Weakly Supervised Fake News Detection [93.6222609806278]
We propose a novel label noise-resistant mean teaching approach (LNMT) for weakly supervised fake news detection.
LNMT leverages unlabeled news and feedback comments of users to enlarge the amount of training data.
LNMT establishes a mean teacher framework equipped with label propagation and label reliability estimation.
arXiv Detail & Related papers (2022-06-10T16:01:58Z)
- Data Consistency for Weakly Supervised Learning [15.365232702938677]
Training machine learning models involves using large amounts of human-annotated data.
We propose a novel weak supervision algorithm that processes noisy labels, i.e., weak signals.
We show that it significantly outperforms state-of-the-art weak supervision methods on both text and image classification tasks.
arXiv Detail & Related papers (2022-02-08T16:48:19Z)
- Disentangling Sampling and Labeling Bias for Learning in Large-Output Spaces [64.23172847182109]
We show that different negative sampling schemes implicitly trade-off performance on dominant versus rare labels.
We provide a unified means to explicitly tackle both sampling bias, arising from working with a subset of all labels, and labeling bias, which is inherent to the data due to label imbalance.
arXiv Detail & Related papers (2021-05-12T15:40:13Z)
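Several entries above ("Don't Waste a Single Annotation", "Beyond Hard Labels") train against soft label distributions rather than hard one-hot targets. A minimal sketch, assuming soft labels are obtained by normalizing annotator votes; the aggregation schemes in the papers themselves may differ:

```python
import torch
import torch.nn.functional as F

n_classes = 5
# Hypothetical annotations: three annotators label the same sample.
votes = torch.tensor([2, 2, 4])
soft_label = torch.bincount(votes, minlength=n_classes).float()
soft_label /= soft_label.sum()  # tensor([0.0000, 0.0000, 0.6667, 0.0000, 0.3333])

logits = torch.randn(1, n_classes, requires_grad=True)
# Recent PyTorch versions accept a probability distribution as the
# cross_entropy target in place of a hard class index.
loss = F.cross_entropy(logits, soft_label.unsqueeze(0))
loss.backward()
```

Keeping the full vote distribution preserves the annotator disagreement that a majority-vote hard label would discard.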
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of its content (including all information above) and is not responsible for any consequences arising from its use.