Noise tolerance of learning to rank under class-conditional label noise
- URL: http://arxiv.org/abs/2208.02126v1
- Date: Wed, 3 Aug 2022 15:04:48 GMT
- Title: Noise tolerance of learning to rank under class-conditional label noise
- Authors: Dany Haddad
- Abstract summary: We describe a class of noise-tolerant LtR losses for which empirical risk minimization is a consistent procedure.
We also develop noise-tolerant analogs of commonly used loss functions.
- Score: 1.14219428942199
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Often, the data used to train ranking models is subject to label noise. For
example, in web-search, labels created from clickstream data are noisy due to
issues such as insufficient information in item descriptions on the SERP, query
reformulation by the user, and erratic or unexpected user behavior. In
practice, it is difficult to handle label noise without making strong
assumptions about the label generation process. As a result, practitioners
typically train their learning-to-rank (LtR) models directly on this noisy data
without additional consideration of the label noise. Surprisingly, we often see
strong performance from LtR models trained in this way. In this work, we
describe a class of noise-tolerant LtR losses for which empirical risk
minimization is a consistent procedure, even in the context of
class-conditional label noise. We also develop noise-tolerant analogs of
commonly used loss functions. The practical implications of our theoretical
findings are further supported by experimental results.
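To make the setting concrete, below is a minimal sketch of class-conditional label noise and the classic unbiased-loss correction of Natarajan et al. (2013), applied here to a pairwise logistic loss as a stand-in for an LtR objective. This is an illustration under assumed flip rates (rho_pos, rho_neg), not necessarily the construction used in this paper; all function and variable names are hypothetical.

```python
# Sketch: class-conditional label noise + the Natarajan et al. (2013)
# unbiased-loss correction, demonstrated on a pairwise logistic loss.
# Flip rates rho_pos/rho_neg and all names are illustrative assumptions,
# not taken from the paper.
import numpy as np

rng = np.random.default_rng(0)

def flip_labels(y, rho_pos, rho_neg, rng):
    """Class-conditional noise: flip +1 -> -1 w.p. rho_pos, -1 -> +1 w.p. rho_neg."""
    u = rng.random(y.shape)
    flip = np.where(y == 1, u < rho_pos, u < rho_neg)
    return np.where(flip, -y, y)

def logistic_loss(margin):
    """Pairwise logistic loss log(1 + exp(-margin)), with margin = y * (s_i - s_j)."""
    return np.log1p(np.exp(-margin))

def corrected_loss(margin, y, rho_pos, rho_neg):
    """Unbiased estimate of the clean loss from a noisy label y:
    l~(t, y) = ((1 - rho_{-y}) l(t, y) - rho_y l(t, -y)) / (1 - rho_pos - rho_neg)."""
    rho_y = np.where(y == 1, rho_pos, rho_neg)      # flip rate of the observed class
    rho_not_y = np.where(y == 1, rho_neg, rho_pos)  # flip rate of the opposite class
    return ((1 - rho_not_y) * logistic_loss(margin)
            - rho_y * logistic_loss(-margin)) / (1 - rho_pos - rho_neg)

# Toy check: in expectation, the corrected loss on noisy labels matches
# the clean loss on the true labels.
n = 200_000
scores = rng.normal(size=n)            # score differences s_i - s_j for item pairs
y_true = rng.choice([-1, 1], size=n)   # true pairwise preference labels
rho_pos, rho_neg = 0.3, 0.1
y_noisy = flip_labels(y_true, rho_pos, rho_neg, rng)

clean = logistic_loss(y_true * scores).mean()
naive = logistic_loss(y_noisy * scores).mean()
corrected = corrected_loss(y_noisy * scores, y_noisy, rho_pos, rho_neg).mean()
print(f"clean {clean:.4f}  naive-on-noisy {naive:.4f}  corrected {corrected:.4f}")
```

Minimizing such a corrected loss targets the clean risk even though only noisy labels are observed, which is the sense in which empirical risk minimization can remain consistent under class-conditional noise.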
Related papers
- NoisyAG-News: A Benchmark for Addressing Instance-Dependent Noise in Text Classification [7.464154519547575]
Existing research on learning with noisy labels predominantly focuses on synthetic noise patterns.
We constructed a benchmark dataset to better understand label noise in real-world text classification settings.
Our findings reveal that while pre-trained models are resilient to synthetic noise, they struggle against instance-dependent noise.
arXiv Detail & Related papers (2024-07-09T06:18:40Z)
- NoiseBench: Benchmarking the Impact of Real Label Noise on Named Entity Recognition [3.726602636064681]
We present an analysis that shows that real noise is significantly more challenging than simulated noise.
We show that current state-of-the-art models for noise-robust learning fall far short of their theoretically achievable upper bound.
arXiv Detail & Related papers (2024-05-13T10:20:31Z)
- Federated Learning with Extremely Noisy Clients via Negative Distillation [70.13920804879312]
Federated learning (FL) has shown remarkable success in cooperatively training deep models, but it struggles with noisy labels.
We propose a novel approach, called negative distillation (FedNed), to leverage models trained on noisy clients.
FedNed first identifies noisy clients and then, rather than discarding them, employs them in a knowledge-distillation manner.
arXiv Detail & Related papers (2023-12-20T01:59:48Z)
- NoisywikiHow: A Benchmark for Learning with Real-world Noisy Labels in Natural Language Processing [26.678589684142548]
Large-scale datasets in the real world inevitably involve label noise.
Deep models can gradually overfit noisy labels and thus degrade generalization performance.
To mitigate the effects of label noise, learning with noisy labels (LNL) methods are designed to achieve better generalization performance.
arXiv Detail & Related papers (2023-05-18T05:01:04Z)
- Identifying Hard Noise in Long-Tailed Sample Distribution [76.16113794808001]
We introduce Noisy Long-Tailed Classification (NLT).
Most de-noising methods fail to identify hard noise.
We design an iterative noisy learning framework called Hard-to-Easy (H2E).
arXiv Detail & Related papers (2022-07-27T09:03:03Z)
- Is BERT Robust to Label Noise? A Study on Learning with Noisy Labels in Text Classification [23.554544399110508]
Wrong labels in training data occur when human annotators make mistakes or when the data is generated via weak or distant supervision.
It has been shown that complex noise-handling techniques are required to prevent models from fitting this label noise.
We show in this work that, for text classification tasks with modern NLP models like BERT, across a variety of noise types, existing noise-handling methods do not always improve performance and may even degrade it.
arXiv Detail & Related papers (2022-04-20T10:24:19Z)
- Learning with Noisy Labels Revisited: A Study Using Real-World Human Annotations [54.400167806154535]
Existing research on learning with noisy labels mainly focuses on synthetic label noise.
This work presents two new benchmark datasets (CIFAR-10N, CIFAR-100N)
We show that real-world noisy labels follow an instance-dependent pattern rather than the classically adopted class-dependent ones.
arXiv Detail & Related papers (2021-10-22T22:42:11Z)
- Instance-dependent Label-noise Learning under a Structural Causal Model [92.76400590283448]
Label noise degrades the performance of deep learning algorithms.
By leveraging a structural causal model, we propose a novel generative approach for instance-dependent label-noise learning.
arXiv Detail & Related papers (2021-09-07T10:42:54Z)
- Analysing the Noise Model Error for Realistic Noisy Label Data [14.766574408868806]
We study the quality of estimated noise models from the theoretical side by deriving the expected error of the noise model.
We also publish NoisyNER, a new noisy label dataset from the NLP domain.
arXiv Detail & Related papers (2021-01-24T17:45:15Z)
- Tackling Instance-Dependent Label Noise via a Universal Probabilistic Model [80.91927573604438]
This paper proposes a simple yet universal probabilistic model, which explicitly relates noisy labels to their instances.
Experiments on datasets with both synthetic and real-world label noise verify that the proposed method yields significant improvements on robustness.
arXiv Detail & Related papers (2021-01-14T05:43:51Z)
- A Second-Order Approach to Learning with Instance-Dependent Label Noise [58.555527517928596]
The presence of label noise often misleads the training of deep neural networks.
We show that the errors in human-annotated labels are more likely to be dependent on the difficulty levels of tasks.
arXiv Detail & Related papers (2020-12-22T06:36:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.