Exploring Loss Design Techniques For Decision Tree Robustness To Label Noise
- URL: http://arxiv.org/abs/2405.17672v1
- Date: Mon, 27 May 2024 21:49:57 GMT
- Title: Exploring Loss Design Techniques For Decision Tree Robustness To Label Noise
- Authors: Lukasz Sztukiewicz, Jack Henry Good, Artur Dubrawski
- Abstract summary: We investigate whether ideas from deep learning loss design can be applied to improve the robustness of decision trees.
We show that loss correction and symmetric losses, both standard approaches, are not effective.
- Score: 12.13779291372763
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In the real world, data is often noisy, affecting not only the quality of features but also the accuracy of labels. Current research on mitigating label errors stems primarily from advances in deep learning, and a gap exists in exploring interpretable models, particularly those rooted in decision trees. In this study, we investigate whether ideas from deep learning loss design can be applied to improve the robustness of decision trees. In particular, we show that loss correction and symmetric losses, both standard approaches, are not effective. We argue that other directions need to be explored to improve the robustness of decision trees to label noise.
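The "symmetric loss" idea the abstract refers to has a simple defining property: a loss L is symmetric when the sum of L over all possible labels is a constant, independent of the prediction, which is what makes it robust to uniform label noise. The following is an illustrative check of that condition (our own sketch, not code from the paper), showing that MAE satisfies it while cross-entropy does not:

```python
# Illustrative check of the symmetric-loss condition: sum_k L(p, k) = C for
# every prediction p. MAE (L1 distance to the one-hot label) is symmetric;
# cross-entropy is not. Function names here are ours, not the paper's.
import numpy as np

def mae_loss(p, y, n_classes):
    # L1 distance between predicted distribution and one-hot label.
    return np.abs(p - np.eye(n_classes)[y]).sum()

def ce_loss(p, y, n_classes):
    return -np.log(p[y])

def total_over_labels(loss, p, n_classes):
    # The symmetry condition: this sum should not depend on p.
    return sum(loss(p, k, n_classes) for k in range(n_classes))

rng = np.random.default_rng(0)
for _ in range(3):
    logits = rng.normal(size=4)
    p = np.exp(logits) / np.exp(logits).sum()  # an arbitrary prediction
    # For MAE the sum is always 2*(K-1) = 6.0 when K = 4 classes.
    print(round(total_over_labels(mae_loss, p, 4), 6))  # 6.0 each time
```

The cross-entropy sum, by contrast, changes with `p`, which is one intuition for why plain cross-entropy-style impurity measures are sensitive to label noise.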
Related papers
- Robust Loss Functions for Training Decision Trees with Noisy Labels [4.795403008763752]
We consider training decision trees using noisily labeled data, focusing on loss functions that can lead to robust learning algorithms.
First, we offer novel theoretical insights on the robustness of many existing loss functions in the context of decision tree learning.
Second, we introduce a framework for constructing robust loss functions, called distribution losses.
arXiv Detail & Related papers (2023-12-20T11:27:46Z)
- ERASE: Error-Resilient Representation Learning on Graphs for Label Noise Tolerance [53.73316938815873]
We propose a method called ERASE (Error-Resilient representation learning on graphs for lAbel noiSe tolerancE) to learn representations with error tolerance.
ERASE combines prototype pseudo-labels with propagated denoised labels and updates representations with error resilience.
Our method outperforms multiple baselines by clear margins across a broad range of noise levels and scales well.
arXiv Detail & Related papers (2023-12-13T17:59:07Z)
- Mitigating Label Noise through Data Ambiguation [9.51828574518325]
Large models with high expressive power are prone to memorizing incorrect labels, thereby harming generalization performance.
In this paper, we propose addressing the shortcomings of both methodologies by "ambiguating" the target information.
More precisely, we leverage the framework of so-called superset learning to construct set-valued targets based on a confidence threshold.
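A set-valued target of this kind can be sketched in a few lines (our own illustrative construction in the spirit of superset learning; the threshold, names, and rule are ours, not the paper's):

```python
# Hypothetical sketch of "ambiguated" set-valued targets: keep the observed
# (possibly noisy) label plus every class the model already finds plausible
# above a confidence threshold tau. Parameter names are illustrative.
import numpy as np

def ambiguate(probs, given_label, tau=0.3):
    # Candidate set: classes whose predicted probability clears the threshold.
    candidates = {k for k, p in enumerate(probs) if p >= tau}
    # The observed label always stays in the target set.
    candidates.add(given_label)
    return sorted(candidates)

# The model strongly favors class 1, but the (noisy) label says 3:
print(ambiguate(np.array([0.05, 0.6, 0.25, 0.10]), given_label=3))  # [1, 3]
```

Training against the set `{1, 3}` rather than the hard label `3` is what lets the learner avoid committing to a potentially wrong annotation.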
arXiv Detail & Related papers (2023-05-23T07:29:08Z)
- Prototype-Anchored Learning for Learning with Imperfect Annotations [83.7763875464011]
It is challenging to learn unbiased classification models from imperfectly annotated datasets.
We propose a prototype-anchored learning (PAL) method, which can be easily incorporated into various learning-based classification schemes.
We verify the effectiveness of PAL on class-imbalanced learning and noise-tolerant learning by extensive experiments on synthetic and real-world datasets.
arXiv Detail & Related papers (2022-06-23T10:25:37Z)
- Do We Need to Penalize Variance of Losses for Learning with Label Noise? [91.38888889609002]
We find that the variance of losses should be increased for the problem of learning with noisy labels.
By exploiting the label noise transition matrix, regularizers can be easily designed to reduce the variance of losses.
Empirically, the proposed method by increasing the variance of losses significantly improves the generalization ability of baselines on both synthetic and real-world datasets.
arXiv Detail & Related papers (2022-01-30T06:19:08Z)
- Sample Selection with Uncertainty of Losses for Learning with Noisy Labels [145.06552420999986]
In learning with noisy labels, the sample selection approach is very popular, which regards small-loss data as correctly labeled during training.
However, losses are generated on-the-fly by a model being trained on noisy labels, so large-loss data are likely, but not certain, to be incorrectly labeled.
In this paper, we incorporate the uncertainty of losses by adopting interval estimation instead of point estimation of losses.
arXiv Detail & Related papers (2021-06-01T12:53:53Z)
- Rectified Decision Trees: Exploring the Landscape of Interpretable and Effective Machine Learning [66.01622034708319]
We propose a knowledge-distillation-based extension of decision trees, dubbed rectified decision trees (ReDT).
We extend the splitting criteria and the stopping condition of standard decision trees to allow training with soft labels.
We then train the ReDT based on the soft label distilled from a well-trained teacher model through a novel jackknife-based method.
arXiv Detail & Related papers (2020-08-21T10:45:25Z)
- Which Strategies Matter for Noisy Label Classification? Insight into Loss and Uncertainty [7.20844895799647]
Label noise is a critical factor that degrades the generalization performance of deep neural networks.
We present analytical results on how loss and uncertainty values of samples change throughout the training process.
We design a new robust training method that emphasizes clean and informative samples, while minimizing the influence of noise.
arXiv Detail & Related papers (2020-08-14T07:34:32Z)
- Learning Not to Learn in the Presence of Noisy Labels [104.7655376309784]
We show that a new class of loss functions called the gambler's loss provides strong robustness to label noise across various levels of corruption.
We show that training with this loss function encourages the model to "abstain" from learning on the data points with noisy labels.
arXiv Detail & Related papers (2020-02-16T09:12:27Z)
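One common formulation of the gambler's loss (our hedged sketch, following the usual "deep gamblers" setup rather than the cited paper's exact code) adds an extra abstention output: the model "bets" probability mass on the classes with payoff o, and reserving mass on the abstention output caps the loss it can incur on suspicious points.

```python
# Sketch of a gambler's-style loss with an abstention output (our assumed
# formulation): loss = -log(o * p_y + p_abstain), where the last logit is the
# abstention "bet" and the payoff o must satisfy 1 < o <= n_classes.
import numpy as np

def gamblers_loss(logits, y, payoff=2.5):
    # logits has n_classes + 1 entries; the last one is the abstention bet.
    z = np.exp(logits - logits.max())   # stable softmax
    p = z / z.sum()
    return -np.log(payoff * p[y] + p[-1])

# Confidently correct < abstaining < confidently wrong:
print(gamblers_loss(np.array([4.0, 0.0, 0.0, 0.0]), y=0)
      < gamblers_loss(np.array([0.0, 0.0, 0.0, 4.0]), y=0)
      < gamblers_loss(np.array([0.0, 4.0, 0.0, 0.0]), y=0))  # True
```

Because abstaining incurs a bounded, moderate loss no matter what the label says, the model is encouraged to abstain on points whose labels look noisy, which matches the "learning not to learn" behavior described above.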
This list is automatically generated from the titles and abstracts of the papers in this site.