Robust Optimal Classification Trees under Noisy Labels
- URL: http://arxiv.org/abs/2012.08560v1
- Date: Tue, 15 Dec 2020 19:12:29 GMT
- Title: Robust Optimal Classification Trees under Noisy Labels
- Authors: Víctor Blanco, Alberto Japón, and Justo Puerto
- Score: 1.5039745292757671
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper we propose a novel methodology to construct Optimal
Classification Trees that takes into account that noisy labels may occur in the
training sample. Our approach rests on two main elements: (1) the splitting
rules for the classification trees are designed to maximize the separation
margin between classes, applying the paradigm of SVM; and (2) some of the
labels of the training sample are allowed to be changed during the construction
of the tree, in order to detect label noise. Both features are considered and
integrated together to design the resulting Optimal Classification Tree. We
present a Mixed Integer Non-Linear Programming formulation for the problem,
suitable to be solved using any of the available off-the-shelf solvers. The
model is analyzed and tested on a battery of standard datasets taken from the
UCI Machine Learning Repository, showing the effectiveness of our approach.
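The two elements of the abstract, margin-maximizing splits and label changes during training, can be illustrated with a deliberately simplified sketch. The code below is not the paper's MINLP formulation; it is a hypothetical toy that brute-forces a single one-dimensional threshold split, scoring each candidate by its separation margin and allowing up to `max_flips` training labels to be treated as noise and flipped. The function name and its fixed "left = -1, right = +1" orientation are assumptions made for illustration only.

```python
# Toy sketch (NOT the paper's MINLP model): search for a 1-D threshold
# split that maximizes the separation margin, while permitting a small
# number of training labels to be flipped, mimicking the abstract's
# ideas (1) margin-maximizing splits and (2) relabeling to absorb noise.

def best_margin_split(xs, ys, max_flips=1):
    """Brute-force over midpoints between consecutive sorted points.

    Assumes the orientation "left of threshold = -1, right = +1".
    Labels disagreeing with that assignment are the candidates for
    flipping; a threshold is feasible only if at most `max_flips`
    labels disagree. Returns (margin, threshold, flipped_indices)
    for the widest feasible margin, or None if no split is feasible.
    """
    pts = sorted(zip(xs, ys))            # sort points by coordinate
    xs_sorted = [x for x, _ in pts]
    best = None
    for i in range(len(pts) - 1):
        t = (xs_sorted[i] + xs_sorted[i + 1]) / 2.0
        # indices (in sorted order) whose label disagrees with the split
        wrong = [j for j, (x, y) in enumerate(pts)
                 if (1 if x > t else -1) != y]
        if len(wrong) > max_flips:
            continue                     # too noisy to fix by relabeling
        margin = xs_sorted[i + 1] - xs_sorted[i]
        if best is None or margin > best[0]:
            best = (margin, t, wrong)
    return best


# One point (x = 2.0, labeled +1) sits inside the otherwise wide gap;
# flipping it exposes the large margin between x = 2.0 and x = 5.0.
result = best_margin_split([0.0, 1.0, 2.0, 5.0, 6.0, 7.0],
                           [-1, -1, +1, +1, +1, +1], max_flips=1)
print(result)  # → (3.0, 3.5, [2])
```

The example shows the trade-off the paper formalizes: without the label flip, the best clean split has margin 1.0, whereas allowing a single relabeling uncovers a split with margin 3.0. In the actual model this trade-off is resolved jointly, inside one optimization, rather than by exhaustive search.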
Related papers
- Dual-Decoupling Learning and Metric-Adaptive Thresholding for Semi-Supervised Multi-Label Learning [81.83013974171364]
Semi-supervised multi-label learning (SSMLL) is a powerful framework for leveraging unlabeled data to reduce the expensive cost of collecting precise multi-label annotations.
Unlike in semi-supervised learning, one cannot simply select the most probable label as the pseudo-label in SSMLL, because a single instance may carry multiple semantics.
We propose a dual-perspective method to generate high-quality pseudo-labels.
arXiv Detail & Related papers (2024-07-26T09:33:53Z)
- Classification Tree-based Active Learning: A Wrapper Approach [4.706932040794696]
This paper proposes a wrapper active learning method for classification, organizing the sampling process into a tree structure.
A classification tree constructed on an initial set of labeled samples is considered to decompose the space into low-entropy regions.
This adaptation proves to be a significant enhancement over existing active learning methods.
arXiv Detail & Related papers (2024-04-15T17:27:00Z)
- Extracting Clean and Balanced Subset for Noisy Long-tailed Classification [66.47809135771698]
We develop a novel pseudo labeling method using class prototypes from the perspective of distribution matching.
By setting a manually-specified probability measure, we can reduce the side effects of noisy and long-tailed data simultaneously.
Our method can extract this class-balanced subset with clean labels, which brings effective performance gains for long-tailed classification with label noise.
arXiv Detail & Related papers (2024-04-10T07:34:37Z)
- Margin Optimal Classification Trees [0.0]
We present a novel mixed-integer formulation for the Optimal Classification Tree (OCT) problem.
Our model, denoted as Margin Optimal Classification Tree (MARGOT), exploits the generalization capabilities of Support Vector Machines for binary classification.
To enhance the interpretability of our approach, we analyse two alternative versions of MARGOT, which include feature selection constraints inducing local sparsity of the hyperplanes.
arXiv Detail & Related papers (2022-10-19T14:08:56Z)
- Ensemble Classifier Design Tuned to Dataset Characteristics for Network Intrusion Detection [0.0]
Two new algorithms are proposed to address the class overlap issue in the dataset.
The proposed design is evaluated for both binary and multi-category classification.
arXiv Detail & Related papers (2022-05-08T21:06:42Z)
- Multiclass Optimal Classification Trees with SVM-splits [1.5039745292757671]
We present a novel mathematical optimization-based methodology to construct tree-shaped classification rules for multiclass instances.
Our approach consists of building Classification Trees in which, except for the leaf nodes, the labels are temporarily left out and grouped into two classes by means of an SVM separating hyperplane.
arXiv Detail & Related papers (2021-11-16T18:15:56Z)
- Rectified Decision Trees: Exploring the Landscape of Interpretable and Effective Machine Learning [66.01622034708319]
We propose a knowledge distillation-based decision tree extension, dubbed rectified decision trees (ReDT).
We extend the splitting criteria and the ending condition of the standard decision trees, which allows training with soft labels.
We then train the ReDT based on the soft label distilled from a well-trained teacher model through a novel jackknife-based method.
arXiv Detail & Related papers (2020-08-21T10:45:25Z)
- MurTree: Optimal Classification Trees via Dynamic Programming and Search [61.817059565926336]
We present a novel algorithm for learning optimal classification trees based on dynamic programming and search.
Our approach uses only a fraction of the time required by the state-of-the-art and can handle datasets with tens of thousands of instances.
arXiv Detail & Related papers (2020-07-24T17:06:55Z)
- Towards Model-Agnostic Post-Hoc Adjustment for Balancing Ranking Fairness and Algorithm Utility [54.179859639868646]
Bipartite ranking aims to learn a scoring function that ranks positive individuals higher than negative ones from labeled data.
There have been rising concerns on whether the learned scoring function can cause systematic disparity across different protected groups.
We propose a model post-processing framework for balancing them in the bipartite ranking scenario.
arXiv Detail & Related papers (2020-06-15T10:08:39Z)
- A Mathematical Programming approach to Binary Supervised Classification with Label Noise [1.2031796234206138]
We propose novel methodologies to construct Support Vector Machine-based classifiers.
The first method incorporates relabeling directly in the SVM model.
A second family of methods combines clustering and classification simultaneously, giving rise to a model that jointly applies similarity measures and SVM.
arXiv Detail & Related papers (2020-04-21T17:25:54Z)
- Progressive Identification of True Labels for Partial-Label Learning [112.94467491335611]
Partial-label learning (PLL) is a typical weakly supervised learning problem, where each training instance is equipped with a set of candidate labels among which only one is the true label.
Most existing methods are elaborately designed as constrained optimizations that must be solved in specific manners, making their computational complexity a bottleneck for scaling up to big data.
This paper proposes a novel framework of classifier with flexibility on the model and optimization algorithm.
arXiv Detail & Related papers (2020-02-19T08:35:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.