Precision-Recall Curve (PRC) Classification Trees
- URL: http://arxiv.org/abs/2011.07640v1
- Date: Sun, 15 Nov 2020 22:31:06 GMT
- Title: Precision-Recall Curve (PRC) Classification Trees
- Authors: Jiaju Miao, Wei Zhu
- Abstract summary: We propose a novel tree-based algorithm based on the area under the precision-recall curve (AUPRC) for variable selection in the classification context.
Our algorithm, named the "Precision-Recall Curve classification tree", or simply the "PRC classification tree", modifies two crucial stages of tree building.
- Score: 5.503321733964237
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The classification of imbalanced data has presented a significant challenge
for most well-known classification algorithms that were often designed for data
with relatively balanced class distributions. Nevertheless, skewed class
distributions are a common feature of real-world problems. They are especially
prevalent in application domains with a great need for machine learning and
better predictive analysis, such as disease diagnosis, fraud detection,
bankruptcy prediction, and suspect identification. In this paper, we propose a
novel tree-based algorithm based on the area under the precision-recall curve
(AUPRC) for variable selection in the classification context. Our algorithm,
named the "Precision-Recall Curve classification tree", or simply the "PRC
classification tree", modifies two crucial stages of tree building. The first
stage is to maximize the area under the precision-recall curve in node variable
selection. The second stage is to maximize the harmonic mean of recall and
precision (F-measure) for threshold selection. We found that the proposed PRC
classification tree and its subsequent extension, the PRC random forest, work
especially well on class-imbalanced data sets. We have demonstrated that our
methods outperform their classic counterparts, the standard CART and random
forest, on both synthetic and real data. Furthermore, the ROC classification
tree previously proposed by our group has shown good performance on imbalanced
data. The combination of the two, the PRC-ROC tree, also shows great promise in
identifying the minority class.
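To make the two modified stages concrete, here is a minimal Python sketch, assuming binary labels with the minority class coded as 1: stage one scores each candidate predictor by its AUPRC (using `sklearn.metrics.average_precision_score` as a stand-in for the paper's criterion), and stage two scans cut points for the one maximizing the F-measure, F = 2PR / (P + R). The function names and toy data are illustrative, not taken from the authors' implementation.

```python
# Minimal sketch of the two modified stages (an assumed reconstruction
# from the abstract, not the authors' implementation). Binary labels,
# minority class coded as 1; higher feature values are assumed to
# indicate the positive class (direction handling omitted for brevity).
import numpy as np
from sklearn.metrics import average_precision_score, f1_score

def select_split_variable(X, y):
    """Stage 1: pick the predictor whose raw values yield the largest
    AUPRC when used as scores for the minority class."""
    auprcs = [average_precision_score(y, X[:, j]) for j in range(X.shape[1])]
    return int(np.argmax(auprcs))

def select_threshold(x, y):
    """Stage 2: pick the cut point on the chosen predictor that
    maximizes the F-measure, F = 2PR / (P + R)."""
    best_t, best_f = None, -1.0
    for t in np.unique(x):
        f = f1_score(y, (x >= t).astype(int), zero_division=0)
        if f > best_f:
            best_t, best_f = t, f
    return best_t, best_f

# Toy imbalanced data: positives are rare and driven by variable 2.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 2] + 0.3 * rng.normal(size=200) > 1.5).astype(int)
j = select_split_variable(X, y)
t, f = select_threshold(X[:, j], y)
print(f"split on variable {j} at threshold {t:.3f} (F-measure {f:.3f})")
```

Replacing the usual purity criterion with AUPRC for variable selection and an F-measure scan for the threshold makes each split sensitive to minority-class precision and recall rather than overall node purity, which is the abstract's stated motivation for imbalanced data.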
Related papers
- Deep Imbalanced Regression via Hierarchical Classification Adjustment [50.19438850112964]
Regression tasks in computer vision are often recast as classification by quantizing the target space into classes.
The majority of training samples lie in a head range of target values, while a minority of samples span a usually larger tail range.
We propose to construct hierarchical classifiers for solving imbalanced regression tasks.
Our novel hierarchical classification adjustment (HCA) for imbalanced regression shows superior results on three diverse tasks.
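In its simplest form, the quantization step described here is just binning the continuous target; a generic sketch with equal-count bins (illustrative only, not the HCA method):

```python
import numpy as np

y = np.random.default_rng(1).exponential(size=1000)  # long-tailed target
edges = np.quantile(y, np.linspace(0, 1, 11)[1:-1])  # 9 interior cut points
labels = np.digitize(y, edges)                       # class 0..9 per sample
print(np.bincount(labels))                           # ~100 samples per class
```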
arXiv Detail & Related papers (2023-10-26T04:54:39Z)
- Parametric Classification for Generalized Category Discovery: A Baseline Study [70.73212959385387]
Generalized Category Discovery (GCD) aims to discover novel categories in unlabelled datasets using knowledge learned from labelled samples.
We investigate the failure of parametric classifiers, verify the effectiveness of previous design choices when high-quality supervision is available, and identify unreliable pseudo-labels as a key problem.
We propose a simple yet effective parametric classification method that benefits from entropy regularisation, achieves state-of-the-art performance on multiple GCD benchmarks and shows strong robustness to unknown class numbers.
arXiv Detail & Related papers (2022-11-21T18:47:11Z)
- A Mathematical Programming Approach to Optimal Classification Forests [1.0705399532413618]
We propose a novel mathematical optimization-based methodology in which a given number of trees are simultaneously constructed.
The classification rule assigns to each observation its most frequently predicted class among the trees in the forest (majority voting; see the sketch below).
We show that our proposed method has equal or superior performance compared with state-of-the-art tree-based classification methods.
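That rule is plain majority voting across the trees of the forest; a minimal generic sketch (not the paper's optimization formulation):

```python
import numpy as np

def forest_predict(tree_predictions):
    """Majority vote: each observation gets its most frequently
    predicted class (rows are trees, columns are observations)."""
    preds = np.asarray(tree_predictions)
    return np.apply_along_axis(lambda col: np.bincount(col).argmax(),
                               axis=0, arr=preds)

# Three trees voting on four observations:
print(forest_predict([[0, 1, 1, 0],
                      [0, 1, 0, 0],
                      [1, 1, 1, 0]]))  # -> [0 1 1 0]
```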
arXiv Detail & Related papers (2022-11-18T20:33:08Z)
- On multivariate randomized classification trees: $l_0$-based sparsity, VC dimension and decomposition methods [0.9346127431927981]
We investigate the nonlinear continuous optimization formulation proposed in Blanquero et al.
We first consider alternative methods to sparsify such trees based on concave approximations of the $l_0$ norm.
We propose a general decomposition scheme and an efficient version of it. Experiments on larger datasets show that the proposed decomposition method is able to significantly reduce the training times without compromising the accuracy.
arXiv Detail & Related papers (2021-12-09T22:49:08Z)
- Optimal randomized classification trees [0.0]
Classification and Regression Trees (CARTs) are off-the-shelf techniques in modern Statistics and Machine Learning.
CARTs are built by means of a greedy procedure, sequentially deciding the splitting predictor variable(s) and the associated threshold.
This greedy approach trains trees very fast but, by its nature, the accuracy of the resulting trees may not be competitive with other state-of-the-art procedures.
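For contrast, that greedy step itself is easy to sketch: scan every predictor and candidate cut point and keep the pair with the lowest weighted Gini impurity in the children. This is textbook CART, shown only to make the sequential procedure concrete:

```python
import numpy as np

def gini(y):
    """Gini impurity of a label vector."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def greedy_split(X, y):
    """One greedy CART step: exhaustively scan (variable, threshold)
    pairs and keep the one minimizing weighted child impurity."""
    best_j, best_t, best_score = None, None, np.inf
    n = len(y)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            left = X[:, j] <= t
            if left.all() or not left.any():
                continue  # degenerate split, skip
            score = (left.sum() * gini(y[left]) +
                     (~left).sum() * gini(y[~left])) / n
            if score < best_score:
                best_j, best_t, best_score = j, t, score
    return best_j, best_t, best_score
```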
arXiv Detail & Related papers (2021-10-19T11:41:12Z)
- Data-driven advice for interpreting local and global model predictions in bioinformatics problems [17.685881417954782]
Conditional feature contributions (CFCs) provide local, case-by-case explanations of a prediction.
We compare the explanations computed by both methods on a set of 164 publicly available classification problems.
For random forests, we find extremely high similarities and correlations of both local and global SHAP values and CFC scores.
arXiv Detail & Related papers (2021-08-13T12:41:39Z)
- Making CNNs Interpretable by Building Dynamic Sequential Decision Forests with Top-down Hierarchy Learning [62.82046926149371]
We propose a generic model transfer scheme to make Convolutional Neural Networks (CNNs) interpretable.
We achieve this by building a differentiable decision forest on top of CNNs.
We name the transferred model the deep Dynamic Sequential Decision Forest (dDSDF).
arXiv Detail & Related papers (2021-06-05T07:41:18Z)
- Predicting Classification Accuracy When Adding New Unobserved Classes [8.325327265120283]
We study how a classifier's performance can be used to extrapolate its expected accuracy on a larger, unobserved set of classes.
We formulate a robust neural-network-based algorithm, "CleaneX", which learns to estimate the accuracy of such classifiers on arbitrarily large sets of classes.
arXiv Detail & Related papers (2020-10-28T14:37:25Z)
- Forest R-CNN: Large-Vocabulary Long-Tailed Object Detection and Instance Segmentation [75.93960390191262]
We exploit prior knowledge of the relations among object categories to cluster fine-grained classes into coarser parent classes.
We propose a simple yet effective resampling method, NMS Resampling, to re-balance the data distribution.
Our method, termed Forest R-CNN, can serve as a plug-and-play module applicable to most object recognition models.
arXiv Detail & Related papers (2020-08-13T03:52:37Z) - MurTree: Optimal Classification Trees via Dynamic Programming and Search [61.817059565926336]
We present a novel algorithm for learning optimal classification trees based on dynamic programming and search.
Our approach uses only a fraction of the time required by the state-of-the-art and can handle datasets with tens of thousands of instances.
arXiv Detail & Related papers (2020-07-24T17:06:55Z)
- Solving Long-tailed Recognition with Deep Realistic Taxonomic Classifier [68.38233199030908]
Long-tailed recognition tackles the naturally non-uniformly distributed data of real-world scenarios.
While modern classifiers perform well on populated classes, their performance degrades significantly on tail classes.
Deep-RTC is proposed as a new solution to the long-tail problem, combining realism with hierarchical predictions.
arXiv Detail & Related papers (2020-07-20T05:57:42Z)