Tree-Values: selective inference for regression trees
- URL: http://arxiv.org/abs/2106.07816v1
- Date: Tue, 15 Jun 2021 00:25:11 GMT
- Title: Tree-Values: selective inference for regression trees
- Authors: Anna C. Neufeld, Lucy L. Gao, Daniela M. Witten
- Abstract summary: A naive approach to inference that does not account for the fact that the tree was estimated from the data will not achieve standard guarantees.
We propose a selective inference framework for conducting inference on a fitted CART tree.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider conducting inference on the output of the Classification and
Regression Tree (CART) [Breiman et al., 1984] algorithm. A naive approach to
inference that does not account for the fact that the tree was estimated from
the data will not achieve standard guarantees, such as Type 1 error rate
control and nominal coverage. Thus, we propose a selective inference framework
for conducting inference on a fitted CART tree. In a nutshell, we condition on
the fact that the tree was estimated from the data. We propose a test for the
difference in the mean response between a pair of terminal nodes that controls
the selective Type 1 error rate, and a confidence interval for the mean
response within a single terminal node that attains the nominal selective
coverage. Efficient algorithms for computing the necessary conditioning sets
are provided. We apply these methods in simulation and to a dataset involving
the association between portion control interventions and caloric intake.
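The selection problem the abstract describes can be seen in a short simulation: under a global null with no true split, a greedy search picks the split point that best separates the responses, so a naive two-sample z-test at that split rejects far more often than its nominal level. This is an illustrative sketch only; it does not implement the paper's selective tests or conditioning sets.

```python
import numpy as np

# Simulate under the global null (no true split): greedily pick the
# split maximizing separation, then run a naive z-test at that split.
rng = np.random.default_rng(0)
n, reps = 50, 300
rejections = 0
for _ in range(reps):
    y = rng.normal(size=n)               # pure noise, known sigma = 1
    best_z = 0.0
    for s in range(5, n - 5):            # candidate split points
        z = (y[s:].mean() - y[:s].mean()) / np.sqrt(1 / s + 1 / (n - s))
        if abs(z) > abs(best_z):
            best_z = z
    if abs(best_z) > 1.96:               # naive level-0.05 z-test
        rejections += 1
rate = rejections / reps
print(rate)                              # well above the nominal 0.05
```

Because the split was chosen to maximize the very statistic being tested, the naive Type 1 error rate is badly inflated; conditioning on the selection event, as the paper proposes, is what restores validity.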
Related papers
- Enhanced Outsourced and Secure Inference for Tall Sparse Decision Trees [3.24708405883535]
A decision tree is an easy-to-understand tool that has been widely used for classification tasks. Data owners are keen to reduce risk by outsourcing their model, but want security guarantees that third parties cannot steal their decision tree model. We propose a new decision tree inference protocol in which the model is shared and evaluated among multiple entities.
arXiv Detail & Related papers (2025-05-04T19:15:27Z)
- Clustered random forests with correlated data for optimal estimation and inference under potential covariate shift [4.13592995550836]
We develop Clustered Random Forests, a random forests algorithm for clustered data, arising from independent groups that exhibit within-cluster dependence.
The leaf-wise predictions of each decision tree making up a clustered random forest take the form of a weighted least squares estimator.
For certain tree-splitting criteria, clustered random forests are shown to be minimax rate optimal for pointwise conditional mean estimation.
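As a rough illustration of a leaf-wise weighted estimate for clustered data: the sketch below uses equal-per-cluster weights, one simple hypothetical choice; the optimal weights studied in the paper depend on the within-cluster covariance structure.

```python
import numpy as np

# Hypothetical leaf data: responses grouped by cluster id.
y = np.array([1.0, 1.2, 0.9, 2.1, 1.8, 1.5])
cluster = np.array([0, 0, 0, 1, 1, 2])

# Equal-per-cluster weighting: each cluster contributes equally to the
# leaf estimate, so large clusters cannot dominate it.
sizes = np.bincount(cluster)
w = 1.0 / sizes[cluster]
leaf_estimate = np.sum(w * y) / np.sum(w)
print(leaf_estimate)  # the mean of the three cluster means
```

With independent observations all weights would be equal and the estimate would reduce to the ordinary leaf mean; within-cluster dependence is what motivates reweighting.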
arXiv Detail & Related papers (2025-03-16T20:07:23Z)
- Decision Trees for Interpretable Clusters in Mixture Models and Deep Representations [5.65604054654671]
We introduce the notion of an explainability-to-noise ratio for mixture models.
We propose an algorithm that takes as input a mixture model and constructs a suitable tree in data-independent time.
We prove upper and lower bounds on the error rate of the resulting decision tree.
arXiv Detail & Related papers (2024-11-03T14:00:20Z)
- Learning Deep Tree-based Retriever for Efficient Recommendation: Theory and Method [76.31185707649227]
We propose a Deep Tree-based Retriever (DTR) for efficient recommendation.
DTR frames the training task as a softmax-based multi-class classification over tree nodes at the same level.
To mitigate the suboptimality induced by the labeling of non-leaf nodes, we propose a rectification method for the loss function.
arXiv Detail & Related papers (2024-08-21T05:09:53Z)
- Reinforcement Learning for Node Selection in Branch-and-Bound [52.2648997215667]
Current state-of-the-art selectors utilize either hand-crafted ensembles that automatically switch between naive sub-node selectors, or learned node selectors that rely on individual node data.
We propose a novel simulation technique that uses reinforcement learning (RL) while considering the entire tree state, rather than just isolated nodes.
arXiv Detail & Related papers (2023-09-29T19:55:56Z)
- Semi-DETR: Semi-Supervised Object Detection with Detection Transformers [105.45018934087076]
We analyze the DETR-based framework on semi-supervised object detection (SSOD).
We present Semi-DETR, the first transformer-based end-to-end semi-supervised object detector.
Our method outperforms all state-of-the-art methods by clear margins.
arXiv Detail & Related papers (2023-07-16T16:32:14Z)
- Prediction Algorithms Achieving Bayesian Decision Theoretical Optimality Based on Decision Trees as Data Observation Processes [1.2774526936067927]
This paper uses trees to represent data observation processes behind given data.
We derive the statistically optimal prediction, which is robust against overfitting.
We compute this by a Markov chain Monte Carlo method, whose step size is adaptively tuned according to a posterior distribution over the trees.
arXiv Detail & Related papers (2023-06-12T12:14:57Z)
- Optimal randomized classification trees [0.0]
Classification and Regression Trees (CARTs) are off-the-shelf techniques in modern Statistics and Machine Learning.
CARTs are built by means of a greedy procedure, sequentially deciding the splitting predictor variable(s) and the associated threshold.
This greedy approach trains trees very fast, but by its nature the resulting trees' classification accuracy may not be competitive with other state-of-the-art procedures.
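The greedy procedure can be sketched for a single numeric predictor: evaluate every candidate threshold and keep the one minimizing the total within-node sum of squared errors. A minimal sketch only, not the full CART algorithm (one predictor, no pruning, no stopping rules):

```python
import numpy as np

def best_split(x, y):
    """Greedy CART-style search: try every threshold on one predictor
    and return the split minimizing the total sum of squared errors."""
    order = np.argsort(x)
    x_sorted, y_sorted = x[order], y[order]
    best_thresh, best_sse = None, np.inf
    for i in range(1, len(x_sorted)):
        left, right = y_sorted[:i], y_sorted[i:]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best_sse:
            best_thresh = (x_sorted[i - 1] + x_sorted[i]) / 2
            best_sse = sse
    return best_thresh, best_sse

# Two well-separated groups: the search finds the midpoint between them.
x = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
y = np.array([0.1, 0.0, 0.2, 5.0, 5.1, 4.9])
thresh, sse = best_split(x, y)
print(thresh)  # 6.5
```

A full CART fit applies this search over all predictors and recurses on the resulting child nodes; the myopia the abstract notes comes from each split being chosen without regard to splits further down the tree.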
arXiv Detail & Related papers (2021-10-19T11:41:12Z)
- Visualizing hierarchies in scRNA-seq data using a density tree-biased autoencoder [50.591267188664666]
We propose an approach for identifying a meaningful tree structure from high-dimensional scRNA-seq data.
We then introduce DTAE, a tree-biased autoencoder that emphasizes the tree structure of the data in low dimensional space.
arXiv Detail & Related papers (2021-02-11T08:48:48Z)
- Precision-Recall Curve (PRC) Classification Trees [5.503321733964237]
We propose a novel tree-based algorithm based on the area under the precision-recall curve (AUPRC) for variable selection in the classification context.
Our algorithm, named the "Precision-Recall Curve classification tree", or simply the "PRC classification tree", modifies two crucial stages of tree building.
arXiv Detail & Related papers (2020-11-15T22:31:06Z)
- Convex Polytope Trees [57.56078843831244]
Convex polytope trees (CPT) are proposed to expand the family of decision trees via an interpretable generalization of their decision boundary.
We develop a greedy method to efficiently construct CPT and scalable end-to-end training algorithms for the tree parameters when the tree structure is given.
arXiv Detail & Related papers (2020-10-21T19:38:57Z)
- Rectified Decision Trees: Exploring the Landscape of Interpretable and Effective Machine Learning [66.01622034708319]
We propose a knowledge-distillation-based extension of decision trees, dubbed rectified decision trees (ReDT).
We extend the splitting criteria and the ending condition of the standard decision trees, which allows training with soft labels.
We then train the ReDT based on the soft label distilled from a well-trained teacher model through a novel jackknife-based method.
arXiv Detail & Related papers (2020-08-21T10:45:25Z)
- Explainable outlier detection through decision tree conditioning [0.0]
The GritBot software works by evaluating and following supervised decision-tree splits on variables.
This makes it possible to produce human-readable explanations of why a given value of a variable in an observation can be considered an outlier.
arXiv Detail & Related papers (2020-01-02T21:45:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences.