Fréchet random forests for metric space valued regression with non-Euclidean
predictors
- URL: http://arxiv.org/abs/1906.01741v3
- Date: Fri, 16 Feb 2024 12:41:29 GMT
- Title: Fréchet random forests for metric space valued regression with
non-Euclidean predictors
- Authors: Louis Capitaine, Jérémie Bigot, Rodolphe Thiébaut and Robin
Genuer
- Abstract summary: We introduce Fréchet trees and Fréchet random forests, which can handle data whose input and output variables take values in general metric spaces.
A consistency theorem for the Fréchet regressogram predictor built on data-driven partitions is given and applied to Fréchet purely uniformly random trees.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Random forests are a statistical learning method widely used in
many areas of scientific research because of their ability to learn complex
relationships between input and output variables and to handle
high-dimensional data. However, current random forest approaches are not
flexible enough to handle heterogeneous data such as curves, images and
shapes. In this paper, we introduce Fréchet trees and Fréchet random forests,
which can handle data whose input and output variables take values in general
metric spaces. To this end, a new way of splitting tree nodes is introduced,
and the prediction procedures of trees and forests are generalized. The
random forest out-of-bag error and variable importance score are then adapted
naturally. A consistency theorem for the Fréchet regressogram predictor built
on data-driven partitions is given and applied to Fréchet purely uniformly
random trees. The method is studied through several simulation scenarios on
heterogeneous data combining longitudinal, image and scalar inputs. Finally,
a real dataset on air quality illustrates the use of the proposed method in
practice.
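To make these ideas concrete, here is a minimal Python sketch of the two primitives the abstract relies on: an empirical Fréchet mean (restricted to the sample, i.e. a medoid, so it stays computable in an arbitrary metric space) and a variance-reducing node split. The distance functions dist_x and dist_y are user-supplied assumptions, and the exhaustive pair search is a simplified stand-in for the clustering-based split used in the paper.

```python
import numpy as np

def frechet_mean(points, dist):
    """Empirical Frechet mean restricted to the sample (the medoid):
    the point minimizing the sum of squared distances to the others.
    Restricting the search to the sample keeps the minimization
    computable in a metric space with no closed-form mean."""
    costs = [sum(dist(c, p) ** 2 for p in points) for c in points]
    return points[int(np.argmin(costs))]

def frechet_variance(points, dist):
    """Mean squared distance to the Frechet mean."""
    m = frechet_mean(points, dist)
    return float(np.mean([dist(m, p) ** 2 for p in points]))

def best_split(X, Y, dist_x, dist_y):
    """Try every pair of observations as the two group centers, assign
    each point to its nearest center, and keep the split that most
    reduces the within-child Frechet variance of the outputs."""
    n, best_cost, best = len(X), np.inf, None
    for a in range(n):
        for b in range(a + 1, n):
            left = [i for i in range(n)
                    if dist_x(X[i], X[a]) <= dist_x(X[i], X[b])]
            left_set = set(left)
            right = [i for i in range(n) if i not in left_set]
            if not left or not right:
                continue
            cost = (len(left) * frechet_variance([Y[i] for i in left], dist_y)
                    + len(right) * frechet_variance([Y[i] for i in right], dist_y))
            if cost < best_cost:
                best_cost, best = cost, (left, right)
    return best
```

A leaf then predicts with frechet_mean over the outputs it contains, and a forest aggregates tree predictions with another Fréchet mean rather than an arithmetic average, since averaging is unavailable in a general metric space.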
Related papers
- Ensembles of Probabilistic Regression Trees [46.53457774230618]
Tree-based ensemble methods have been successfully used for regression problems in many applications and research studies.
We study ensemble versions of probabilistic regression trees, which provide smooth approximations of the objective function by assigning each observation to each region according to a probability distribution.
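As a rough illustration of the smoothing effect, here is a one-dimensional sketch in which each observation belongs to each region with the probability that a Gaussian perturbation of x lands there; the Gaussian model and the region values are illustrative assumptions, not the paper's exact construction.

```python
import math

def soft_tree_predict(x, regions, bandwidth=0.5):
    """Smoothed prediction over 1-D regions: instead of hard-assigning
    x to one region, weight each region [lo, hi) by the probability
    that x + noise falls in it, with noise ~ N(0, bandwidth^2).
    Region values play the role of leaf means."""
    def gauss_cdf(z):
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pred = 0.0
    for lo, hi, value in regions:
        p = gauss_cdf((hi - x) / bandwidth) - gauss_cdf((lo - x) / bandwidth)
        pred += p * value
    return pred

# regions of a depth-1 tree on the real line: (-inf, 2) -> 1.0, [2, inf) -> 3.0
regions = [(-1e9, 2.0, 1.0), (2.0, 1e9, 3.0)]
print(soft_tree_predict(1.9, regions))  # near the boundary: a blend of 1 and 3
```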
arXiv Detail & Related papers (2024-06-20T06:51:51Z) - Why do Random Forests Work? Understanding Tree Ensembles as
Self-Regularizing Adaptive Smoothers [68.76846801719095]
We argue that the current high-level dichotomy into bias- and variance-reduction prevalent in statistics is insufficient to understand tree ensembles.
We show that forests can improve upon trees by three distinct mechanisms that are usually implicitly entangled.
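A well-known way to see this smoothing view, demonstrated below with scikit-learn, is that a regression forest's prediction is a weighted average of training responses, with weights induced by leaf co-membership; bootstrap is disabled here so the identity is exact.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# A regression forest is a smoother: its prediction at x0 is a weighted
# average of training responses, where a training point receives weight
# 1/|leaf| in every tree whose leaf at x0 it falls into.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(200)

rf = RandomForestRegressor(n_estimators=50, bootstrap=False,
                           random_state=0).fit(X, y)

x0 = np.array([[0.5]])
train_leaves = rf.apply(X)      # (n_train, n_trees) leaf ids
x0_leaves = rf.apply(x0)[0]     # x0's leaf id in each tree
weights = np.zeros(len(X))
for t in range(rf.n_estimators):
    in_leaf = train_leaves[:, t] == x0_leaves[t]
    weights[in_leaf] += 1.0 / in_leaf.sum()
weights /= rf.n_estimators

print(weights @ y)              # identical to the forest prediction
print(rf.predict(x0)[0])
```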
arXiv Detail & Related papers (2024-02-02T15:36:43Z) - Inference with Mondrian Random Forests [6.97762648094816]
We give precise bias and variance characterizations, along with a Berry-Esseen-type central limit theorem, for the Mondrian random forest regression estimator.
We present valid statistical inference methods for the unknown regression function.
Efficient and implementable algorithms are devised for both batch and online learning settings.
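For readers unfamiliar with the underlying process, the sketch below samples a Mondrian partition of an axis-aligned box: the time to the next cut is exponential with rate equal to the box's linear dimension, the cut axis is chosen proportionally to side length, and the cut point uniformly along that side. The budget parameter is the Mondrian lifetime; the regression estimator studied in the paper averages leaf means over many such independent trees.

```python
import numpy as np

def mondrian_tree(lower, upper, budget, rng):
    """Recursively sample a Mondrian partition of the box [lower, upper]."""
    sides = upper - lower
    rate = sides.sum()
    cost = rng.exponential(1.0 / rate) if rate > 0 else np.inf
    if cost > budget:
        return {"leaf": True, "lower": lower, "upper": upper}
    axis = rng.choice(len(sides), p=sides / rate)
    cut = rng.uniform(lower[axis], upper[axis])
    new_upper, new_lower = upper.copy(), lower.copy()
    new_upper[axis], new_lower[axis] = cut, cut
    return {"leaf": False, "axis": int(axis), "cut": cut,
            "left": mondrian_tree(lower, new_upper, budget - cost, rng),
            "right": mondrian_tree(new_lower, upper, budget - cost, rng)}

tree = mondrian_tree(np.zeros(2), np.ones(2), budget=3.0,
                     rng=np.random.default_rng(1))
```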
arXiv Detail & Related papers (2023-10-15T01:41:42Z) - Conformal inference for regression on Riemannian Manifolds [49.7719149179179]
We investigate prediction sets for regression scenarios in which the response variable, denoted by $Y$, resides on a manifold, and the covariate, denoted by $X$, lies in Euclidean space.
We prove the almost sure convergence of the empirical version of these regions on the manifold to their population counterparts.
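As a simplified illustration, the split-conformal recipe specialized to the unit sphere looks as follows; sphere_dist, the calibration arrays and alpha are assumptions for the sketch, and the paper's construction is more general than this residual-ball version.

```python
import numpy as np

def sphere_dist(u, v):
    """Geodesic distance on the unit sphere."""
    return np.arccos(np.clip(np.dot(u, v), -1.0, 1.0))

def conformal_radius(cal_preds, cal_truth, alpha=0.1):
    """Split conformal: take the finite-sample-corrected (1 - alpha)
    quantile of geodesic residuals on a held-out calibration set. The
    prediction set at a new x is the geodesic ball of this radius
    around the point prediction (if the corrected rank exceeds n, the
    set is the whole manifold; we clip to the max score here)."""
    scores = np.sort([sphere_dist(p, y) for p, y in zip(cal_preds, cal_truth)])
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    return scores[min(k, n) - 1]

# With 200 calibration pairs and alpha = 0.1, the resulting ball covers
# a new response with probability at least 0.9 under exchangeability.
```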
arXiv Detail & Related papers (2023-10-12T10:56:25Z) - Random Similarity Forests [2.3204178451683264]
We propose a classification method capable of handling datasets with features of arbitrary data types while retaining each feature's characteristics.
The proposed algorithm, called Random Similarity Forest, uses multiple domain-specific distance measures to combine the predictive performance of Random Forests with the flexibility of Similarity Forests.
We show that Random Similarity Forests are on par with Random Forests on numerical data and outperform them on datasets from complex or mixed data domains.
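The sketch below shows the kind of split such a method can perform on a single feature of arbitrary type: two exemplars are drawn, every observation is projected onto the difference of its distances to them, and a threshold on that one-dimensional projection is scanned. The Jaccard distance and the variance impurity are illustrative choices, not necessarily the paper's.

```python
import numpy as np

def similarity_split(column, y, dist, rng):
    """One similarity-based split for a single feature: draw two
    exemplar values a, b, project every observation onto
    d(x, a) - d(x, b), and scan thresholds on that 1-D projection,
    scoring each by the weighted variance of y (regression impurity)."""
    a, b = rng.choice(len(column), size=2, replace=False)
    proj = np.array([dist(x, column[a]) - dist(x, column[b]) for x in column])
    order = np.argsort(proj)
    best_cost, best_thr = np.inf, None
    for i in range(1, len(order)):
        left, right = y[order[:i]], y[order[i:]]
        cost = len(left) * left.var() + len(right) * right.var()
        if cost < best_cost:
            best_cost = cost
            best_thr = (proj[order[i - 1]] + proj[order[i]]) / 2
    return a, b, best_thr

rng = np.random.default_rng(0)
col = [set("abc"), set("abd"), set("xyz"), set("xy")]   # set-valued feature
y = np.array([1.0, 1.2, 3.0, 2.9])
jaccard = lambda s, t: 1 - len(s & t) / len(s | t)
print(similarity_split(col, y, jaccard, rng))
```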
arXiv Detail & Related papers (2022-04-11T20:14:05Z) - Geometry- and Accuracy-Preserving Random Forest Proximities [3.265773263570237]
We introduce a novel definition of random forest proximities called Random Forest-Geometry- and Accuracy-Preserving proximities (RF-GAP).
We prove that predictions weighted by RF-GAP proximities exactly match the out-of-bag random forest predictions, thus capturing the data geometry learned by the random forest.
This improved geometric representation outperforms traditional random forest proximities in tasks such as data imputation and provides outlier detection and visualization results consistent with the learned data geometry.
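A simplified re-implementation of the idea, with explicit bootstrap bookkeeping so in-bag multiplicities are visible, might look like this; it follows the RF-GAP recipe of weighting the in-bag leaf co-occupants of an out-of-bag point by their bag counts, though details may differ from the paper's exact definition.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X[:, 0] + 0.1 * rng.standard_normal(100)

n, n_trees = len(X), 50
prox = np.zeros((n, n))
counts = np.zeros(n)                    # trees in which i is out-of-bag

for _ in range(n_trees):
    bag = rng.integers(0, n, size=n)    # bootstrap, multiplicities kept
    tree = DecisionTreeRegressor().fit(X[bag], y[bag])
    leaves = tree.apply(X)              # leaf id of every original point
    oob = np.setdiff1d(np.arange(n), bag)
    for i in oob:
        # in-bag points (with multiplicity) sharing i's leaf
        in_leaf = bag[leaves[bag] == leaves[i]]
        counts[i] += 1
        for j in in_leaf:
            prox[i, j] += 1.0 / len(in_leaf)

prox[counts > 0] /= counts[counts > 0][:, None]
```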
arXiv Detail & Related papers (2022-01-29T23:13:53Z) - MURAL: An Unsupervised Random Forest-Based Embedding for Electronic
Health Record Data [59.26381272149325]
We present an unsupervised random forest for representing data with disparate variable types.
MURAL forests consist of a set of decision trees where node-splitting variables are chosen at random.
We show that using our approach, we can visualize and classify data more accurately than competing approaches.
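In spirit, the shared-leaf affinity of such an unsupervised forest can be sketched with scikit-learn's RandomTreesEmbedding, which also splits on randomly chosen variables; MURAL's handling of missing values and mixed variable types is not reproduced here.

```python
import numpy as np
from sklearn.ensemble import RandomTreesEmbedding

# Unsupervised forest embedding: each tree maps a point to a one-hot
# leaf indicator, so the dot product of two points' codes counts the
# trees in which they share a leaf.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 5)), rng.normal(4, 1, (50, 5))])

emb = RandomTreesEmbedding(n_estimators=100, random_state=0).fit(X)
codes = emb.transform(X)                         # sparse one-hot leaf codes
affinity = (codes @ codes.T).toarray() / 100.0   # shared-leaf fraction
```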
arXiv Detail & Related papers (2021-11-19T22:02:21Z) - Achieving Reliable Causal Inference with Data-Mined Variables: A Random
Forest Approach to the Measurement Error Problem [1.5749416770494704]
A common empirical strategy involves the application of predictive modeling techniques to 'mine' variables of interest from available data.
Recent work highlights that, because the predictions from machine learning models are inevitably imperfect, econometric analyses based on the predicted variables are likely to suffer from bias due to measurement error.
We propose a novel approach to mitigate these biases, leveraging the ensemble learning technique known as the random forest.
arXiv Detail & Related papers (2020-12-19T21:48:23Z) - Handling Missing Data in Decision Trees: A Probabilistic Approach [41.259097100704324]
We tackle the problem of handling missing data in decision trees by taking a probabilistic approach.
We use tractable density estimators to compute the "expected prediction" of our models.
At learning time, we fine-tune the parameters of already-learned trees by minimizing their "expected prediction loss".
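A crude version of the "expected prediction" idea, with training-set branch fractions standing in for the tractable density estimators used in the paper, is sketched below; the tree layout and values are hypothetical.

```python
def expected_prediction(node, x):
    """Expected prediction of a binary regression tree when some
    entries of x are None (missing): at a split on a missing feature,
    recurse down both branches and weight them by the fraction of
    training data routed each way."""
    if "value" in node:
        return node["value"]
    v = x[node["feature"]]
    if v is None:
        p = node["p_left"]  # fraction of training data routed left
        return (p * expected_prediction(node["left"], x)
                + (1 - p) * expected_prediction(node["right"], x))
    branch = "left" if v <= node["threshold"] else "right"
    return expected_prediction(node[branch], x)

tree = {"feature": 0, "threshold": 0.5, "p_left": 0.6,
        "left": {"value": 1.0},
        "right": {"feature": 1, "threshold": 2.0, "p_left": 0.5,
                  "left": {"value": 2.0}, "right": {"value": 5.0}}}
print(expected_prediction(tree, [None, 3.0]))  # 0.6*1.0 + 0.4*5.0 = 2.6
```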
arXiv Detail & Related papers (2020-06-29T19:54:54Z) - Stable Prediction via Leveraging Seed Variable [73.9770220107874]
Previous machine learning methods may exploit subtle spurious correlations in training data induced by non-causal variables for prediction.
We propose a conditional-independence-test-based algorithm that separates out causal variables, given a seed variable as prior knowledge, and adopts them for stable prediction.
Our algorithm outperforms state-of-the-art methods for stable prediction.
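The separation procedure itself is the paper's; the sketch below only shows the kind of conditional independence primitive such an algorithm can build on, a Gaussian test of partial correlation via the Fisher z-transform, with synthetic columns playing the seed, causal and noise roles.

```python
import numpy as np
from scipy import stats

def partial_corr_test(data, i, j, cond):
    """Fisher-z test of the partial correlation between columns i and j
    of `data` given the columns in `cond` (a Gaussian CI test).
    Returns a two-sided p-value; small values indicate dependence."""
    idx = [i, j] + list(cond)
    prec = np.linalg.inv(np.corrcoef(data[:, idx], rowvar=False))
    r = -prec[0, 1] / np.sqrt(prec[0, 0] * prec[1, 1])
    z = 0.5 * np.log((1 + r) / (1 - r)) * np.sqrt(len(data) - len(cond) - 3)
    return 2 * (1 - stats.norm.cdf(abs(z)))

rng = np.random.default_rng(0)
seed = rng.normal(size=500)
causal = seed + rng.normal(size=500)      # dependent on the seed
noise = rng.normal(size=500)              # independent of the seed
data = np.column_stack([seed, causal, noise])
print(partial_corr_test(data, 0, 1, []))  # small p-value: dependence
print(partial_corr_test(data, 0, 2, [1])) # large p-value: independence
```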
arXiv Detail & Related papers (2020-06-09T06:56:31Z) - A Numerical Transform of Random Forest Regressors corrects
Systematically-Biased Predictions [0.0]
We find a systematic bias in predictions from random forest models.
This bias is recapitulated in simple synthetic datasets.
We use the training data to define a numerical transformation that fully corrects it.
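The paper's transform is its own construction; one simple training-data-only stand-in with the same flavor is to calibrate out-of-bag predictions against the true responses and apply that map to new predictions, as sketched below with isotonic regression.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.isotonic import IsotonicRegression

# Random forest regressors shrink toward the mean at the extremes of
# the response range. Regressing the true responses on the out-of-bag
# predictions yields a correcting map learned from training data alone.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, (500, 1))
y = X[:, 0] ** 3 + rng.standard_normal(500)

rf = RandomForestRegressor(oob_score=True, random_state=0).fit(X, y)
cal = IsotonicRegression(out_of_bounds="clip").fit(rf.oob_prediction_, y)

x_new = np.array([[2.8]])            # near the edge of the training range
raw = rf.predict(x_new)
print(raw, cal.predict(raw))         # corrected value is less shrunken
```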
arXiv Detail & Related papers (2020-03-16T21:18:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.