Distributional Random Forests: Heterogeneity Adjustment and Multivariate
Distributional Regression
- URL: http://arxiv.org/abs/2005.14458v3
- Date: Wed, 12 Oct 2022 08:34:46 GMT
- Title: Distributional Random Forests: Heterogeneity Adjustment and Multivariate
Distributional Regression
- Authors: Domagoj \'Cevid, Loris Michel, Jeffrey N\"af, Nicolai Meinshausen,
Peter B\"uhlmann
- Abstract summary: We propose a novel forest construction for multivariate responses based on their joint conditional distribution.
The code is available as Python and R packages drf.
- Score: 0.8574682463936005
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Random Forest (Breiman, 2001) is a successful and widely used regression and
classification algorithm. Part of its appeal and reason for its versatility is
its (implicit) construction of a kernel-type weighting function on training
data, which can also be used for targets other than the original mean
estimation. We propose a novel forest construction for multivariate responses
based on their joint conditional distribution, independent of the estimation
target and the data model. It uses a new splitting criterion based on the MMD
distributional metric, which is suitable for detecting heterogeneity in
multivariate distributions. The induced weights define an estimate of the full
conditional distribution, which in turn can be used for arbitrary and
potentially complicated targets of interest. The method is very versatile and
convenient to use, as we illustrate on a wide range of examples. The code is
available as Python and R packages drf.
Related papers
- Semiparametric conformal prediction [79.6147286161434]
Risk-sensitive applications require well-calibrated prediction sets over multiple, potentially correlated target variables.
We treat the scores as random vectors and aim to construct the prediction set accounting for their joint correlation structure.
We report desired coverage and competitive efficiency on a range of real-world regression problems.
arXiv Detail & Related papers (2024-11-04T14:29:02Z) - Relaxed Quantile Regression: Prediction Intervals for Asymmetric Noise [51.87307904567702]
Quantile regression is a leading approach for obtaining such intervals via the empirical estimation of quantiles in the distribution of outputs.
We propose Relaxed Quantile Regression (RQR), a direct alternative to quantile regression based interval construction that removes this arbitrary constraint.
We demonstrate that this added flexibility results in intervals with an improvement in desirable qualities.
arXiv Detail & Related papers (2024-06-05T13:36:38Z) - Inference with Mondrian Random Forests [6.97762648094816]
We give precise bias and variance characterizations, along with a Berry-Esseen-type central limit theorem, for the Mondrian random forest regression estimator.
We present valid statistical inference methods for the unknown regression function.
Efficient and implementable algorithms are devised for both batch and online learning settings.
arXiv Detail & Related papers (2023-10-15T01:41:42Z) - Variational autoencoder with weighted samples for high-dimensional
non-parametric adaptive importance sampling [0.0]
We extend the existing framework to the case of weighted samples by introducing a new objective function.
In order to add flexibility to the model and to be able to learn multimodal distributions, we consider a learnable prior distribution.
We exploit the proposed procedure in existing adaptive importance sampling algorithms to draw points from a target distribution and to estimate a rare event probability in high dimension.
arXiv Detail & Related papers (2023-10-13T15:40:55Z) - Wrapped Distributions on homogeneous Riemannian manifolds [58.720142291102135]
Control over distributions' properties, such as parameters, symmetry and modality yield a family of flexible distributions.
We empirically validate our approach by utilizing our proposed distributions within a variational autoencoder and a latent space network model.
arXiv Detail & Related papers (2022-04-20T21:25:21Z) - Machine Learning for Multi-Output Regression: When should a holistic
multivariate approach be preferred over separate univariate ones? [62.997667081978825]
Tree-based ensembles such as the Random Forest are modern classics among statistical learning methods.
We compare these methods in extensive simulations to help in answering the primary question when to use multivariate ensemble techniques.
arXiv Detail & Related papers (2022-01-14T08:44:25Z) - Multivariate Probabilistic Regression with Natural Gradient Boosting [63.58097881421937]
We propose a Natural Gradient Boosting (NGBoost) approach based on nonparametrically modeling the conditional parameters of the multivariate predictive distribution.
Our method is robust, works out-of-the-box without extensive tuning, is modular with respect to the assumed target distribution, and performs competitively in comparison to existing approaches.
arXiv Detail & Related papers (2021-06-07T17:44:49Z) - Probabilistic Kolmogorov-Arnold Network [1.4732811715354455]
The present paper proposes a method for estimating probability distributions of the outputs in the case of aleatoric uncertainty.
The suggested approach covers input-dependent probability distributions of the outputs, as well as the variation of the distribution type with the inputs.
Although the method is applicable to any regression model, the present paper combines it with KANs, since the specific structure of KANs leads to computationally-efficient models' construction.
arXiv Detail & Related papers (2021-04-04T23:49:15Z) - Nonlinear Distribution Regression for Remote Sensing Applications [6.664736150040092]
In many remote sensing applications one wants to estimate variables or parameters of interest from observations.
Standard algorithms such as neural networks, random forests or Gaussian processes are readily available to relate to the two.
This paper introduces a nonlinear (kernel-based) method for distribution regression that solves the previous problems without making any assumption on the statistics of the grouped data.
arXiv Detail & Related papers (2020-12-07T22:04:43Z) - An Embedded Model Estimator for Non-Stationary Random Functions using
Multiple Secondary Variables [0.0]
This paper introduces the method and shows that it has consistency results that are similar in nature to those applying to geostatistical modelling and to Quantile Random Forests.
The algorithm works by estimating a conditional distribution for the target variable at each target location.
arXiv Detail & Related papers (2020-11-09T00:14:24Z) - Decision-Making with Auto-Encoding Variational Bayes [71.44735417472043]
We show that a posterior approximation distinct from the variational distribution should be used for making decisions.
Motivated by these theoretical results, we propose learning several approximate proposals for the best model.
In addition to toy examples, we present a full-fledged case study of single-cell RNA sequencing.
arXiv Detail & Related papers (2020-02-17T19:23:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.