Transfer Learning with Distance Covariance for Random Forest: Error Bounds and an EHR Application
- URL: http://arxiv.org/abs/2510.10870v1
- Date: Mon, 13 Oct 2025 00:31:56 GMT
- Title: Transfer Learning with Distance Covariance for Random Forest: Error Bounds and an EHR Application
- Authors: Chenze Li, Subhadeep Paul
- Abstract summary: We propose a method for transfer learning in nonparametric regression using a centered random forest (CRF). In simulations, we show that the results also hold numerically for the standard random forest (SRF) method with data-driven feature split selection. Our method shows significant gains in predicting the mortality of ICU patients in smaller-bed target hospitals.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Random forest is an important method for ML applications due to its broad outperformance over competing methods for structured tabular data. We propose a method for transfer learning in nonparametric regression using a centered random forest (CRF) with distance covariance-based feature weights, assuming the unknown source and target regression functions are different for a few features (sparsely different). Our method first obtains residuals from predicting the response in the target domain using a source domain-trained CRF. Then, we fit another CRF to the residuals, but with feature splitting probabilities proportional to the sample distance covariance between the features and the residuals in an independent sample. We derive an upper bound on the mean square error rate of the procedure as a function of sample sizes and difference dimension, theoretically demonstrating transfer learning benefits in random forests. In simulations, we show that the results obtained for the CRFs also hold numerically for the standard random forest (SRF) method with data-driven feature split selection. Beyond transfer learning, our results also show the benefit of distance-covariance-based weights on the performance of RF in some situations. Our method shows significant gains in predicting the mortality of ICU patients in smaller-bed target hospitals using a large multi-hospital dataset of electronic health records for 200,000 ICU patients.
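The distance-covariance-based split weights at the heart of the method can be sketched as follows. This is a minimal numpy illustration, not the authors' implementation: it computes the squared sample distance covariance (Székely–Rizzo) between each feature and the target-domain residuals and normalizes the resulting values into feature-splitting probabilities for the second forest; the function names are hypothetical.

```python
import numpy as np

def sample_distance_covariance(x, y):
    """Squared sample distance covariance between two univariate samples
    (Szekely & Rizzo, 2007), via double-centered pairwise distance matrices."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    a = np.abs(x - x.T)  # pairwise distances within x
    b = np.abs(y - y.T)  # pairwise distances within y
    # Double centering: subtract row and column means, add back the grand mean.
    A = a - a.mean(axis=0) - a.mean(axis=1, keepdims=True) + a.mean()
    B = b - b.mean(axis=0) - b.mean(axis=1, keepdims=True) + b.mean()
    return (A * B).mean()

def feature_split_weights(X, residuals):
    """Splitting probabilities proportional to the sample distance covariance
    between each feature and the residuals from the source-trained forest."""
    dcov = np.array([
        np.sqrt(max(sample_distance_covariance(X[:, j], residuals), 0.0))
        for j in range(X.shape[1])
    ])
    total = dcov.sum()
    if total == 0.0:  # fall back to uniform splitting if no dependence detected
        return np.full(X.shape[1], 1.0 / X.shape[1])
    return dcov / total
```

In the full procedure, these probabilities would govern which feature each node of the residual-fitting CRF splits on, concentrating splits on the sparsely different features; the weights are estimated on an independent sample, as the abstract notes.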
Related papers
- Efficient Covariance Estimation for Sparsified Functional Data [51.69796254617083]
The proposed Random-knots (Random-knots-Spatial) and B-spline (Bspline-Spatial) estimators of the covariance function are computationally efficient. Asymptotic pointwise results for the covariance estimators are obtained for sparsified individual trajectories under some regularity conditions.
arXiv Detail & Related papers (2025-11-23T00:50:33Z) - Amortized Posterior Sampling with Diffusion Prior Distillation [55.03585818289934]
Amortized Posterior Sampling is a novel variational inference approach for efficient posterior sampling in inverse problems. Our method trains a conditional flow model to minimize the divergence between the variational distribution and the posterior distribution implicitly defined by the diffusion model. Unlike existing methods, our approach is unsupervised, requires no paired training data, and is applicable to both Euclidean and non-Euclidean domains.
arXiv Detail & Related papers (2024-07-25T09:53:12Z) - Individualized Multi-Treatment Response Curves Estimation using RBF-net with Shared Neurons [1.1119247609126184]
Our non-parametric modeling of the response curves relies on radial basis function (RBF)-nets with shared hidden neurons.
Applying our proposed method to MIMIC data, we obtain several interesting findings related to the impact of different treatment strategies on the length of ICU stay and 12-hour SOFA score for sepsis patients who are home-discharged.
arXiv Detail & Related papers (2024-01-29T21:13:01Z) - MMD-based Variable Importance for Distributional Random Forest [5.0459880125089]
We introduce a variable importance algorithm for Distributional Random Forests (DRFs)
We show that the introduced importance measure is consistent, exhibits high empirical performance on both real and simulated data, and outperforms competitors.
arXiv Detail & Related papers (2023-10-18T17:12:29Z) - Score-based Source Separation with Applications to Digital Communication Signals [72.6570125649502]
We propose a new method for separating superimposed sources using diffusion-based generative models.
Motivated by applications in radio-frequency (RF) systems, we are interested in sources with underlying discrete nature.
Our method can be viewed as a multi-source extension to the recently proposed score distillation sampling scheme.
arXiv Detail & Related papers (2023-06-26T04:12:40Z) - Robust Fiber Orientation Distribution Function Estimation Using Deep Constrained Spherical Deconvolution for Diffusion MRI [9.570365838548073]
A common practice is to model the measured DW-MRI signal via the fiber orientation distribution function (fODF). However, measurement variabilities (e.g., inter- and intra-site variability, hardware performance, and sequence design) are inevitable during the acquisition of DW-MRI. Most existing model-based methods (e.g., constrained spherical deconvolution (CSD)) and learning-based methods (e.g., deep learning (DL)) do not explicitly consider such variabilities in fODF modeling. We propose a novel data-driven deep constrained spherical deconvolution method.
arXiv Detail & Related papers (2023-06-05T14:06:40Z) - Covariance regression with random forests [0.0]
CovRegRF is implemented in a freely available R package on CRAN.
An application of the proposed method to thyroid disease data is also presented.
arXiv Detail & Related papers (2022-09-16T21:21:18Z) - Sampling-free Variational Inference for Neural Networks with Multiplicative Activation Noise [51.080620762639434]
We propose a more efficient parameterization of the posterior approximation for sampling-free variational inference.
Our approach yields competitive results for standard regression problems and scales well to large-scale image classification tasks.
arXiv Detail & Related papers (2021-03-15T16:16:18Z) - Sparse Feature Selection Makes Batch Reinforcement Learning More Sample Efficient [62.24615324523435]
This paper provides a statistical analysis of high-dimensional batch Reinforcement Learning (RL) using sparse linear function approximation.
When there is a large number of candidate features, our result sheds light on the fact that sparsity-aware methods can make batch RL more sample efficient.
arXiv Detail & Related papers (2020-11-08T16:48:02Z) - Addressing Variance Shrinkage in Variational Autoencoders using Quantile Regression [0.0]
The Variational AutoEncoder (VAE) has become a popular model for anomaly detection in applications such as lesion detection in medical images.
We describe an alternative approach that avoids the well-known problem of shrinkage or underestimation of variance.
Using estimated quantiles to compute mean and variance under the Gaussian assumption, we compute reconstruction probability as a principled approach to outlier or anomaly detection.
arXiv Detail & Related papers (2020-10-18T17:37:39Z) - Spatially Adaptive Inference with Stochastic Feature Sampling and Interpolation [72.40827239394565]
We propose to compute features only at sparsely sampled locations.
We then densely reconstruct the feature map with an efficient procedure.
The presented network is experimentally shown to save substantial computation while maintaining accuracy over a variety of computer vision tasks.
arXiv Detail & Related papers (2020-03-19T15:36:31Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.