A Nonparametric Test of Dependence Based on Ensemble of Decision Trees
- URL: http://arxiv.org/abs/2007.12325v1
- Date: Fri, 24 Jul 2020 02:48:33 GMT
- Title: A Nonparametric Test of Dependence Based on Ensemble of Decision Trees
- Authors: Rami Mahdi
- Abstract summary: The proposed coefficient is a permutation-like statistic that quantifies how much the observed sample S_n : {(X_i, Y_i), i = 1 ... n} is discriminable from the permutated sample ^S_nn : {(X_i, Y_j), i, j = 1 ... n}, where the two variables are independent.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, a robust non-parametric measure of statistical dependence, or
correlation, between two random variables is presented. The proposed
coefficient is a permutation-like statistic that quantifies how much the
observed sample S_n : {(X_i , Y_i), i = 1 . . . n} is discriminable from the
permutated sample ^S_nn : {(X_i , Y_j), i, j = 1 . . . n}, where the two
variables are independent. The extent of discriminability is determined using
the predictions for the interchangeable leave-out sample from training an
aggregate of decision trees to discriminate between the two samples without
materializing the permutated sample. The proposed coefficient is
computationally efficient, interpretable, invariant to monotonic
transformations, and has a well-approximated distribution under independence.
Empirical results show that the proposed method has high power for detecting
complex relationships from noisy data.
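The core idea in the abstract, comparing the observed pairs with the permuted sample without materializing it, can be illustrated with a minimal sketch. This is not the paper's algorithm: a fixed grid on the ranks stands in for the ensemble of decision trees, no leave-out predictions are used, and the null distribution is estimated by shuffling rather than by the paper's approximation. The function names, the `bins` and `n_perm` parameters, and the total-variation statistic are all assumptions made for illustration.

```python
import random


def ranks(values):
    """0-based ranks; ties are broken by original order (an arbitrary choice)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order):
        result[idx] = rank
    return result


def grid_discriminability(x, y, bins=4):
    """Total-variation distance between the observed pairs and the permuted
    sample {(X_i, Y_j)} on a bins x bins grid of ranks (a stand-in for trees)."""
    n = len(x)
    # Working on ranks makes the statistic invariant to monotonic transformations.
    bx = [r * bins // n for r in ranks(x)]
    by = [r * bins // n for r in ranks(y)]
    joint = [[0] * bins for _ in range(bins)]
    for i, j in zip(bx, by):
        joint[i][j] += 1
    row = [sum(joint[i]) for i in range(bins)]
    col = [sum(joint[i][j] for i in range(bins)) for j in range(bins)]
    # The permuted sample places row[i] * col[j] of its n * n points in cell
    # (i, j), so its cell frequencies follow from the marginal counts alone.
    total = 0.0
    for i in range(bins):
        for j in range(bins):
            total += abs(joint[i][j] / n - row[i] * col[j] / (n * n))
    return 0.5 * total


def permutation_p_value(x, y, n_perm=200, bins=4, seed=0):
    """Return (statistic, p-value), with the null estimated by shuffling y."""
    rng = random.Random(seed)
    observed = grid_discriminability(x, y, bins)
    y_shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_shuffled)
        if grid_discriminability(x, y_shuffled, bins) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # A non-monotonic relationship (y = x^2) that linear correlation would miss.
    xs = [-1 + 2 * i / 39 for i in range(40)]
    ys = [v * v for v in xs]
    print(permutation_p_value(xs, ys))
```

A statistic near 0 means the observed pairs look like the permuted sample, which is what independence predicts; larger values indicate dependence. A tree ensemble, as in the paper, would adapt the partition to the data instead of fixing a grid in advance.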
Related papers
- Unveiling the Statistical Foundations of Chain-of-Thought Prompting Methods [59.779795063072655]
Chain-of-Thought (CoT) prompting and its variants have gained popularity as effective methods for solving multi-step reasoning problems.
We analyze CoT prompting from a statistical estimation perspective, providing a comprehensive characterization of its sample complexity.
arXiv Detail & Related papers (2024-08-25T04:07:18Z) - Selective Nonparametric Regression via Testing [54.20569354303575]
We develop an abstention procedure via testing the hypothesis on the value of the conditional variance at a given point.
Unlike existing methods, the proposed one accounts not only for the value of the variance itself but also for the uncertainty of the corresponding variance predictor.
arXiv Detail & Related papers (2023-09-28T13:04:11Z) - Detecting Adversarial Data by Probing Multiple Perturbations Using
Expected Perturbation Score [62.54911162109439]
Adversarial detection aims to determine whether a given sample is an adversarial one based on the discrepancy between natural and adversarial distributions.
We propose a new statistic called expected perturbation score (EPS), which is essentially the expected score of a sample after various perturbations.
We develop EPS-based maximum mean discrepancy (MMD) as a metric to measure the discrepancy between the test sample and natural samples.
arXiv Detail & Related papers (2023-05-25T13:14:58Z) - Bootstrapped Edge Count Tests for Nonparametric Two-Sample Inference
Under Heterogeneity [5.8010446129208155]
We develop a new nonparametric testing procedure that accurately detects differences between the two samples.
A comprehensive simulation study and an application to detecting user behaviors in online games demonstrate the excellent non-asymptotic performance of the proposed test.
arXiv Detail & Related papers (2023-04-26T22:25:44Z) - Data thinning for convolution-closed distributions [2.299914829977005]
We propose data thinning, an approach for splitting an observation into two or more independent parts that sum to the original observation.
We show that data thinning can be used to validate the results of unsupervised learning approaches.
arXiv Detail & Related papers (2023-01-18T02:47:41Z) - Nonparametric Conditional Local Independence Testing [69.31200003384122]
Conditional local independence is an independence relation among continuous time processes.
No nonparametric test of conditional local independence has been available.
We propose such a nonparametric test based on double machine learning.
arXiv Detail & Related papers (2022-03-25T10:31:02Z) - Comparing two samples through stochastic dominance: a graphical approach [2.867517731896504]
Non-deterministic measurements are common in real-world scenarios.
We propose an alternative framework to visually compare two samples according to their estimated cumulative distribution functions.
arXiv Detail & Related papers (2022-03-15T13:37:03Z) - The Representation Jensen-Rényi Divergence [0.0]
We introduce a measure between data distributions based on operators in reproducing kernel Hilbert spaces defined by infinitely divisible kernels.
The proposed measure of divergence avoids the estimation of the probability distribution underlying the data.
arXiv Detail & Related papers (2021-12-02T19:51:52Z) - Multivariate Probabilistic Regression with Natural Gradient Boosting [63.58097881421937]
We propose a Natural Gradient Boosting (NGBoost) approach based on nonparametrically modeling the conditional parameters of the multivariate predictive distribution.
Our method is robust, works out-of-the-box without extensive tuning, is modular with respect to the assumed target distribution, and performs competitively in comparison to existing approaches.
arXiv Detail & Related papers (2021-06-07T17:44:49Z) - Stable Prediction via Leveraging Seed Variable [73.9770220107874]
Previous machine learning methods might exploit subtly spurious correlations in training data induced by non-causal variables for prediction.
We propose a conditional independence test based algorithm to separate causal variables with a seed variable as priori, and adopt them for stable prediction.
Our algorithm outperforms state-of-the-art methods for stable prediction.
arXiv Detail & Related papers (2020-06-09T06:56:31Z) - Posterior Ratio Estimation of Latent Variables [14.619879849533662]
In some applications, we want to compare distributions of random variables that are inferred from observations.
We study the problem of estimating the ratio between two posterior probability density functions of a latent variable.
arXiv Detail & Related papers (2020-02-15T16:46:42Z)