The BP Dependency Function: a Generic Measure of Dependence between
Random Variables
- URL: http://arxiv.org/abs/2203.12329v1
- Date: Wed, 23 Mar 2022 11:14:40 GMT
- Title: The BP Dependency Function: a Generic Measure of Dependence between
Random Variables
- Authors: Guus Berkelmans, Joris Pries, Sandjai Bhulai and Rob van der Mei
- Abstract summary: Measuring and quantifying dependencies between random variables (RV's) can give critical insights into a data-set.
In common data-analysis practice, most analysts use the Pearson correlation coefficient (PCC) to quantify dependence between RV's, even though it only captures linear dependence.
We revise the list of desired properties of a dependency function and propose a new one that meets all these requirements.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Measuring and quantifying dependencies between random variables (RV's) can
give critical insights into a data-set. Typical questions are: 'Do underlying
relationships exist?', 'Are some variables redundant?', and 'Is some target
variable $Y$ highly or weakly dependent on variable $X$?' Interestingly,
despite the evident need for a general-purpose measure of dependency between
RV's, common practice of data analysis is that most data analysts use the
Pearson correlation coefficient (PCC) to quantify dependence between RV's,
while it is well-recognized that the PCC is essentially a measure for linear
dependency only. Although many attempts have been made to define more generic
dependency measures, there is yet no consensus on a standard, general-purpose
dependency function. In fact, several ideal properties of a dependency function
have been proposed, but without much argumentation. Motivated by this, in this
paper we will discuss and revise the list of desired properties and propose a
new dependency function that meets all these requirements. This general-purpose
dependency function provides data analysts with a powerful means to quantify the
level of dependence between variables. To this end, we also provide Python code
to determine the dependency function for use in practice.
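The abstract's central motivation, that the PCC measures only linear dependence, can be seen in a small numerical sketch. This is an illustration of the problem, not the paper's BP dependency function: Y = X^2 is fully determined by X, yet the PCC is near zero when X is symmetric around 0.

```python
import numpy as np

# Y = X^2 is a perfect (nonlinear) dependence, but Cov(X, X^2) = E[X^3] = 0
# for X symmetric around 0, so the Pearson correlation vanishes.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=100_000)
y = x ** 2

pcc = np.corrcoef(x, y)[0, 1]
print(f"PCC(X, X^2) = {pcc:.3f}")  # close to 0 despite full dependence
```

A general-purpose dependency function should instead assign a high dependency score to this pair, since Y is completely determined by X.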
Related papers
- Empirical Analysis for Unsupervised Universal Dependency Parse Tree Aggregation [9.075353955444518]
Dependency parsing is an essential task in NLP, and the quality of dependency parses is crucial for many downstream tasks.
In various NLP tasks, aggregation methods are used as a post-processing step and have been shown to mitigate the issue of varying parse quality.
We compare different unsupervised post-processing aggregation methods to identify the most suitable dependency tree structure aggregation method.
arXiv Detail & Related papers (2024-03-28T07:27:10Z) - Fed-CVLC: Compressing Federated Learning Communications with
Variable-Length Codes [54.18186259484828]
In Federated Learning (FL) paradigm, a parameter server (PS) concurrently communicates with distributed participating clients for model collection, update aggregation, and model distribution over multiple rounds.
We show strong evidence that variable-length coding is beneficial for compression in FL.
We present Fed-CVLC (Federated Learning Compression with Variable-Length Codes), which fine-tunes the code length in response to the dynamics of model updates.
arXiv Detail & Related papers (2024-02-06T07:25:21Z) - Variable Importance in High-Dimensional Settings Requires Grouping [19.095605415846187]
Conditional Permutation Importance (CPI) bypasses PI's limitations in such cases.
Grouping variables, either statistically via clustering or based on prior knowledge, recovers some statistical power.
We show that the approach extended with stacking controls the type-I error even with highly-correlated groups.
arXiv Detail & Related papers (2023-12-18T00:21:47Z) - RDGCN: Reinforced Dependency Graph Convolutional Network for
Aspect-based Sentiment Analysis [43.715099882489376]
We propose a new reinforced dependency graph convolutional network (RDGCN) that improves the importance calculation of dependencies in both distance and type views.
Under the criterion, we design a distance-importance function that leverages reinforcement learning for weight distribution search and dissimilarity control.
Comprehensive experiments on three popular datasets demonstrate the effectiveness of the criterion and importance functions.
arXiv Detail & Related papers (2023-11-08T05:37:49Z) - Statistically Valid Variable Importance Assessment through Conditional
Permutations [19.095605415846187]
Conditional Permutation Importance is a new approach to variable importance assessment.
We show that CPI overcomes the limitations of standard permutation importance by providing accurate type-I error control.
Our results suggest that CPI can be readily used as a drop-in replacement for permutation-based methods.
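For context, a minimal sketch of *standard* permutation importance (PI), the baseline whose limitations CPI addresses: the importance of a feature is the increase in prediction error after that feature is shuffled. This is an illustrative implementation, not the paper's CPI; the data and model are toy assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5_000
x1 = rng.normal(size=n)            # informative feature
x2 = rng.normal(size=n)            # pure-noise feature
y = 2.0 * x1 + rng.normal(size=n)

X = np.column_stack([x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # least-squares fit
base_mse = np.mean((y - X @ beta) ** 2)

def perm_importance(j: int) -> float:
    """MSE increase when feature j is shuffled, breaking its link to y."""
    Xp = X.copy()
    Xp[:, j] = rng.permutation(Xp[:, j])
    return np.mean((y - Xp @ beta) ** 2) - base_mse

print(perm_importance(0))  # large: x1 drives y
print(perm_importance(1))  # near zero: x2 is noise
```

When features are correlated, this plain permutation step evaluates the model on implausible inputs, which is one source of the inflated type-I errors that conditional permutation schemes aim to fix.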
arXiv Detail & Related papers (2023-09-14T10:53:36Z) - Multi-Target XGBoostLSS Regression [91.3755431537592]
We present an extension of XGBoostLSS that models multiple targets and their dependencies in a probabilistic regression setting.
Our approach outperforms existing GBMs with respect to runtime and compares well in terms of accuracy.
arXiv Detail & Related papers (2022-10-13T08:26:14Z) - Bayesian Kernelised Test of (In)dependence with Mixed-type Variables [1.2691047660244332]
A fundamental task in AI is to assess (in)dependence between mixed-type variables (text, image, sound).
We propose a Bayesian kernelised correlation test of (in)dependence using a Dirichlet process model.
We show the properties of the approach, as well as algorithms for its fast computation.
arXiv Detail & Related papers (2021-05-09T19:21:43Z) - Linguistic dependencies and statistical dependence [76.89273585568084]
We use pretrained language models to estimate probabilities of words in context.
We find that maximum-CPMI trees correspond to linguistic dependencies more often than trees extracted from non-contextual PMI estimates.
arXiv Detail & Related papers (2021-04-18T02:43:37Z) - Sparse Feature Selection Makes Batch Reinforcement Learning More Sample
Efficient [62.24615324523435]
This paper provides a statistical analysis of high-dimensional batch Reinforcement Learning (RL) using sparse linear function approximation.
When there is a large number of candidate features, our result sheds light on the fact that sparsity-aware methods can make batch RL more sample efficient.
arXiv Detail & Related papers (2020-11-08T16:48:02Z) - Neural Methods for Point-wise Dependency Estimation [129.93860669802046]
We focus on estimating point-wise dependency (PD), which quantitatively measures how likely two outcomes co-occur.
We demonstrate the effectiveness of our approaches in 1) MI estimation, 2) self-supervised representation learning, and 3) cross-modal retrieval task.
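One common formalization of point-wise dependency is the ratio PD(x, y) = p(x, y) / (p(x) p(y)), the factor by which two outcomes co-occur more (or less) often than they would under independence. A toy plug-in estimate on categorical data illustrates the definition; the paper itself proposes neural estimators, not counting.

```python
from collections import Counter

# Illustrative categorical sample of (x, y) outcome pairs.
pairs = [("a", 1), ("a", 1), ("a", 2), ("b", 2), ("b", 2), ("b", 1)]
n = len(pairs)
p_xy = Counter(pairs)               # joint counts
p_x = Counter(x for x, _ in pairs)  # marginal counts of x
p_y = Counter(y for _, y in pairs)  # marginal counts of y

def pd_est(x, y) -> float:
    """Plug-in estimate of p(x, y) / (p(x) * p(y))."""
    return (p_xy[(x, y)] / n) / ((p_x[x] / n) * (p_y[y] / n))

print(pd_est("a", 1))  # > 1: "a" and 1 co-occur more often than chance
```

Averaging log PD over the joint distribution recovers mutual information, which is why PD estimators transfer directly to MI estimation.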
arXiv Detail & Related papers (2020-06-09T23:26:15Z) - Transformer Hawkes Process [79.16290557505211]
We propose a Transformer Hawkes Process (THP) model, which leverages the self-attention mechanism to capture long-term dependencies.
THP outperforms existing models in terms of both likelihood and event prediction accuracy by a notable margin.
We provide a concrete example, where THP achieves improved prediction performance for learning multiple point processes when incorporating their relational information.
arXiv Detail & Related papers (2020-02-21T13:48:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences.