Contextuality and dichotomizations of random variables
- URL: http://arxiv.org/abs/2105.03718v3
- Date: Sun, 12 Dec 2021 17:34:51 GMT
- Title: Contextuality and dichotomizations of random variables
- Authors: Janne V. Kujala and Ehtibar N. Dzhafarov
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Contextuality-by-Default approach to determining and measuring the
(non)contextuality of a system of random variables requires that every random
variable in the system be represented by an equivalent set of dichotomous
random variables. In this paper we present general principles that justify the
use of dichotomizations and determine their choice. The main idea in choosing
dichotomizations is that if the set of possible values of a random variable is
endowed with a pre-topology (V-space), then the allowable dichotomizations
split the space of possible values into two linked subsets ("linkedness" being a
weak form of pre-topological connectedness). We primarily focus on two types of
random variables most often encountered in practice: categorical and
real-valued ones (including continuous random variables, greatly
underrepresented in the contextuality literature). A categorical variable (one
with a finite number of unordered values) is represented by all of its possible
dichotomizations. If the values of a random variable are real numbers, then
they are dichotomized by intervals above and below a variable cut point.
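The two dichotomization rules stated at the end of the abstract can be illustrated with a minimal Python sketch. It assumes that trivial splits (one part empty) are excluded and that a split and its complement count as the same dichotomization, so a categorical variable with k values has 2^(k-1) - 1 of them; for real values it assumes the cut point itself is assigned to the lower part. The function names are hypothetical, and the sketch does not model the pre-topological (V-space) reasoning the paper uses to justify these choices.

```python
from itertools import combinations


def categorical_dichotomizations(values):
    """All splits of a finite set of unordered values into two nonempty parts.

    Assumption: a split {A, complement of A} is listed once. A never contains
    values[0], so a split and its mirror image are not both returned, and the
    complement (which always holds values[0]) is never empty.
    """
    values = list(values)
    rest = values[1:]
    splits = []
    for r in range(1, len(rest) + 1):
        for subset in combinations(rest, r):
            splits.append(frozenset(subset))
    return splits


def dichotomize(x, subset):
    """Binary variable from a categorical value: 1 if x falls in subset, else 0."""
    return int(x in subset)


def dichotomize_at_cut(x, cut):
    """Binary variable from a real value: 1 above the cut point, 0 at or below it.

    Assumption: the cut point itself goes to the lower interval.
    """
    return int(x > cut)


if __name__ == "__main__":
    splits = categorical_dichotomizations(["red", "green", "blue"])
    print(len(splits))  # 3 = 2**(3 - 1) - 1
    sample = ["red", "blue", "green", "blue"]
    for A in splits:
        # one binary variable per dichotomization, applied to the same sample
        print(sorted(A), [dichotomize(x, A) for x in sample])
    print([dichotomize_at_cut(x, 0.0) for x in [0.2, 1.5, -0.3, 0.0]])  # [1, 1, 0, 0]
```

For a real-valued variable, each choice of cut point yields a separate binary variable, so representing the variable means letting the cut point range over a set of values rather than fixing a single one.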
Related papers
- Gower's similarity coefficients with automatic weight selection
The most popular dissimilarity for mixed-type variables is defined as one minus Gower's similarity coefficient.
The discussion of weighting schemes is sometimes misleading, since it often ignores that the unweighted "standard" setting hides an unbalanced contribution of the individual variables to the overall dissimilarity.
We address this drawback following the recent idea of introducing a weighting scheme that minimizes the differences in the correlation between each contributing dissimilarity and the resulting weighted Gower's dissimilarity.
Date: 2024-01-30T14:21:56Z
- Non-parametric Conditional Independence Testing for Mixed Continuous-Categorical Variables: A Novel Method and Numerical Evaluation
Conditional independence testing (CIT) is a common task in machine learning.
Many real-world applications involve mixed-type datasets that include numerical and categorical variables.
We propose a variation of the former approach that does not treat categorical variables as numeric.
Date: 2023-10-17T10:29:23Z
- Predicting Out-of-Domain Generalization with Neighborhood Invariance
We propose a measure of a classifier's output invariance in a local transformation neighborhood.
Our measure is simple to calculate, does not depend on the test point's true label, and can be applied even in out-of-domain (OOD) settings.
In experiments on benchmarks in image classification, sentiment analysis, and natural language inference, we demonstrate a strong and robust correlation between our measure and actual OOD generalization.
Date: 2022-07-05T14:55:16Z
- Machine Learning for Multi-Output Regression: When should a holistic multivariate approach be preferred over separate univariate ones?
Tree-based ensembles such as the Random Forest are modern classics among statistical learning methods.
We compare these methods in extensive simulations to help answer the primary question of when to use multivariate ensemble techniques.
Date: 2022-01-14T08:44:25Z
- Multivariate Probabilistic Regression with Natural Gradient Boosting
We propose a Natural Gradient Boosting (NGBoost) approach based on nonparametrically modeling the conditional parameters of the multivariate predictive distribution.
Our method is robust, works out-of-the-box without extensive tuning, is modular with respect to the assumed target distribution, and performs competitively in comparison to existing approaches.
Date: 2021-06-07T17:44:49Z
- Safe Tests and Always-Valid Confidence Intervals for contingency tables and beyond
We develop E variables for testing whether two data streams come from the same source or not.
These E variables lead to tests that remain safe under flexible sampling scenarios such as optional stopping and continuation.
Date: 2021-06-04T20:12:13Z
- Contextuality and Random Variables
The identity of a random variable in a system is determined by its joint distribution with all other random variables in the same context.
When context changes, a variable measuring some property is instantly replaced by another random variable measuring the same property, or disappears if this property is not measured in the new context.
Date: 2021-04-26T11:47:54Z
- Contents, Contexts, and Basics of Contextuality
This is a non-technical introduction to the theory of contextuality.
It presents the basics of a theory of contextuality called Contextuality-by-Default (CbD).
Date: 2021-03-14T15:35:54Z
- An Embedded Model Estimator for Non-Stationary Random Functions using Multiple Secondary Variables
This paper introduces the method and shows that it has consistency results that are similar in nature to those applying to geostatistical modelling and to Quantile Random Forests.
The algorithm works by estimating a conditional distribution for the target variable at each target location.
Date: 2020-11-09T00:14:24Z
- Tractable Inference in Credal Sentential Decision Diagrams
Probabilistic sentential decision diagrams are logic circuits where the inputs of disjunctive gates are annotated by probability values.
We develop the credal sentential decision diagrams, a generalisation of their probabilistic counterpart that allows for replacing the local probabilities with credal sets of mass functions.
For a first empirical validation, we consider a simple application based on noisy seven-segment display images.
Date: 2020-08-19T16:04:34Z
- Contextuality scenarios arising from networks of stochastic processes
An empirical model is said to be contextual if its distributions cannot be obtained by marginalizing a joint distribution over X.
We present a different and classical source of contextual empirical models: the interaction among many processes.
The statistical behavior of the network in the long run makes the empirical model generically contextual and even strongly contextual.
Date: 2020-06-22T16:57:52Z