Controlling for multiple covariates
- URL: http://arxiv.org/abs/2112.00672v1
- Date: Wed, 1 Dec 2021 17:37:36 GMT
- Title: Controlling for multiple covariates
- Authors: Mark Tygert
- Abstract summary: A fundamental problem in statistics is to compare the outcomes attained by members of subpopulations.
Comparison makes the most sense when performed separately for individuals who are similar according to certain characteristics.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A fundamental problem in statistics is to compare the outcomes attained by
members of subpopulations. This problem arises in the analysis of randomized
controlled trials, in the analysis of A/B tests, and in the assessment of
fairness and bias in the treatment of sensitive subpopulations, especially when
measuring the effects of algorithms and machine learning. Often the comparison
makes the most sense when performed separately for individuals who are similar
according to certain characteristics given by the values of covariates of
interest; the separate comparisons can also be aggregated in various ways to
compare across all values of the covariates. Separating, segmenting, or
stratifying into those with similar values of the covariates is also known as
"conditioning on" or "controlling for" those covariates; controlling for age or
annual income is common.
Two standard methods of controlling for covariates are (1) binning and (2)
regression modeling. Binning requires making fairly arbitrary, yet frequently
highly influential choices, and is unsatisfactorily temperamental in multiple
dimensions, with multiple covariates. Regression analysis works wonderfully
when there is good reason to believe in a particular parameterized regression
model or classifier (such as logistic regression). Thus, there appears to be no
extant canonical fully non-parametric regression for the comparison of
subpopulations, not while conditioning on multiple specified covariates.
Existing methods rely on analysts to make choices, and those choices can be
debatable; analysts can deceive others or even themselves. The present paper
aims to fill the gap, combining two ingredients: (1) recently developed
methodologies for such comparisons that already exist when conditioning on a
single scalar covariate and (2) the Hilbert space-filling curve that maps
continuously from one dimension to multiple dimensions.
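The second ingredient in the abstract, a Hilbert curve used to reduce several covariates to a single scalar, can be illustrated with a minimal sketch. It is not taken from the paper: it uses the standard iterative conversion from 2-D grid coordinates to a position along the Hilbert curve, handles only two covariates, and the names `hilbert_index`, `hilbert_order`, and the `order` parameter are hypothetical, chosen for illustration.

```python
def hilbert_index(x, y, order):
    """Position along the Hilbert curve of cell (x, y) on a 2**order by 2**order grid."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate or flip the quadrant so the sub-curve is oriented consistently.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def hilbert_order(points, order=8):
    """Indices of `points` (pairs of floats) sorted along the Hilbert curve.

    Assumption for this sketch: each covariate is rescaled linearly to the
    grid using its observed minimum and maximum.
    """
    n = 1 << order
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]

    def quantize(v, lo, hi):
        if hi == lo:
            return 0
        return min(n - 1, int((v - lo) / (hi - lo) * n))

    keys = [
        hilbert_index(
            quantize(px, min(xs), max(xs)),
            quantize(py, min(ys), max(ys)),
            order,
        )
        for px, py in points
    ]
    return sorted(range(len(points)), key=keys.__getitem__)


# Two covariates per individual, e.g. (age, annual income) after any rescaling.
covariates = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
print(hilbert_order(covariates, order=1))
```

Individuals that are adjacent in the resulting order occupy the same or neighboring grid cells, so a method designed for a single scalar covariate can, in principle, be applied to the position along the curve.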
Related papers
- Gower's similarity coefficients with automatic weight selection [0.0]
The most popular dissimilarity for mixed-type variables is derived as one minus Gower's similarity coefficient.
The discussion on the weighting schemes is sometimes misleading since it often ignores that the unweighted "standard" setting hides an unbalanced contribution of the single variables to the overall dissimilarity.
We address this drawback following the recent idea of introducing a weighting scheme that minimizes the differences in the correlation between each contributing dissimilarity and the resulting weighted Gower's dissimilarity.
arXiv Detail & Related papers (2024-01-30T14:21:56Z)
- TIC-TAC: A Framework for Improved Covariance Estimation in Deep Heteroscedastic Regression [109.69084997173196]
Deep heteroscedastic regression involves jointly optimizing the mean and covariance of the predicted distribution using the negative log-likelihood.
Recent works show that this may result in sub-optimal convergence due to the challenges associated with covariance estimation.
We study two questions: (1) Does the predicted covariance truly capture the randomness of the predicted mean?
Our results show that not only does TIC accurately learn the covariance, it additionally facilitates an improved convergence of the negative log-likelihood.
arXiv Detail & Related papers (2023-10-29T09:54:03Z)
- Selective Nonparametric Regression via Testing [54.20569354303575]
We develop an abstention procedure via testing the hypothesis on the value of the conditional variance at a given point.
Unlike existing methods, the proposed one accounts not only for the value of the variance itself but also for the uncertainty of the corresponding variance predictor.
arXiv Detail & Related papers (2023-09-28T13:04:11Z)
- Logistic Regression Equivalence: A Framework for Comparing Logistic Regression Models Across Populations [4.518012967046983]
We argue that equivalence testing for a prespecified tolerance level on population differences incentivizes accuracy in the inference.
For diagnosis data, we show examples for equivalent and non-equivalent models.
arXiv Detail & Related papers (2023-03-23T15:12:52Z)
- Robust and Agnostic Learning of Conditional Distributional Treatment Effects [62.44901952244514]
The conditional average treatment effect (CATE) is the best point prediction of individual causal effects.
In aggregate analyses, this is usually addressed by measuring the distributional treatment effect (DTE).
We provide a new robust and model-agnostic methodology for learning the conditional DTE (CDTE) for a wide class of problems.
arXiv Detail & Related papers (2022-05-23T17:40:31Z)
- CARMS: Categorical-Antithetic-REINFORCE Multi-Sample Gradient Estimator [60.799183326613395]
We propose an unbiased estimator for categorical random variables based on multiple mutually negatively correlated (jointly antithetic) samples.
CARMS combines REINFORCE with copula based sampling to avoid duplicate samples and reduce its variance, while keeping the estimator unbiased using importance sampling.
We evaluate CARMS on several benchmark datasets on a generative modeling task, as well as a structured output prediction task, and find it to outperform competing methods including a strong self-control baseline.
arXiv Detail & Related papers (2021-10-26T20:14:30Z)
- A graphical method of cumulative differences between two subpopulations [0.0]
This paper develops methods for the common case in which no score of any member of the subpopulations being compared is exactly equal to the score of any other member of either subpopulation.
arXiv Detail & Related papers (2021-08-05T14:59:56Z)
- For high-dimensional hierarchical models, consider exchangeability of effects across covariates instead of across datasets [18.74167116981788]
We show that standard practice exhibits poor statistical performance when the number of covariates exceeds the number of datasets.
In statistical genetics, we might regress dozens of traits (defining datasets) for thousands of individuals (responses) on up to millions of genetic variants.
We propose a hierarchical model expressing our alternative perspective.
arXiv Detail & Related papers (2021-07-13T23:23:06Z)
- Flexible Model Aggregation for Quantile Regression [92.63075261170302]
Quantile regression is a fundamental problem in statistical learning motivated by a need to quantify uncertainty in predictions.
We investigate methods for aggregating any number of conditional quantile models.
All of the models we consider in this paper can be fit using modern deep learning toolkits.
arXiv Detail & Related papers (2021-02-26T23:21:16Z)
- Conditional canonical correlation estimation based on covariates with random forests [0.0]
We propose a new method called Random Forest with Canonical Correlation Analysis (RFCCA) to estimate the conditional canonical correlations between two sets of variables.
The proposed method and the global significance test are evaluated through simulation studies, which show that they provide accurate canonical correlation estimates and well-controlled Type-1 error.
arXiv Detail & Related papers (2020-11-23T17:09:46Z)
- Learning from Aggregate Observations [82.44304647051243]
We study the problem of learning from aggregate observations where supervision signals are given to sets of instances.
We present a general probabilistic framework that accommodates a variety of aggregate observations.
Simple maximum likelihood solutions can be applied to various differentiable models.
arXiv Detail & Related papers (2020-04-14T06:18:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.