A graphical method of cumulative differences between two subpopulations
- URL: http://arxiv.org/abs/2108.02666v3
- Date: Sun, 24 Oct 2021 17:14:00 GMT
- Title: A graphical method of cumulative differences between two subpopulations
- Authors: Mark Tygert
- Abstract summary: This paper develops methods for the common case in which no score of any member of the subpopulations being compared is exactly equal to the score of any other member of either subpopulation.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Comparing the differences in outcomes (that is, in "dependent variables")
between two subpopulations is often most informative when comparing outcomes
only for individuals from the subpopulations who are similar according to
"independent variables." The independent variables are generally known as
"scores," as in propensity scores for matching or as in the probabilities
predicted by statistical or machine-learned models, for example. If the
outcomes are discrete, then some averaging is necessary to reduce the noise
arising from the outcomes varying randomly over those discrete values in the
observed data. The traditional method of averaging is to bin the data according
to the scores and plot the average outcome in each bin against the average
score in the bin. However, such binning can be rather arbitrary and yet greatly
impacts the interpretation of displayed deviation between the subpopulations
and assessment of its statistical significance. Fortunately, such binning is
entirely unnecessary in plots of cumulative differences and in the associated
scalar summary metrics that are analogous to the workhorse statistics of
comparing probability distributions -- those due to Kolmogorov and Smirnov and
their refinements due to Kuiper. The present paper develops such cumulative
methods for the common case in which no score of any member of the
subpopulations being compared is exactly equal to the score of any other member
of either subpopulation.
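The binning-free idea described in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes, for simplicity, that each member of one subpopulation is matched to the member of the other subpopulation with the nearest score, and the function names (`cumulative_differences`, `kolmogorov_smirnov_like`, `kuiper_like`) are hypothetical. The outcome differences are accumulated in order of score, so the slope of a secant line of the resulting graph gives the average difference over the corresponding range of scores. The two scalar summaries are analogues of the Kolmogorov-Smirnov and Kuiper statistics.

```python
import numpy as np


def cumulative_differences(scores_a, outcomes_a, scores_b, outcomes_b):
    """Cumulative outcome differences (A minus B), ordered by score.

    Simplifying assumption (not taken from the paper): each member of A is
    matched to the member of B whose score is nearest.
    """
    scores_a = np.asarray(scores_a, dtype=float)
    outcomes_a = np.asarray(outcomes_a, dtype=float)
    scores_b = np.asarray(scores_b, dtype=float)
    outcomes_b = np.asarray(outcomes_b, dtype=float)

    # Sort each subpopulation by score.
    order_a = np.argsort(scores_a)
    order_b = np.argsort(scores_b)
    sa, ra = scores_a[order_a], outcomes_a[order_a]
    sb, rb = scores_b[order_b], outcomes_b[order_b]

    # For each score in A, find the neighbours in B on either side.
    idx = np.searchsorted(sb, sa)
    left = np.clip(idx - 1, 0, len(sb) - 1)
    right = np.clip(idx, 0, len(sb) - 1)
    # Keep whichever neighbour is closer in score (ties go to the left).
    nearest = np.where(np.abs(sa - sb[left]) <= np.abs(sb[right] - sa), left, right)

    # Accumulate the matched differences; no binning is involved.
    diffs = ra - rb[nearest]
    # Prepend 0 so that the graph starts at the origin.
    cumulative = np.concatenate(([0.0], np.cumsum(diffs) / len(diffs)))
    return sa, cumulative


def kolmogorov_smirnov_like(cumulative):
    """Largest absolute value attained by the cumulative differences."""
    return float(np.max(np.abs(cumulative)))


def kuiper_like(cumulative):
    """Range (maximum minus minimum) of the cumulative differences."""
    return float(np.max(cumulative) - np.min(cumulative))
```

Plotting `cumulative[1:]` against `np.arange(1, len(sa) + 1) / len(sa)` would give a graph of this simplified kind; a long stretch with a steep slope suggests a sustained difference between the subpopulations over that range of scores.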
Related papers
- Collaborative Heterogeneous Causal Inference Beyond Meta-analysis [68.4474531911361]
We propose a collaborative inverse propensity score estimator for causal inference with heterogeneous data.
Our method shows significant improvements over the methods based on meta-analysis when heterogeneity increases.
arXiv Detail & Related papers (2024-04-24T09:04:36Z)
- Gower's similarity coefficients with automatic weight selection [0.0]
The most popular dissimilarity for mixed-type variables is derived as the complement to one of Gower's similarity coefficient.
The discussion on the weighting schemes is sometimes misleading since it often ignores that the unweighted "standard" setting hides an unbalanced contribution of the single variables to the overall dissimilarity.
We address this drawback following the recent idea of introducing a weighting scheme that minimizes the differences in the correlation between each contributing dissimilarity and the resulting weighted Gower's dissimilarity.
arXiv Detail & Related papers (2024-01-30T14:21:56Z)
- Mean Estimation with User-level Privacy under Data Heterogeneity [54.07947274508013]
Different users may possess vastly different numbers of data points.
It cannot be assumed that all users sample from the same underlying distribution.
We propose a simple model of heterogeneous user data that allows user data to differ in both distribution and quantity of data.
arXiv Detail & Related papers (2023-07-28T23:02:39Z)
- Cumulative differences between paired samples [0.7237216416830515]
The simplest, most common paired samples consist of observations from two populations.
The pair of observed responses (one from each population) at the same value of the covariate is known as a "matched pair."
The cumulative approach is fully nonparametric and uniquely defined.
arXiv Detail & Related papers (2023-05-18T22:11:54Z)
- Counting Like Human: Anthropoid Crowd Counting on Modeling the Similarity of Objects [92.80955339180119]
Mainstream crowd counting methods regress a density map and integrate it to obtain counting results.
Inspired by this, we propose a rational and anthropoid crowd counting framework.
arXiv Detail & Related papers (2022-12-02T07:00:53Z)
- Statistical Efficiency of Score Matching: The View from Isoperimetry [96.65637602827942]
We show a tight connection between statistical efficiency of score matching and the isoperimetric properties of the distribution being estimated.
We formalize these results both in the sample regime and in the finite regime.
arXiv Detail & Related papers (2022-10-03T06:09:01Z)
- The role of the geometric mean in case-control studies [4.38301148531795]
We describe how to partially identify, estimate, and do inference on the geometric odds ratio under outcome-dependent sampling.
Our proposed estimator is based on the efficient influence function and therefore has doubly robust-style properties.
arXiv Detail & Related papers (2022-07-19T01:42:52Z)
- Controlling for multiple covariates [0.0]
A fundamental problem in statistics is to compare the outcomes attained by members of subpopulations.
Comparison makes the most sense when performed separately for individuals who are similar according to certain characteristics.
arXiv Detail & Related papers (2021-12-01T17:37:36Z)
- CARMS: Categorical-Antithetic-REINFORCE Multi-Sample Gradient Estimator [60.799183326613395]
We propose an unbiased estimator for categorical random variables based on multiple mutually negatively correlated (jointly antithetic) samples.
CARMS combines REINFORCE with copula based sampling to avoid duplicate samples and reduce its variance, while keeping the estimator unbiased using importance sampling.
We evaluate CARMS on several benchmark datasets on a generative modeling task, as well as a structured output prediction task, and find it to outperform competing methods including a strong self-control baseline.
arXiv Detail & Related papers (2021-10-26T20:14:30Z)
- Cumulative deviation of a subpopulation from the full population [0.0]
Assessing equity in treatment of a subpopulation often involves assigning numerical "scores" to all individuals in the full population.
Given such scores, individuals with similar scores may or may not attain similar outcomes independent of the individuals' memberships in the subpopulation.
The cumulative plots encode subpopulation deviation directly as the slopes of secant lines for the graphs.
arXiv Detail & Related papers (2020-08-04T19:30:02Z)
- Learning from Aggregate Observations [82.44304647051243]
We study the problem of learning from aggregate observations where supervision signals are given to sets of instances.
We present a general probabilistic framework that accommodates a variety of aggregate observations.
Simple maximum likelihood solutions can be applied to various differentiable models.
arXiv Detail & Related papers (2020-04-14T06:18:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.