Cumulative deviation of a subpopulation from the full population
- URL: http://arxiv.org/abs/2008.01779v5
- Date: Wed, 7 Jul 2021 16:40:53 GMT
- Title: Cumulative deviation of a subpopulation from the full population
- Authors: Mark Tygert
- Abstract summary: Assessing equity in treatment of a subpopulation often involves assigning numerical "scores" to all individuals in the full population.
Given such scores, individuals with similar scores may or may not attain similar outcomes independent of the individuals' memberships in the subpopulation.
The cumulative plots encode subpopulation deviation directly as the slopes of secant lines for the graphs.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Assessing equity in treatment of a subpopulation often involves assigning
numerical "scores" to all individuals in the full population such that similar
individuals get similar scores; matching via propensity scores or appropriate
covariates is common, for example. Given such scores, individuals with similar
scores may or may not attain similar outcomes independent of the individuals'
memberships in the subpopulation. The traditional graphical methods for
visualizing inequities are known as "reliability diagrams" or "calibration
plots," which bin the scores into a partition of all possible values, and for
each bin plot both the average outcomes for only individuals in the
subpopulation as well as the average outcomes for all individuals; comparing
the graph for the subpopulation with that for the full population gives some
sense of how the averages for the subpopulation deviate from the averages for
the full population. Unfortunately, real data sets contain only finitely many
observations, limiting the usable resolution of the bins, and so the
conventional methods can obscure important variations due to the binning.
Fortunately, plotting cumulative deviation of the subpopulation from the full
population as proposed in this paper sidesteps the problematic coarse binning.
The cumulative plots encode subpopulation deviation directly as the slopes of
secant lines for the graphs. Slope is easy to perceive even when the constant
offsets of the secant lines are irrelevant. The cumulative approach avoids
binning that smooths over deviations of the subpopulation from the full
population. Such cumulative aggregation furnishes both high-resolution
graphical methods and simple scalar summary statistics (analogous to those of
Kuiper and of Kolmogorov and Smirnov used in statistical significance testing
for comparing probability distributions).
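The cumulative construction described in the abstract can be sketched as follows. This is a minimal illustration under simplifying assumptions, not the paper's implementation: individuals are unweighted, and the expected outcome of the full population at each subpopulation member's score is assumed to be supplied already (the array `full_expected`). All function and parameter names here are hypothetical.

```python
import numpy as np

def cumulative_deviation(scores, sub_outcomes, full_expected):
    """Cumulative difference between subpopulation outcomes and the
    full-population expected outcomes, ordered by score.

    Assumption: full_expected[k] is the full population's average outcome
    at (or near) scores[k]; how it is estimated is outside this sketch.
    """
    scores = np.asarray(scores, dtype=float)
    sub_outcomes = np.asarray(sub_outcomes, dtype=float)
    full_expected = np.asarray(full_expected, dtype=float)
    n = len(scores)
    order = np.argsort(scores, kind="stable")  # sort individuals by score
    diffs = (sub_outcomes[order] - full_expected[order]) / n
    # Prepend 0 so the curve starts at the origin.
    cum = np.concatenate(([0.0], np.cumsum(diffs)))
    x = np.arange(n + 1) / n  # abscissa: fraction of the subpopulation
    return x, cum

def secant_slope(x, cum, i, j):
    """Slope of the secant line between points i and j of the curve:
    the average deviation over the individuals ranked i+1 through j."""
    return float((cum[j] - cum[i]) / (x[j] - x[i]))

def kuiper_like(cum):
    """Range of the cumulative curve (analogous to the Kuiper statistic)."""
    return float(cum.max() - cum.min())

def ks_like(cum):
    """Maximum absolute value of the curve (analogous to Kolmogorov-Smirnov)."""
    return float(np.abs(cum).max())

# Toy data: the two lowest-scoring members do better than the
# full-population expectation, the two highest-scoring members do worse.
x, cum = cumulative_deviation(
    scores=[0.4, 0.1, 0.3, 0.2],
    sub_outcomes=[0, 1, 0, 1],
    full_expected=[0.5, 0.5, 0.5, 0.5],
)
print(cum)
print(secant_slope(x, cum, 0, 2), secant_slope(x, cum, 2, 4))
print(kuiper_like(cum), ks_like(cum))
```

In this toy example the secant slope over the first half of the curve is positive and the slope over the second half is negative, so the sign and size of the deviation in each score range can be read directly from the slopes, with no bins to choose.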
Related papers
- Learning With Multi-Group Guarantees For Clusterable Subpopulations [14.042643978487453]
A canonical desideratum for prediction problems is that performance guarantees should hold on average over the population.
But what constitutes a meaningful subpopulation?
We take the perspective that relevant subpopulations should be defined with respect to the clusters that naturally emerge from the distribution of individuals.
arXiv Detail & Related papers (2024-10-18T16:38:55Z)
- Modeling and Forecasting COVID-19 Cases using Latent Subpopulations [8.69240208462227]
We propose two new methods to model the number of people infected with COVID-19 over time.
Method #1 is a dictionary-based approach, which begins with a large number of pre-defined sub-population models.
Method #2 is a mixture-of-$M$ fittable curves, where $M$, the number of sub-populations to use, is given by the user.
arXiv Detail & Related papers (2023-02-09T18:33:41Z)
- Statistical Efficiency of Score Matching: The View from Isoperimetry [96.65637602827942]
We show a tight connection between statistical efficiency of score matching and the isoperimetric properties of the distribution being estimated.
We formalize these results both in the finite-sample regime and in the asymptotic regime.
arXiv Detail & Related papers (2022-10-03T06:09:01Z)
- Sequential Community Mode Estimation [7.693649834261879]
We study the problem of identifying the largest community within the population via sequential, random sampling of individuals.
We propose and analyse novel algorithms for this problem, and also establish information theoretic lower bounds on the probability of error under any algorithm.
arXiv Detail & Related papers (2021-11-16T15:05:40Z)
- A graphical method of cumulative differences between two subpopulations [0.0]
This paper develops methods for the common case in which no score of any member of the subpopulations being compared is exactly equal to the score of any other member of either subpopulation.
arXiv Detail & Related papers (2021-08-05T14:59:56Z)
- Selective Classification Can Magnify Disparities Across Groups [89.14499988774985]
We find that while selective classification can improve average accuracies, it can simultaneously magnify existing accuracy disparities.
Increasing abstentions can even decrease accuracies on some groups.
We train distributionally-robust models that achieve similar full-coverage accuracies across groups and show that selective classification uniformly improves each group.
arXiv Detail & Related papers (2020-10-27T08:51:30Z)
- Distribution Matching for Crowd Counting [51.90971145453012]
We show that imposing Gaussians to annotations hurts generalization performance.
We propose to use Distribution Matching for crowd COUNTing (DM-Count).
In terms of Mean Absolute Error, DM-Count outperforms the previous state-of-the-art methods.
arXiv Detail & Related papers (2020-09-28T04:57:23Z)
- Distributionally Robust Losses for Latent Covariate Mixtures [28.407773942857148]
We propose a convex procedure that controls the worst-case performance over all subpopulations of a given size.
We observe on lexical similarity, wine quality, and recidivism prediction tasks that our worst-case procedure learns models that do well against unseen subpopulations.
arXiv Detail & Related papers (2020-07-28T04:16:27Z)
- Magnify Your Population: Statistical Downscaling to Augment the Spatial Resolution of Socioeconomic Census Data [48.7576911714538]
We present a new statistical downscaling approach to derive fine-scale estimates of key socioeconomic attributes.
For each selected socioeconomic variable, a Random Forest model is trained on the source Census units and then used to generate fine-scale gridded predictions.
As a case study, we apply this method to Census data in the United States, downscaling the selected socioeconomic variables available at the block group level, to a grid of 300 spatial resolution.
arXiv Detail & Related papers (2020-06-23T16:52:18Z)
- Survival Cluster Analysis [93.50540270973927]
There is an unmet need in survival analysis for identifying subpopulations with distinct risk profiles.
An approach that addresses this need is likely to improve characterization of individual outcomes.
arXiv Detail & Related papers (2020-02-29T22:41:21Z)
- A General Method for Robust Learning from Batches [56.59844655107251]
We consider a general framework of robust learning from batches, and determine the limits of both classification and distribution estimation over arbitrary, including continuous, domains.
We derive the first robust computationally-efficient learning algorithms for piecewise-interval classification, and for piecewise-polynomial, monotone, log-concave, and gaussian-mixture distribution estimation.
arXiv Detail & Related papers (2020-02-25T18:53:25Z)