Full-Information Estimation For Hierarchical Data
- URL: http://arxiv.org/abs/2404.13164v1
- Date: Fri, 19 Apr 2024 20:18:16 GMT
- Title: Full-Information Estimation For Hierarchical Data
- Authors: Ryan Cumings-Menon
- Abstract summary: The U.S. Census Bureau's 2020 Disclosure Avoidance System (DAS) bases its output on noisy measurements.
These noisy measurements are observed in a set of hierarchical geographic units, e.g., the U.S. as a whole, states, counties, census tracts, and census blocks.
This paper describes a method to leverage the hierarchical structure within these noisy measurements to compute confidence intervals for arbitrary tabulations.
- Score: 0.43512163406552007
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The U.S. Census Bureau's 2020 Disclosure Avoidance System (DAS) bases its output on noisy measurements, which are population tabulations added to realizations of mean-zero random variables. These noisy measurements are observed in a set of hierarchical geographic units, e.g., the U.S. as a whole, states, counties, census tracts, and census blocks. The noisy measurements from the 2020 Redistricting Data File and Demographic and Housing Characteristics File statistical data products are now public. The purpose of this paper is to describe a method to leverage the hierarchical structure within these noisy measurements to compute confidence intervals for arbitrary tabulations and in arbitrary geographic entities composed of census blocks. This method is based on computing a weighted least squares (WLS) estimator and its variance matrix. Due to the high dimension of this estimator, this operation is not feasible using the standard approach, since this would require evaluating products with the inverse of a dense matrix with several billion (or even several trillion) rows and columns. In contrast, the approach we describe in this paper computes the required estimate and its variance with a time complexity and memory requirement that scale linearly in the number of census blocks.
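To make the hierarchical-fusion idea concrete, the sketch below combines noisy child (block) measurements with a noisy measurement of their parent's total for a single level of the hierarchy, using inverse-variance weighted least squares in O(n) time. The helper name and the toy variances are illustrative assumptions, not the DAS implementation; the paper applies recursions of this kind across the full geographic spine.

```python
import numpy as np

def fuse_parent_children(y_child, var_child, y_parent, var_parent):
    # WLS estimate minimizing
    #   sum_i (x_i - y_i)^2 / var_i  +  (sum_i x_i - Y)^2 / V,
    # which has a closed form and costs O(n) time and memory.
    resid = y_parent - y_child.sum()           # disagreement with parent total
    denom = var_parent + var_child.sum()
    return y_child + var_child * resid / denom

# Toy example: true block counts plus mean-zero noise at both levels.
rng = np.random.default_rng(0)
truth = np.array([120.0, 45.0, 80.0, 300.0])
y_child = truth + rng.normal(0.0, 5.0, size=truth.size)   # block-level noise
y_parent = truth.sum() + rng.normal(0.0, 2.0)             # county-level noise

x_hat = fuse_parent_children(y_child, np.full(4, 25.0), y_parent, 4.0)
print(x_hat, x_hat.sum())   # block estimates pulled toward the parent total
```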
Related papers
- Best Linear Unbiased Estimate from Privatized Histograms [6.17477133700348]
In differential privacy (DP) mechanisms, it can be beneficial to release "redundant" outputs.
We show that the minimum variance processing is a linear projection.
We propose the Scalable Efficient Algorithm for Best Linear Unbiased Estimate (SEA BLUE).
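A minimal sketch of the linear-projection claim: with redundant noisy outputs y = M @ theta + noise, the minimum-variance (BLUE/GLS) estimate of theta is a fixed linear map applied to y. The histogram-plus-total setup below is an illustrative assumption, not the SEA BLUE algorithm itself, which is engineered to scale far beyond a dense solve.

```python
import numpy as np

def blue(y, M, Sigma):
    # GLS / BLUE for y = M @ theta + noise, Cov(noise) = Sigma:
    # the estimator is the linear map (M' S^-1 M)^-1 M' S^-1 applied to y.
    Si = np.linalg.inv(Sigma)
    P = np.linalg.solve(M.T @ Si @ M, M.T @ Si)
    return P @ y

# Redundant release: three histogram cells plus their total, equal noise.
M = np.vstack([np.eye(3), np.ones((1, 3))])
y = np.array([10.3, 4.9, 7.2, 23.0])   # noisy cells, then their noisy total
print(blue(y, M, np.eye(4)))
```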
arXiv Detail & Related papers (2024-09-06T16:27:34Z) - Noisy Measurements Are Important, the Design of Census Products Is Much More Important [1.52292571922932]
McCartan et al. (2023) call for "making differential privacy work for census data users."
This commentary explains why the 2020 Census Noisy Measurement Files (NMFs) are not the best focus for that plea.
arXiv Detail & Related papers (2023-12-20T15:43:04Z) - Robust Statistical Comparison of Random Variables with Locally Varying
Scale of Measurement [0.562479170374811]
Spaces with a locally varying scale of measurement, such as multidimensional structures with differently scaled dimensions, are common in statistics and machine learning.
We address this problem by considering an order based on (sets of) expectations of random variables mapping into such non-standard spaces.
This order contains dominance and expectation order as extreme cases when no, or respectively perfect, cardinal structure is given.
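A toy way to see the two extremes on samples: declare X preferred to Y when E[u(X)] >= E[u(Y)] for every u in a family U. With U = {identity} this is the expectation order, while a rich family of monotone indicators recovers a first-order-dominance-style check. The numpy sketch below assumes that reading and is not the paper's construction for non-standard spaces.

```python
import numpy as np

def dominates(x, y, transforms):
    # x is preferred to y if every u in the family gives a weakly larger
    # sample mean of u(x) than of u(y).
    return all(u(x).mean() >= u(y).mean() for u in transforms)

x = np.array([2.0, 3.0, 5.0, 6.0])
y = np.array([1.0, 2.0, 4.0, 5.0])

expectation_order = [lambda t: t]                      # U = {identity}
fsd_like = [lambda t, c=c: (t >= c).astype(float)      # monotone indicators
            for c in np.linspace(0.0, 7.0, 15)]
print(dominates(x, y, expectation_order))   # True
print(dominates(x, y, fsd_like))            # True
```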
arXiv Detail & Related papers (2023-06-22T11:02:18Z) - Concrete Score Matching: Generalized Score Matching for Discrete Data [109.12439278055213]
"Concrete score" is a generalization of the (Stein) score for discrete settings.
"Concrete Score Matching" is a framework to learn such scores from samples.
arXiv Detail & Related papers (2022-11-02T00:41:37Z) - Compact Redistricting Plans Have Many Spanning Trees [39.779544988993294]
In the design and analysis of political redistricting maps, it is often useful to be able to sample from the space of all partitions of the graph of census blocks into connected subgraphs of equal population.
In this paper, we establish an inverse exponential relationship between the total length of the boundaries separating districts and the probability that such a map will be sampled.
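The spanning-tree side of this relationship can be checked directly with Kirchhoff's matrix-tree theorem: a compact, blocky district has far more spanning trees than a stringy one of the same size, which is the quantity tree-based samplers are sensitive to. A small networkx/numpy check:

```python
import numpy as np
import networkx as nx

def spanning_tree_count(G):
    # Kirchhoff's matrix-tree theorem: the number of spanning trees
    # equals any cofactor of the graph Laplacian.
    L = nx.laplacian_matrix(G).toarray().astype(float)
    return round(np.linalg.det(L[1:, 1:]))   # delete one row and column

compact = nx.grid_2d_graph(3, 3)   # blocky 3x3 district of 9 blocks
snake = nx.path_graph(9)           # stringy district, same number of blocks
print(spanning_tree_count(compact), spanning_tree_count(snake))  # 192 vs 1
```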
arXiv Detail & Related papers (2021-09-27T23:36:01Z) - Estimating leverage scores via rank revealing methods and randomization [50.591267188664666]
We study algorithms for estimating the statistical leverage scores of rectangular dense or sparse matrices of arbitrary rank.
Our approach is based on combining rank revealing methods with compositions of dense and sparse randomized dimensionality reduction transforms.
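For orientation, leverage scores are the squared row norms of an orthonormal basis for the column space, and the classic randomized shortcut estimates them through a Johnson-Lindenstrauss sketch. The toy below still pays for an exact QR (the paper's contribution is precisely avoiding that via rank-revealing factorizations), so it only illustrates the quantity being estimated:

```python
import numpy as np

def leverage_exact(A):
    # Leverage score of row i = squared norm of row i of Q, where A = Q R.
    Q, _ = np.linalg.qr(A)
    return (Q ** 2).sum(axis=1)

def leverage_jl(A, k=64, seed=0):
    # JL estimate: compress Q's rows to k dimensions with a Gaussian map,
    # approximating each squared row norm from k numbers.
    Q, _ = np.linalg.qr(A)
    G = np.random.default_rng(seed).normal(size=(Q.shape[1], k)) / np.sqrt(k)
    return ((Q @ G) ** 2).sum(axis=1)

A = np.random.default_rng(1).normal(size=(2000, 20))
print(np.abs(leverage_exact(A) - leverage_jl(A)).max())
```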
arXiv Detail & Related papers (2021-05-23T19:21:55Z) - Fewer is More: A Deep Graph Metric Learning Perspective Using Fewer
Proxies [65.92826041406802]
We propose a Proxy-based deep Graph Metric Learning approach from the perspective of graph classification.
Multiple global proxies are leveraged to collectively approximate the original data points for each class.
We design a novel reverse label propagation algorithm, by which the neighbor relationships are adjusted according to ground-truth labels.
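As a generic sketch of the multi-proxy idea only (the paper's method additionally frames this as graph classification and adds the reverse label propagation step, which is not reproduced here): keep several proxies per class, score a sample against a class by its best proxy, and apply a softmax cross-entropy. All names and shapes below are illustrative assumptions.

```python
import numpy as np

def proxy_loss(emb, labels, proxies):
    # Each class keeps several global proxies; a sample's affinity to a
    # class is its best cosine similarity to that class's proxies, and a
    # softmax cross-entropy pulls samples toward their own class.
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    prox = proxies / np.linalg.norm(proxies, axis=2, keepdims=True)
    sims = np.einsum('id,ckd->ick', emb, prox).max(axis=2)  # (batch, class)
    logp = sims - np.log(np.exp(sims).sum(axis=1, keepdims=True))
    return -logp[np.arange(len(labels)), labels].mean()

rng = np.random.default_rng(0)
emb = rng.normal(size=(8, 16))          # batch of embeddings
labels = rng.integers(0, 3, size=8)     # 3 classes
proxies = rng.normal(size=(3, 4, 16))   # 4 proxies per class
print(proxy_loss(emb, labels, proxies))
```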
arXiv Detail & Related papers (2020-10-26T14:52:42Z) - Distribution Matching for Crowd Counting [51.90971145453012]
We show that imposing Gaussians on annotations hurts generalization performance.
We propose to use Distribution Matching for crowd COUNTing (DM-Count).
In terms of Mean Absolute Error, DM-Count outperforms the previous state-of-the-art methods.
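The distribution-matching view can be sketched as optimal transport between the predicted density and the normalized annotation map. Note that DM-Count itself uses an unbalanced OT formulation with additional counting and total-variation terms; the balanced entropic Sinkhorn toy below is only meant to convey the idea.

```python
import numpy as np

def sinkhorn(a, b, C, reg=0.1, iters=200):
    # Entropy-regularized optimal transport between histograms a and b
    # with ground-cost matrix C; returns the transport cost.
    K = np.exp(-C / reg)
    u = np.ones_like(a)
    for _ in range(iters):
        v = b / (K.T @ u)
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]
    return (P * C).sum()

# 1-D toy: predicted density vs. two annotated head positions.
xs = np.linspace(0, 1, 16)
pred = np.exp(-((xs - 0.35) ** 2) / 0.01); pred /= pred.sum()
gt = np.zeros(16); gt[[4, 11]] = 0.5      # normalized point annotations
C = (xs[:, None] - xs[None, :]) ** 2      # squared-distance ground cost
print(sinkhorn(pred, gt, C))
```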
arXiv Detail & Related papers (2020-09-28T04:57:23Z) - Magnify Your Population: Statistical Downscaling to Augment the Spatial
Resolution of Socioeconomic Census Data [48.7576911714538]
We present a new statistical downscaling approach to derive fine-scale estimates of key socioeconomic attributes.
For each selected socioeconomic variable, a Random Forest model is trained on the source Census units and then used to generate fine-scale gridded predictions.
As a case study, we apply this method to Census data in the United States, downscaling the selected socioeconomic variables available at the block group level to a grid of 300 m spatial resolution.
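In sketch form, with synthetic stand-ins for the covariates and the block-group values (both are assumptions here, not the paper's data): fit a regressor on the coarse units' covariate-value pairs, then predict on every fine grid cell that carries the same covariates.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical setup: coarse census units with covariates (e.g., land cover,
# night lights) and a known socioeconomic value; a fine grid with the same
# covariates but no value. Train on coarse units, predict on the grid.
rng = np.random.default_rng(0)
X_coarse = rng.normal(size=(500, 6))              # covariates per census unit
y_coarse = X_coarse[:, 0] * 3 + rng.normal(size=500)
X_grid = rng.normal(size=(10_000, 6))             # covariates per grid cell

rf = RandomForestRegressor(n_estimators=200, random_state=0)
rf.fit(X_coarse, y_coarse)
y_grid = rf.predict(X_grid)                       # fine-scale gridded estimates
print(y_grid[:5])
```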
arXiv Detail & Related papers (2020-06-23T16:52:18Z) - NWPU-Crowd: A Large-Scale Benchmark for Crowd Counting and Localization [101.13851473792334]
We construct NWPU-Crowd, a large-scale congested crowd counting and localization dataset consisting of 5,109 images with a total of 2,133,375 heads annotated with points and boxes.
Compared with other real-world datasets, it contains various illumination scenes and has the largest density range (0–20,033).
We describe the data characteristics, evaluate the performance of some mainstream state-of-the-art (SOTA) methods, and analyze the new problems that arise on the new data.
arXiv Detail & Related papers (2020-01-10T09:26:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.