States of Disarray: Cleaning Data for Gerrymandering Analysis
- URL: http://arxiv.org/abs/2503.13521v1
- Date: Fri, 14 Mar 2025 19:33:00 GMT
- Title: States of Disarray: Cleaning Data for Gerrymandering Analysis
- Authors: Ananya Agarwal, Fnu Alusi, Arbie Hsu, Arif Syraj, Ellen Veomett
- Abstract summary: At the time of submission, we have made data for 22 states available for researchers, students, and the general public to easily access and analyze.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The mathematics of redistricting is an area of study that has exploded in recent years. In particular, many different research groups and expert witnesses in court cases have used outlier analysis to argue that a proposed map is a gerrymander. This outlier analysis relies on having an ensemble of potential redistricting maps against which the proposed map is compared. Arguably the most widely-accepted method of creating such an ensemble is to use a Markov Chain Monte Carlo (MCMC) process. This process requires that various pieces of data be gathered, cleaned, and coalesced into a single file that can be used as the seed of the MCMC process. In this article, we describe how we have begun this cleaning process for each state, and made the resulting data available for the public at https://github.com/eveomett-states. At the time of submission, we have data for 22 states available for researchers, students, and the general public to easily access and analyze. We will continue the data cleaning process for each state, and we hope that the availability of these datasets will both further research in this area, and increase the public's interest in and understanding of modern techniques to detect gerrymandering.
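For readers who want to experiment with these seed files, the sketch below shows roughly how a cleaned dual-graph file might seed an MCMC ensemble using the GerryChain library (the de facto standard for ReCom-style chains). The file name (`PA.json`), population column (`TOTPOP`), and seed-assignment column (`CD_2011`) are illustrative assumptions, not guaranteed to match the repository's actual column naming.

```python
from functools import partial

from gerrychain import Graph, MarkovChain, Partition
from gerrychain.accept import always_accept
from gerrychain.constraints import contiguous
from gerrychain.proposals import recom
from gerrychain.updaters import Tally, cut_edges

# Hypothetical cleaned dual-graph file; column names are assumptions.
graph = Graph.from_json("PA.json")
initial = Partition(
    graph,
    assignment="CD_2011",  # seed district assignment column (assumed name)
    updaters={"population": Tally("TOTPOP", alias="population"),
              "cut_edges": cut_edges},
)

ideal_pop = sum(initial["population"].values()) / len(initial.parts)
proposal = partial(recom, pop_col="TOTPOP", pop_target=ideal_pop,
                   epsilon=0.02, node_repeats=2)

chain = MarkovChain(
    proposal=proposal,
    constraints=[contiguous],  # keep every district connected
    accept=always_accept,
    initial_state=initial,
    total_steps=10_000,
)

# Each step yields one plan in the ensemble used for outlier analysis.
for plan in chain:
    pass  # e.g., record partisan statistics of `plan` here
```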
Related papers
- Contrast Pattern Mining: A Survey [54.06874773607785]
It is difficult for new researchers to quickly grasp the overall state of the field.
First, we present an in-depth overview of CPM, including basic concepts, types, mining strategies, and metrics for assessing discriminative ability.
We classify CPM methods according to their characteristics into boundary-based algorithms, tree-based algorithms, evolutionary fuzzy system-based algorithms, decision tree-based algorithms, and other algorithms.
arXiv Detail & Related papers (2022-09-27T17:11:12Z) - Robust Self-Tuning Data Association for Geo-Referencing Using Lane Markings [44.4879068879732]
This paper presents a complete pipeline for resolving ambiguities during the data association.
Its core is a robust self-tuning data association that adapts the search area depending on the entropy of the measurements.
We evaluate our method on real data from urban and rural scenarios around the city of Karlsruhe in Germany.
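A toy version of the entropy-driven adaptation, illustrating the general idea rather than the paper's actual tuning rule: the more ambiguous (higher-entropy) the association weights for a measurement, the wider the search area.

```python
import numpy as np

def search_radius(assoc_weights, r_min=1.0, r_max=10.0):
    # Normalized Shannon entropy of the association weights: ambiguous
    # (high-entropy) measurements widen the search area, confident ones
    # shrink it. Radii r_min/r_max are illustrative parameters.
    p = np.asarray(assoc_weights, dtype=float)
    p = p[p > 0] / p.sum()
    h = -np.sum(p * np.log(p))
    h_norm = h / np.log(len(p)) if len(p) > 1 else 0.0
    return r_min + h_norm * (r_max - r_min)
```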
arXiv Detail & Related papers (2022-07-28T12:29:39Z) - Mathematically Quantifying Non-responsiveness of the 2021 Georgia Congressional Districting Plan [3.097163558730473]
We use a Metropolized-sampling technique through a parallel tempering method combined with ReCom.
We develop these improvements through the first case study of district plans in Georgia.
Our analysis projects that any election in Georgia will reliably elect 9 Republicans and 5 Democrats under the enacted plan.
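The parallel tempering component relies on a standard swap rule between chains run at different inverse temperatures; a minimal sketch of that acceptance test (the generic replica-exchange criterion, not the authors' exact implementation):

```python
import math
import random

def accept_swap(energy_i, energy_j, beta_i, beta_j):
    # Standard replica-exchange (parallel tempering) criterion: swapping the
    # states of chains i and j is accepted with probability
    # min(1, exp((beta_i - beta_j) * (energy_i - energy_j))), which tends to
    # move low-energy plans toward the colder (higher-beta) chain.
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    return delta >= 0 or random.random() < math.exp(delta)
```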
arXiv Detail & Related papers (2022-03-13T02:58:32Z) - Implications of Distance over Redistricting Maps: Central and Outlier Maps [10.318010762465939]
In representative democracy, a redistricting map is chosen to partition an electorate into districts, each of which elects a representative.
A valid redistricting map must satisfy a collection of constraints such as being compact, contiguous, and of almost-equal population.
Even within these constraints, many valid maps exist, which enables a partisan legislature to gerrymander by choosing a map that unfairly favors it.
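The almost-equal population constraint, for instance, is typically expressed as a maximum relative deviation from the ideal district size; a minimal check (the tolerance value is illustrative):

```python
def within_population_tolerance(district_pops, epsilon=0.01):
    # A plan is population-balanced if every district is within epsilon
    # of the ideal size (total population / number of districts).
    ideal = sum(district_pops) / len(district_pops)
    return all(abs(p - ideal) / ideal <= epsilon for p in district_pops)
```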
arXiv Detail & Related papers (2022-03-02T04:59:30Z) - Measuring Geometric Similarity Across Possible Plans for Automated Redistricting [0.0]
This paper briefly introduces an interpretive measure of similarity, and a corresponding assignment matrix, that corresponds to the percentage of a state's area or population that stays in the same congressional district between two plans.
We then show how to calculate this measure in an intuitive time and briefly demonstrate some potential use-cases.
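One natural way to compute such a measure is to build the overlap matrix between the two plans' districts and find the district matching that maximizes retained population (or area), e.g. with the Hungarian algorithm; the sketch below illustrates this general approach and is not necessarily the paper's exact procedure.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def plan_similarity(overlap):
    # overlap[i, j] = population (or area) shared by district i of plan A
    # and district j of plan B. Hungarian matching pairs districts so that
    # the total retained overlap is maximized; the result is the fraction
    # of the state that stays in the "same" district under that pairing.
    overlap = np.asarray(overlap, dtype=float)
    rows, cols = linear_sum_assignment(overlap, maximize=True)
    return overlap[rows, cols].sum() / overlap.sum()
```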
arXiv Detail & Related papers (2021-11-17T03:37:25Z) - Compact Redistricting Plans Have Many Spanning Trees [39.779544988993294]
In the design and analysis of political redistricting maps, it is often useful to be able to sample from the space of all partitions of the graph of census blocks into connected subgraphs of equal population.
In this paper, we establish an inverse exponential relationship between the total length of the boundaries separating districts and the probability that such a map will be sampled.
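The spanning-tree count behind this relationship can be computed directly via Kirchhoff's matrix-tree theorem; a small sketch for intuition:

```python
import networkx as nx
import numpy as np

def spanning_tree_count(g):
    # Kirchhoff's matrix-tree theorem: the number of spanning trees equals
    # the determinant of the graph Laplacian with one row and column removed.
    L = nx.laplacian_matrix(g).toarray().astype(float)
    return round(np.linalg.det(L[1:, 1:]))
```

For example, `spanning_tree_count(nx.grid_2d_graph(2, 2))` returns 4, the number of spanning trees of a 4-cycle.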
arXiv Detail & Related papers (2021-09-27T23:36:01Z) - Noise-Resistant Deep Metric Learning with Probabilistic Instance Filtering [59.286567680389766]
Noisy labels are commonly found in real-world data, which cause performance degradation of deep neural networks.
We propose the Probabilistic Ranking-based Instance Selection with Memory (PRISM) approach for deep metric learning (DML).
PRISM calculates the probability of a label being clean, and filters out potentially noisy samples.
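One simple way to instantiate the clean-label probability (an illustrative stand-in for PRISM's memory-based estimate, not its exact rule): score each sample by the softmax similarity between its feature and per-class centers, and read off the mass on its own class.

```python
import numpy as np

def clean_probability(feature, label, class_centers, temperature=0.1):
    # Softmax over dot-product similarities to per-class centers; the mass
    # on the sample's own class approximates the chance its label is clean.
    sims = np.array([feature @ c for c in class_centers]) / temperature
    sims -= sims.max()                       # numerical stability
    probs = np.exp(sims) / np.exp(sims).sum()
    return probs[label]

def filter_noisy(features, labels, class_centers, threshold=0.5):
    # Keep indices of samples whose estimated clean-label probability
    # clears the threshold; the rest are treated as potentially noisy.
    return [i for i, (f, y) in enumerate(zip(features, labels))
            if clean_probability(f, y, class_centers) > threshold]
```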
arXiv Detail & Related papers (2021-08-03T12:15:25Z) - Sequential Monte Carlo for Sampling Balanced and Compact Redistricting Plans [0.0]
We present a new Sequential Monte Carlo (SMC) algorithm that generates a sample of redistricting plans converging to a realistic target distribution.
We validate the accuracy of the proposed algorithm by using a small map where all redistricting plans can be enumerated.
We then apply the SMC algorithm to evaluate the partisan implications of several maps submitted by relevant parties in a recent high-profile redistricting case in the state of Pennsylvania.
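At the heart of any SMC sampler is an importance-weighted resampling step; a generic sketch (not the authors' full algorithm, which also includes tailored proposal and weighting steps):

```python
import numpy as np

def resample(plans, weights, rng=None):
    # Multinomial resampling: redraw the population of plans in proportion
    # to their importance weights, then reset all weights to uniform.
    rng = rng or np.random.default_rng()
    w = np.asarray(weights, dtype=float)
    w /= w.sum()
    idx = rng.choice(len(plans), size=len(plans), p=w)
    return [plans[i] for i in idx], np.full(len(plans), 1.0 / len(plans))
```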
arXiv Detail & Related papers (2020-08-13T23:26:34Z) - Learning to Summarize Passages: Mining Passage-Summary Pairs from Wikipedia Revision Histories [110.54963847339775]
We propose a method for automatically constructing a passage-to-summary dataset by mining the Wikipedia page revision histories.
In particular, the method mines the main body passages and the introduction sentences which are added to the pages simultaneously.
The constructed dataset contains more than one hundred thousand passage-summary pairs.
arXiv Detail & Related papers (2020-04-06T12:11:50Z) - CNN-based Density Estimation and Crowd Counting: A Survey [65.06491415951193]
This paper comprehensively studies the crowd counting models, mainly CNN-based density map estimation methods.
Based on the evaluation metrics, we select the top three performers on the crowd counting datasets.
We expect to make reasonable inference and prediction for the future development of crowd counting.
arXiv Detail & Related papers (2020-03-28T13:17:30Z) - NWPU-Crowd: A Large-Scale Benchmark for Crowd Counting and Localization [101.13851473792334]
We construct a large-scale congested crowd counting and localization dataset, NWPU-Crowd, consisting of 5,109 images, in a total of 2,133,375 annotated heads with points and boxes.
Compared with other real-world datasets, it contains various illumination scenes and has the largest density range (0 to 20,033).
We describe the data characteristics, evaluate the performance of some mainstream state-of-the-art (SOTA) methods, and analyze the new problems that arise on the new data.
arXiv Detail & Related papers (2020-01-10T09:26:04Z)