Pseudo-PFLOW: Development of nationwide synthetic open dataset for
people movement based on limited travel survey and open statistical data
- URL: http://arxiv.org/abs/2205.00657v1
- Date: Mon, 2 May 2022 05:13:53 GMT
- Title: Pseudo-PFLOW: Development of nationwide synthetic open dataset for
people movement based on limited travel survey and open statistical data
- Authors: Takehiro Kashiyama, Yanbo Pang, Yoshihide Sekimoto, Takahiro Yabe
- Abstract summary: People flow data are utilized in diverse fields such as urban and commercial planning and disaster management.
This study developed pseudo-people-flow data covering all of Japan by combining public statistical and travel survey data.
- Score: 4.243926243206826
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: People flow data are utilized in diverse fields such as urban and commercial
planning and disaster management. However, people flow data collected from
mobile phones, such as global positioning system (GPS) data and call detail
records, are difficult to obtain because of privacy issues. Even if the data were
obtained, they would be difficult to handle. This study developed
pseudo-people-flow data covering all of Japan by combining public statistical
and travel survey data from limited urban areas. This dataset is not a
representation of actual travel movements but of typical weekday movements of
people. Therefore, it is expected to be useful for various purposes.
Additionally, the dataset represents the seamless movement of people throughout
Japan, with no restrictions on coverage, unlike the travel surveys. In this
paper, we propose a method for generating pseudo-people-flow and describe the
development of a "Pseudo-PFLOW" dataset covering the entire population of
approximately 130 million people. We then evaluated the accuracy of the dataset
using mobile phone and trip survey data from multiple metropolitan areas. The
results showed a coefficient of determination above 0.5 for comparisons of
population distribution and trip volume.
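The accuracy check described in the abstract relies on the coefficient of determination. The following is a minimal sketch of how such a score could be computed for per-zone counts. It assumes the common definition R^2 = 1 - SS_res / SS_tot, which the abstract does not specify. The function name `r_squared` and the sample numbers are hypothetical and are not taken from the paper or the dataset.

```python
def r_squared(observed, estimated):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    if len(observed) != len(estimated) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: squared gaps between reference and synthetic counts.
    ss_res = sum((o - e) ** 2 for o, e in zip(observed, estimated))
    # Total sum of squares: spread of the reference counts around their mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R^2 is undefined")
    return 1.0 - ss_res / ss_tot


# Hypothetical per-zone population counts (illustrative numbers only).
observed = [120.0, 340.0, 560.0, 210.0, 90.0]    # e.g. a reference source
estimated = [100.0, 360.0, 500.0, 260.0, 110.0]  # e.g. synthetic data

score = r_squared(observed, estimated)
print(f"R^2 = {score:.3f}")
```

Under this definition, a score above 0.5 means the synthetic counts account for more than half of the variance in the reference counts. A squared Pearson correlation is another common reading of the term and can give a different value.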
Related papers
- A Dataset for the Validation of Truth Inference Algorithms Suitable for Online Deployment [76.04306818209753]
We introduce a substantial crowdsourcing annotation dataset collected from a real-world crowdsourcing platform.
This dataset comprises approximately two thousand workers, one million tasks, and six million annotations.
We evaluate the effectiveness of several representative truth inference algorithms on this dataset.
arXiv Detail & Related papers (2024-03-10T16:00:41Z)
- LargeST: A Benchmark Dataset for Large-Scale Traffic Forecasting [65.71129509623587]
Road traffic forecasting plays a critical role in smart city initiatives and has experienced significant advancements thanks to the power of deep learning.
However, the promising results achieved on current public datasets may not be applicable to practical scenarios.
We introduce the LargeST benchmark dataset, which includes a total of 8,600 sensors in California with a 5-year time coverage.
arXiv Detail & Related papers (2023-06-14T05:48:36Z)
- A deep learning framework to generate realistic population and mobility data [5.180648702293017]
Census and Household Travel Survey datasets are regularly collected from households and individuals.
These datasets often represent a limited sample of the population due to privacy concerns or are provided only in aggregated form.
We propose a framework to generate a synthetic population that includes both socioeconomic features (e.g., age, sex, industry) and trip chains (i.e., activity locations).
arXiv Detail & Related papers (2022-11-14T14:05:09Z)
- So2Sat POP -- A Curated Benchmark Data Set for Population Estimation from Space on a Continental Scale [11.38584315242023]
We provide a comprehensive data set for population estimation in 98 European cities.
The data set comprises a digital elevation model, local climate zone, land use proportions, nighttime lights in combination with multi-spectral Sentinel-2 imagery, and data from the Open Street Map initiative.
arXiv Detail & Related papers (2022-04-07T07:30:43Z)
- Biases in human mobility data impact epidemic modeling [0.0]
We identify two types of bias caused by unequal access to, and unequal usage of, mobile phones.
We find evidence for data generation bias in all examined datasets in that high-wealth individuals are overrepresented.
To mitigate the skew, we present a framework to debias data and show how simple techniques can be used to increase representativeness.
arXiv Detail & Related papers (2021-12-23T13:20:54Z)
- Common Misconceptions about Population Data [5.606904856295946]
This article discusses a diverse range of misconceptions about population data that we believe anybody who works with such data needs to be aware of.
The massive size of such databases is often mistaken for a guarantee of valid inferences about the population of interest.
We conclude with a set of recommendations for inference when using population data.
arXiv Detail & Related papers (2021-12-20T23:54:49Z)
- Retiring Adult: New Datasets for Fair Machine Learning [47.27417042497261]
UCI Adult has served as the basis for the development and comparison of many algorithmic fairness interventions.
We reconstruct a superset of the UCI Adult data from available US Census sources and reveal idiosyncrasies of the UCI Adult dataset that limit its external validity.
Our primary contribution is a suite of new datasets that extend the existing data ecosystem for research on fair machine learning.
arXiv Detail & Related papers (2021-08-10T19:19:41Z)
- Leveraging Administrative Data for Bias Audits: Assessing Disparate Coverage with Mobility Data for COVID-19 Policy [61.60099467888073]
We show how linking administrative data can enable auditing mobility data for bias.
We show that older and non-white voters are less likely to be captured by mobility data.
We show that allocating public health resources based on such mobility data could disproportionately harm high-risk elderly and minority groups.
arXiv Detail & Related papers (2020-11-14T02:04:14Z)
- Urban Sensing based on Mobile Phone Data: Approaches, Applications and Challenges [67.71975391801257]
Much concern in mobile data analysis is related to human beings and their behaviours.
This work aims to review the methods and techniques that have been implemented to discover knowledge from mobile phone data.
arXiv Detail & Related papers (2020-08-29T15:14:03Z)
- Measuring Social Biases of Crowd Workers using Counterfactual Queries [84.10721065676913]
Social biases based on gender, race, etc. have been shown to pollute the machine learning (ML) pipeline, predominantly via biased training datasets.
Crowdsourcing, a popular cost-effective measure to gather labeled training datasets, is not immune to the inherent social biases of crowd workers.
We propose a new method based on counterfactual fairness to quantify the degree of inherent social bias in each crowd worker.
arXiv Detail & Related papers (2020-04-04T21:41:55Z)
- Mapping Languages and Demographics with Georeferenced Corpora [0.0]
This paper evaluates large georeferenced corpora, taken from both web-crawled and social media sources, against ground-truth population and language-census datasets.
The paper finds that the two datasets represent very different populations.
Twitter data makes better predictions about the inventory of languages used in each country.
arXiv Detail & Related papers (2020-04-02T04:34:11Z)