The NetMob2024 Dataset: Population Density and OD Matrices from Four LMIC Countries
- URL: http://arxiv.org/abs/2410.00453v2
- Date: Wed, 2 Oct 2024 18:18:36 GMT
- Title: The NetMob2024 Dataset: Population Density and OD Matrices from Four LMIC Countries
- Authors: Wenlan Zhang, Miguel Nunez del Prado, Vincent Gauthier, Sveta Milusheva,
- Abstract summary: The NetMob24 dataset offers a unique opportunity for researchers from a range of academic fields to access comprehensive data sets spanning four countries over the course of two years (2019 and 2020).
This dataset comprises privacy-preserving data sets from mobile application (app) data collected from users who have voluntarily consented to anonymous data collection for research purposes.
It is our hope that this reference dataset will foster the production of new research methods and the reproducibility of research outcomes.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The NetMob24 dataset offers a unique opportunity for researchers from a range of academic fields to access comprehensive spatiotemporal data sets spanning four countries (India, Mexico, Indonesia, and Colombia) over the course of two years (2019 and 2020). This dataset, developed in collaboration with Cuebiq (also referred to as Spectus), comprises privacy-preserving aggregated data sets derived from mobile application (app) data collected from users who have voluntarily consented to anonymous data collection for research purposes. It is our hope that this reference dataset will foster the production of new research methods and the reproducibility of research outcomes.
Related papers
- Data-Centric AI in the Age of Large Language Models [51.20451986068925]
This position paper proposes a data-centric viewpoint of AI research, focusing on large language models (LLMs).
We make the key observation that data is instrumental in the developmental (e.g., pretraining and fine-tuning) and inferential stages (e.g., in-context learning) of LLMs.
We identify four specific scenarios centered around data, covering data-centric benchmarks and data curation, data attribution, knowledge transfer, and inference contextualization.
arXiv Detail & Related papers (2024-06-20T16:34:07Z)
- The Full-scale Assembly Simulation Testbed (FAST) Dataset [3.483595743063401]
We present a new open dataset captured with our VR-based Full-scale Assembly Simulation Testbed (FAST).
This dataset consists of data collected from 108 participants learning how to assemble two distinct full-scale structures in VR.
arXiv Detail & Related papers (2024-03-13T21:30:01Z)
- Datasets for Large Language Models: A Comprehensive Survey [37.153302283062004]
The survey consolidates and categorizes the fundamental aspects of LLM datasets from five perspectives.
The survey sheds light on the prevailing challenges and points out potential avenues for future investigation.
The total data size surveyed surpasses 774.5 TB for pre-training corpora and 700M instances for other datasets.
arXiv Detail & Related papers (2024-02-28T04:35:51Z)
- DataFinder: Scientific Dataset Recommendation from Natural Language Descriptions [100.52917027038369]
We operationalize the task of recommending datasets given a short natural language description.
To facilitate this task, we build the DataFinder dataset which consists of a larger automatically-constructed training set and a smaller expert-annotated evaluation set.
This system, trained on the DataFinder dataset, finds more relevant search results than existing third-party dataset search engines.
arXiv Detail & Related papers (2023-05-26T05:22:36Z)
- DeepShovel: An Online Collaborative Platform for Data Extraction in Geoscience Literature with AI Assistance [48.55345030503826]
Geoscientists need to read a huge amount of literature to locate, extract, and aggregate relevant results and data.
DeepShovel is a publicly-available AI-assisted data extraction system to support their needs.
A follow-up user evaluation with 14 researchers suggested DeepShovel improved users' efficiency of data extraction for building scientific databases.
arXiv Detail & Related papers (2022-02-21T12:18:08Z)
- A Summary of COVID-19 Datasets [1.3490988186255934]
This research presents a review of the main datasets developed for COVID-19 research.
We hope this collection will continue to bring together members of the computing community, biomedical experts, and policymakers.
arXiv Detail & Related papers (2022-02-06T17:34:26Z)
- Datasets: A Community Library for Natural Language Processing [55.48866401721244]
datasets is a community library for contemporary NLP.
The library includes more than 650 unique datasets, has more than 250 contributors, and has helped support a variety of novel cross-dataset research projects.
arXiv Detail & Related papers (2021-09-07T03:59:22Z)
- Retiring Adult: New Datasets for Fair Machine Learning [47.27417042497261]
UCI Adult has served as the basis for the development and comparison of many algorithmic fairness interventions.
We reconstruct a superset of the UCI Adult data from available US Census sources and reveal idiosyncrasies of the UCI Adult dataset that limit its external validity.
Our primary contribution is a suite of new datasets that extend the existing data ecosystem for research on fair machine learning.
arXiv Detail & Related papers (2021-08-10T19:19:41Z)
- CO-Search: COVID-19 Information Retrieval with Semantic Search, Question Answering, and Abstractive Summarization [53.67205506042232]
CO-Search is a retriever-ranker semantic search engine designed to handle complex queries over the COVID-19 literature.
To account for the domain-specific and relatively limited dataset, we generate a bipartite graph of document paragraphs and citations.
We evaluate our system on the data of the TREC-COVID information retrieval challenge.
arXiv Detail & Related papers (2020-06-17T01:32:48Z)
- CovidNet: To Bring Data Transparency in the Era of COVID-19 [9.808021836153712]
This paper presents CovidNet, a COVID-19 tracking project associated with a large scale epidemic dataset.
CovidNet is the only platform providing real-time global case information of more than 4,124 sub-divisions from over 27 countries worldwide.
The accuracy and freshness of the dataset are the result of painstaking efforts from our voluntary teamwork, crowd-sourcing channels, and automated data pipelines.
arXiv Detail & Related papers (2020-05-22T00:05:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.