Related papers: CovidNet: To Bring Data Transparency in the Era of COVID-19

URL: http://arxiv.org/abs/2005.10948v3
Date: Mon, 20 Jul 2020 21:32:24 GMT
Title: CovidNet: To Bring Data Transparency in the Era of COVID-19
Authors: Tong Yang, Kai Shen, Sixuan He, Enyu Li, Peter Sun, Pingying Chen, Lin Zuo, Jiayue Hu, Yiwen Mo, Weiwei Zhang, Haonan Zhang, Jingxue Chen, Yu Guo
Abstract summary: This paper presents CovidNet, a COVID-19 tracking project associated with a large-scale epidemic dataset. CovidNet is the only platform providing real-time global case information for more than 4,124 sub-divisions from over 27 countries worldwide. The accuracy and freshness of the dataset are the result of painstaking efforts by our volunteer team, crowd-sourcing channels, and automated data pipelines.
Score: 9.808021836153712
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Timely, credible, and fine-grained case information is vital for local communities and individual citizens to respond to the COVID-19 pandemic in a rational, data-driven way. This paper presents CovidNet, a COVID-19 tracking project associated with a large-scale epidemic dataset, which was initiated by 1Point3Acres. To the best of our knowledge, the project is the only platform providing real-time global case information for more than 4,124 sub-divisions from over 27 countries worldwide, with multi-language support. The platform also offers interactive visualization tools to analyze the full historical case curves in each region. Initially launched in January 2020 as a volunteer project to bridge the data transparency gap in North America, the project has since become one of the major independent sources worldwide and has been consumed by many other tracking platforms. The accuracy and freshness of the dataset are the result of painstaking efforts by our volunteer team, crowd-sourcing channels, and automated data pipelines. As of May 18, 2020, the project website had been visited more than 200 million times, and the CovidNet dataset had empowered over 522 institutions and organizations worldwide in policy-making and academic research. All datasets are openly accessible for non-commercial purposes at https://coronavirus.1point3acres.com via a formal request through our APIs.

Related papers

The NetMob25 Dataset: A High-resolution Multi-layered View of Individual Mobility in Greater Paris Region [64.30214722988666]
This paper describes the survey design, collection protocol, processing methodology, and characteristics of the released dataset. The dataset includes three components: (i) an Individuals database describing demographic, socioeconomic, and household characteristics; (ii) a Trips database with over 80,000 annotated displacements including timestamps, transport modes, and trip purposes; and (iii) a Raw GPS Traces database comprising about 500 million high-frequency points.
arXiv Detail & Related papers (2025-06-06T09:22:21Z)
Multi-Platform Aggregated Dataset of Online Communities (MADOC) [64.45797970830233]
MADOC aggregates and standardizes data from Bluesky, Koo, Reddit, and Voat (2012-2024), containing 18.9 million posts, 236 million comments, and 23.1 million unique users. The dataset enables comparative studies of toxic behavior evolution across platforms through standardized interaction records and sentiment analysis.
arXiv Detail & Related papers (2025-01-22T14:02:11Z)
Linked Data on Geo-annotated Events and Use Cases for the Resilience of Ukraine [4.3944133124205]
We focus on datasets about damaging events in Ukraine due to Russia's invasion between February 2022 and the end of April 2023. We convert two selected datasets to Linked Data and enrich them with additional geospatial information. We present an algorithm for the detection of identical events from different datasets.
arXiv Detail & Related papers (2024-12-24T10:59:38Z)
Bridging the Data Provenance Gap Across Text, Speech and Video [67.72097952282262]
We conduct the largest and first-of-its-kind longitudinal audit across modalities of popular text, speech, and video datasets. Our manual analysis covers nearly 4,000 public datasets from 1990 to 2024, spanning 608 languages, 798 sources, 659 organizations, and 67 countries. We find that multimodal machine learning applications have overwhelmingly turned to web-crawled data, synthetic data, and social media platforms such as YouTube for their training sets.
arXiv Detail & Related papers (2024-12-19T01:30:19Z)
Uchaguzi-2022: A Dataset of Citizen Reports on the 2022 Kenyan Election [49.35115948941981]
We present Uchaguzi-2022, a dataset of 14k categorized and geotagged citizen reports related to the 2022 Kenyan General Election. We use this dataset to investigate whether language models can assist in scalably categorizing and geotagging reports, thus highlighting its potential application in the AI for Social Good space.
arXiv Detail & Related papers (2024-12-17T17:08:35Z)
Labeled Datasets for Research on Information Operations [71.34999856621306]
We present new labeled datasets about 26 campaigns, which contain both IO posts verified by a social media platform and over 13M posts by 303k accounts that discussed similar topics in the same time frames (control data). The datasets will facilitate the study of narratives, network interactions, and engagement strategies employed by coordinated accounts across various campaigns and countries.
arXiv Detail & Related papers (2024-11-15T22:15:01Z)
The NetMob2024 Dataset: Population Density and OD Matrices from Four LMIC Countries [0.0]
The NetMob24 dataset offers a unique opportunity for researchers from a range of academic fields to access comprehensive data sets spanning four countries over the course of two years (2019 and 2020). This dataset comprises privacy-preserving data sets from mobile application (app) data collected from users who have voluntarily consented to anonymous data collection for research purposes. It is our hope that this reference dataset will foster the production of new research methods and the aggregation of research outcomes.
arXiv Detail & Related papers (2024-10-01T07:17:19Z)
Data-Centric AI in the Age of Large Language Models [51.20451986068925]
This position paper proposes a data-centric viewpoint of AI research, focusing on large language models (LLMs). We make the key observation that data is instrumental in the developmental (e.g., pretraining and fine-tuning) and inferential (e.g., in-context learning) stages of LLMs. We identify four specific scenarios centered around data, covering data-centric benchmarks and data curation, data attribution, knowledge transfer, and inference contextualization.
arXiv Detail & Related papers (2024-06-20T16:34:07Z)
Analyzing the Impact of Fake News on the Anticipated Outcome of the 2024 Election Ahead of Time [7.1970442944315245]
Despite increasing awareness and research around fake news, there is still a significant need for datasets that specifically target racial slurs and biases within North American political speeches. This study introduces a comprehensive dataset that illuminates these critical aspects of misinformation.
arXiv Detail & Related papers (2023-12-01T20:14:16Z)
LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset [75.9621305227523]
We introduce LMSYS-Chat-1M, a large-scale dataset containing one million real-world conversations with 25 state-of-the-art large language models (LLMs). This dataset was collected in the wild from 210K IP addresses on our Vicuna demo and Arena website. We demonstrate its versatility through four use cases: developing content moderation models that perform similarly to GPT-4, building a safety benchmark, training instruction-following models that perform similarly to Vicuna, and creating challenging benchmark questions.
arXiv Detail & Related papers (2023-09-21T12:13:55Z)
COVID-19: An exploration of consecutive systemic barriers to pathogen-related data sharing during a pandemic [3.192308005611312]
In 2020, the COVID-19 pandemic resulted in a rapid response from governments and researchers worldwide. As of late 2023, millions of people have died as a result of COVID-19. Data professionals working with pandemic-relevant data often face significant systemic barriers to accessing, sharing, or re-using this data.
arXiv Detail & Related papers (2022-05-24T14:25:09Z)
A Summary of COVID-19 Datasets [1.3490988186255934]
This research presents a review of the main datasets developed for COVID-19 research. We hope this collection will continue to bring together members of the computing community, biomedical experts, and policymakers.
arXiv Detail & Related papers (2022-02-06T17:34:26Z)
Global Tweet Mentions of COVID-19 [3.3043776328952226]
We present an open-source dataset of 1.92 million keyword-selected Twitter posts, updated weekly from January 2020 to present. The dashboard presents 100% of the geotagged tweets that contain keywords or hashtags related to COVID-19. With emerging COVID variants and ongoing vaccine hesitancy and resistance, this dataset could be used by researchers to study numerous aspects of COVID-19.
arXiv Detail & Related papers (2021-08-13T20:21:29Z)
Retiring Adult: New Datasets for Fair Machine Learning [47.27417042497261]
UCI Adult has served as the basis for the development and comparison of many algorithmic fairness interventions. We reconstruct a superset of the UCI Adult data from available US Census sources and reveal idiosyncrasies of the UCI Adult dataset that limit its external validity. Our primary contribution is a suite of new datasets that extend the existing data ecosystem for research on fair machine learning.
arXiv Detail & Related papers (2021-08-10T19:19:41Z)
Rapidly Bootstrapping a Question Answering Dataset for COVID-19 [88.86456834766288]
We present CovidQA, the beginnings of a question answering dataset specifically designed for COVID-19. This is the first publicly available resource of its type, and it is intended as a stopgap measure for guiding research until more substantial evaluation resources become available.
arXiv Detail & Related papers (2020-04-23T17:35:11Z)
NWPU-Crowd: A Large-Scale Benchmark for Crowd Counting and Localization [101.13851473792334]
We construct a large-scale congested crowd counting and localization dataset, NWPU-Crowd, consisting of 5,109 images with a total of 2,133,375 heads annotated with points and boxes. Compared with other real-world datasets, it contains various illumination scenes and has the largest density range (0 to 20,033). We describe the data characteristics, evaluate the performance of some mainstream state-of-the-art (SOTA) methods, and analyze the new problems that arise on the new data.
arXiv Detail & Related papers (2020-01-10T09:26:04Z)

This list is automatically generated from the titles and abstracts of the papers on this site.