NELA-Local: A Dataset of U.S. Local News Articles for the Study of
County-level News Ecosystems
- URL: http://arxiv.org/abs/2203.08600v1
- Date: Wed, 16 Mar 2022 13:19:21 GMT
- Title: NELA-Local: A Dataset of U.S. Local News Articles for the Study of
County-level News Ecosystems
- Authors: Benjamin D. Horne, Maurício Gruppi, Kenneth Joseph, Jon Green, John P. Wihbey, and Sibel Adalı
- Abstract summary: We present a dataset of over 1.4M online news articles from 313 local U.S. outlets.
These outlets cover a geographically diverse set of communities across the United States.
- Score: 4.977804197346136
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we present a dataset of over 1.4M online news articles from
313 local U.S. news outlets published over 20 months (between April 4th, 2020
and December 31st, 2021). These outlets cover a geographically diverse set of
communities across the United States. In order to estimate characteristics of
the local audience, included with this news article data is a wide range of
county-level metadata, including demographics, 2020 Presidential Election vote
shares, and community resilience estimates from the U.S. Census Bureau. The
NELA-Local dataset can be found at:
https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/GFE66K.
Related papers
- 3DLNews: A Three-decade Dataset of US Local News Articles [49.1574468325115]
3DLNews is a novel dataset with local news articles from the United States spanning the period from 1996 to 2024.
It contains almost 1 million URLs (with HTML text) from over 14,000 local newspapers, TV, and radio stations across all 50 states.
arXiv Detail & Related papers (2024-08-08T18:33:37Z)
- Analyzing the Impact of Fake News on the Anticipated Outcome of the 2024 Election Ahead of Time [7.1970442944315245]
Despite increasing awareness and research around fake news, there is still a significant need for datasets that specifically target racial slurs and biases within North American political speeches.
This study introduces a comprehensive dataset that illuminates these critical aspects of misinformation.
arXiv Detail & Related papers (2023-12-01T20:14:16Z)
- Design and analysis of tweet-based election models for the 2021 Mexican legislative election [55.41644538483948]
We use a dataset of 15 million election-related tweets in the six months preceding election day.
We find that models using data with geographical attributes determine the results of the election with better precision and accuracy than conventional polling methods.
arXiv Detail & Related papers (2023-01-02T12:40:05Z)
- Geographic Citation Gaps in NLP Research [63.13508571014673]
This work asks a series of questions on the relationship between geographical location and publication success.
We first created a dataset of 70,000 papers from the ACL Anthology, extracted their meta-information, and generated their citation network.
We show that not only are there substantial geographical disparities in paper acceptance and citation but also that these disparities persist even when controlling for a number of variables such as venue of publication and sub-field of NLP.
arXiv Detail & Related papers (2022-10-26T02:25:23Z)
- News Category Dataset [1.7513645771137178]
We present a News Category dataset that contains around 200k news headlines from the year 2012 to 2018 obtained from HuffPost.
In this paper, we produce some novel insights from the dataset and describe various existing and potential applications of our dataset.
arXiv Detail & Related papers (2022-09-23T06:13:16Z)
- Datasets: A Community Library for Natural Language Processing [55.48866401721244]
datasets is a community library for contemporary NLP.
The library includes more than 650 unique datasets, has more than 250 contributors, and has helped support a variety of novel cross-dataset research projects.
arXiv Detail & Related papers (2021-09-07T03:59:22Z)
- Counting Protests in News Articles: A Dataset and Semi-Automated Data Collection Pipeline [0.0]
Between January 2017 and January 2021, thousands of local news sources in the United States reported on over 42,000 protests about topics such as civil rights, immigration, guns, and the environment.
We release a dataset of news article URLs, dates, locations, crowd size estimates, and 494 discrete descriptive tags corresponding to 42,347 reported protest events in the United States between January 2017 and January 2021.
arXiv Detail & Related papers (2021-02-01T15:35:21Z)
- CovidNet: To Bring Data Transparency in the Era of COVID-19 [9.808021836153712]
This paper presents CovidNet, a COVID-19 tracking project associated with a large scale epidemic dataset.
CovidNet is the only platform providing real-time global case information of more than 4,124 sub-divisions from over 27 countries worldwide.
The accuracy and freshness of the dataset are the result of painstaking efforts by our volunteer team, crowd-sourcing channels, and automated data pipelines.
arXiv Detail & Related papers (2020-05-22T00:05:17Z)
- 365 Dots in 2019: Quantifying Attention of News Sources [69.50862982117125]
We measure the overlap of topics of online news articles from a variety of sources.
We score news stories according to the degree of attention in near-real time.
This can enable multiple studies, including identifying topics that receive the most attention.
arXiv Detail & Related papers (2020-03-22T20:32:47Z)
- NWPU-Crowd: A Large-Scale Benchmark for Crowd Counting and Localization [101.13851473792334]
We construct a large-scale congested crowd counting and localization dataset, NWPU-Crowd, consisting of 5,109 images, in a total of 2,133,375 annotated heads with points and boxes.
Compared with other real-world datasets, it contains various illumination scenes and has the largest density range (0~20,033).
We describe the data characteristics, evaluate the performance of some mainstream state-of-the-art (SOTA) methods, and analyze the new problems that arise on the new data.
arXiv Detail & Related papers (2020-01-10T09:26:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.