Machine-Made Media: Monitoring the Mobilization of Machine-Generated Articles on Misinformation and Mainstream News Websites
- URL: http://arxiv.org/abs/2305.09820v5
- Date: Wed, 20 Mar 2024 03:58:34 GMT
- Title: Machine-Made Media: Monitoring the Mobilization of Machine-Generated Articles on Misinformation and Mainstream News Websites
- Authors: Hans W. A. Hanley, Zakir Durumeric
- Abstract summary: We train a DeBERTa-based synthetic news detector and classify over 15.46 million articles from 3,074 misinformation and mainstream news websites.
We find that between January 1, 2022, and May 1, 2023, the relative number of synthetic news articles increased by 57.3% on mainstream websites while increasing by 474% on misinformation sites.
- Score: 5.161088104035108
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As large language models (LLMs) like ChatGPT have gained traction, an increasing number of news websites have begun utilizing them to generate articles. However, not only can these language models produce factually inaccurate articles on reputable websites but disreputable news sites can utilize LLMs to mass produce misinformation. To begin to understand this phenomenon, we present one of the first large-scale studies of the prevalence of synthetic articles within online news media. To do this, we train a DeBERTa-based synthetic news detector and classify over 15.46 million articles from 3,074 misinformation and mainstream news websites. We find that between January 1, 2022, and May 1, 2023, the relative number of synthetic news articles increased by 57.3% on mainstream websites while increasing by 474% on misinformation sites. We find that this increase is largely driven by smaller less popular websites. Analyzing the impact of the release of ChatGPT using an interrupted-time-series, we show that while its release resulted in a marked increase in synthetic articles on small sites as well as misinformation news websites, there was not a corresponding increase on large mainstream news websites.
Related papers
- 3DLNews: A Three-decade Dataset of US Local News Articles [49.1574468325115]
3DLNews is a novel dataset with local news articles from the United States spanning the period from 1996 to 2024.
It contains almost 1 million URLs (with HTML text) from over 14,000 local newspapers, TV, and radio stations across all 50 states.
arXiv Detail & Related papers (2024-08-08T18:33:37Z) - Adapting Fake News Detection to the Era of Large Language Models [48.5847914481222]
We study the interplay between machine-(paraphrased) real news, machine-generated fake news, human-written fake news, and human-written real news.
Our experiments reveal an interesting pattern that detectors trained exclusively on human-written articles can indeed perform well at detecting machine-generated fake news, but not vice versa.
arXiv Detail & Related papers (2023-11-02T08:39:45Z) - Specious Sites: Tracking the Spread and Sway of Spurious News Stories at
Scale [6.917588580148212]
We identify 52,036 narratives on 1,334 unreliable news websites.
We show how our system can be utilized to detect new narratives originating from unreliable news websites.
arXiv Detail & Related papers (2023-08-03T22:42:30Z) - FALSE: Fake News Automatic and Lightweight Solution [0.20999222360659603]
In this paper, R code has been used to study and visualize a modern fake news dataset.
We use clustering, classification, correlation and various plots to analyze and present the data.
arXiv Detail & Related papers (2022-08-16T11:53:30Z) - Misinfo Belief Frames: A Case Study on Covid & Climate News [49.979419711713795]
We propose a formalism for understanding how readers perceive the reliability of news and the impact of misinformation.
We introduce the Misinfo Belief Frames (MBF) corpus, a dataset of 66k inferences over 23.5k headlines.
Our results using large-scale language modeling to predict misinformation frames show that machine-generated inferences can influence readers' trust in news headlines.
arXiv Detail & Related papers (2021-04-18T09:50:11Z) - The Rise and Fall of Fake News sites: A Traffic Analysis [62.51737815926007]
We investigate the online presence of fake news websites and characterize their behavior in comparison to real news websites.
Based on our findings, we build a content-agnostic ML classifier for automatic detection of fake news websites.
arXiv Detail & Related papers (2021-03-16T18:10:22Z) - BaitWatcher: A lightweight web interface for the detection of incongruent news headlines [27.29585619643952]
BaitWatcher is a lightweight web interface that guides readers in estimating the likelihood of incongruence in news articles before clicking on the headlines.
BaitWatcher utilizes a hierarchical recurrent encoder that efficiently learns complex textual representations of a news headline and its associated body text.
arXiv Detail & Related papers (2020-03-23T23:43:02Z) - 365 Dots in 2019: Quantifying Attention of News Sources [69.50862982117125]
We measure the overlap of topics of online news articles from a variety of sources.
We score news stories according to the degree of attention in near-real time.
This can enable multiple studies, including identifying topics that receive the most attention.
arXiv Detail & Related papers (2020-03-22T20:32:47Z) - HoaxItaly: a collection of Italian disinformation and fact-checking stories shared on Twitter in 2019 [72.96986027203377]
The dataset also includes the title and body of approximately 37k news articles.
It is publicly available at https://doi.org/10.7910/DVN/PGVDHX.
arXiv Detail & Related papers (2020-01-29T16:14:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.