The COVID That Wasn't: Counterfactual Journalism Using GPT
- URL: http://arxiv.org/abs/2210.06644v1
- Date: Thu, 13 Oct 2022 00:50:24 GMT
- Title: The COVID That Wasn't: Counterfactual Journalism Using GPT
- Authors: Sil Hamilton, Andrew Piper
- Abstract summary: We use a language model trained prior to 2020 to artificially generate news articles concerning COVID-19.
We then compare stylistic qualities of our artificially generated corpus with a news corpus.
We find our artificially generated articles exhibit a considerably more negative attitude towards COVID and a significantly lower reliance on geopolitical framing.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In this paper, we explore the use of large language models to assess human
interpretations of real-world events. To do so, we use a language model trained
prior to 2020 to artificially generate news articles concerning COVID-19 given
the headlines of actual articles written during the pandemic. We then compare
stylistic qualities of our artificially generated corpus with a news corpus, in
this case 5,082 articles produced by CBC News between January 23 and May 5,
2020. We find our artificially generated articles exhibit a considerably more
negative attitude towards COVID and a significantly lower reliance on
geopolitical framing. Our methods and results hold importance for researchers
seeking to simulate large scale cultural processes via recent breakthroughs in
text generation.
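The stylistic comparison described in the abstract can be illustrated with a toy lexicon-based sentiment score over two small corpora. The word lists and example sentences below are hypothetical placeholders for illustration, not the authors' actual method or data.

```python
# Toy lexicon-based sentiment comparison between two corpora.
# The mini-lexicons and example sentences are illustrative stand-ins.
NEGATIVE = {"deadly", "crisis", "fear", "death", "outbreak"}
POSITIVE = {"recovery", "hope", "vaccine", "safe", "improve"}

def sentiment_score(text: str) -> float:
    """Return (positive - negative) lexicon hits per token; 0.0 for empty text."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return (pos - neg) / len(tokens)

def corpus_mean(corpus):
    """Mean sentiment score across a corpus of documents."""
    return sum(sentiment_score(doc) for doc in corpus) / len(corpus)

generated = ["deadly outbreak spreads fear", "crisis deepens as death toll rises"]
real = ["vaccine brings hope", "recovery efforts improve as cities reopen"]

# The paper's finding, in miniature: the generated corpus scores more negative.
print(corpus_mean(generated) < corpus_mean(real))  # True
```

A real replication would generate each article with a pre-2020 model conditioned on the actual headline and use a validated sentiment instrument rather than a hand-picked lexicon.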
Related papers
- SciNews: From Scholarly Complexities to Public Narratives -- A Dataset for Scientific News Report Generation [20.994565065595232]
We present a new corpus to facilitate the automated generation of scientific news reports.
Our dataset comprises academic publications and their corresponding scientific news reports across nine disciplines.
We benchmark our dataset employing state-of-the-art text generation models.
arXiv Detail & Related papers (2024-03-26T14:54:48Z)
- Adapting Fake News Detection to the Era of Large Language Models [48.5847914481222]
We study the interplay between machine-paraphrased real news, machine-generated fake news, human-written fake news, and human-written real news.
Our experiments reveal an interesting pattern that detectors trained exclusively on human-written articles can indeed perform well at detecting machine-generated fake news, but not vice versa.
arXiv Detail & Related papers (2023-11-02T08:39:45Z)
- Framing the News: From Human Perception to Large Language Model Inferences [8.666172545138272]
Identifying the frames of news is important to understand the articles' vision, intention, message to be conveyed, and which aspects of the news are emphasized.
We develop a protocol for human labeling of frames for 1786 headlines of No-Vax movement articles of European newspapers from 5 countries.
We investigate two approaches for frame inference of news headlines: first with a GPT-3.5 fine-tuning approach, and second with GPT-3.5 prompt-engineering.
arXiv Detail & Related papers (2023-04-27T18:30:18Z)
- Text2Time: Transformer-based Article Time Period Prediction [0.11470070927586018]
This work investigates the problem of predicting the publication period of a text document, specifically a news article, based on its textual content.
We create our own extensive labeled dataset of over 350,000 news articles published by The New York Times over six decades.
In our approach, we use a pretrained BERT model fine-tuned for the task of text classification, specifically for time period prediction.
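The fine-tuning itself requires the model and corpus, but the label construction behind time-period prediction can be sketched with a simple date-to-decade mapping. The example dates and headlines here are hypothetical stand-ins for the NYT corpus.

```python
from datetime import date

def decade_label(pub_date: date) -> str:
    """Map a publication date to a decade-level class label, e.g. '1990s'."""
    return f"{pub_date.year // 10 * 10}s"

# Illustrative (date, headline) pairs standing in for the labeled dataset.
examples = [
    (date(1969, 7, 21), "Men walk on moon"),
    (date(2008, 11, 5), "Obama elected president"),
]
labeled = [(headline, decade_label(d)) for d, headline in examples]
print(labeled)  # [('Men walk on moon', '1960s'), ('Obama elected president', '2000s')]
```

Pairs like these would then feed a standard text-classification fine-tuning loop, with the decade labels as target classes.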
arXiv Detail & Related papers (2023-04-21T10:05:03Z)
- Topic Modelling of Swedish Newspaper Articles about Coronavirus: a Case Study using Latent Dirichlet Allocation Method [8.405827390095064]
Topic Modelling (TM) is a research branch of natural language understanding (NLU) and natural language processing (NLP).
In this study, we apply the popular Latent Dirichlet Allocation (LDA) method to model topic changes in Swedish newspaper articles about Coronavirus.
We describe the corpus we created, comprising 6,515 articles, the methods applied, and statistics on topic changes over approximately fourteen months, from 17 January 2020 to 13 March 2021.
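Tracking how themes shift across time windows can be sketched with a simple document-frequency comparison. This is only a proxy for illustration, not LDA itself, and the mini-corpora below are hypothetical.

```python
from collections import Counter

def term_shares(docs, terms):
    """Fraction of documents in which each term appears at least once."""
    counts = Counter()
    for doc in docs:
        tokens = set(doc.lower().split())
        for term in terms:
            if term in tokens:
                counts[term] += 1
    return {t: counts[t] / len(docs) for t in terms}

# Hypothetical mini-corpora for two time windows.
early_2020 = ["wuhan outbreak reported", "china outbreak spreads"]
spring_2021 = ["vaccine rollout begins", "vaccine supply grows"]

print(term_shares(early_2020, ["outbreak", "vaccine"]))  # {'outbreak': 1.0, 'vaccine': 0.0}
print(term_shares(spring_2021, ["outbreak", "vaccine"]))  # {'outbreak': 0.0, 'vaccine': 1.0}
```

An actual LDA analysis would instead learn latent topic distributions per time slice and compare their prevalence, but the same intuition applies: word usage shifts mark topic change.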
arXiv Detail & Related papers (2023-01-08T12:33:58Z)
- NewsEdits: A News Article Revision Dataset and a Document-Level Reasoning Challenge [122.37011526554403]
NewsEdits is the first publicly available dataset of news revision histories.
It contains 1.2 million articles with 4.6 million versions from over 22 English- and French-language newspaper sources.
arXiv Detail & Related papers (2022-06-14T18:47:13Z)
- Faking Fake News for Real Fake News Detection: Propaganda-loaded Training Data Generation [105.20743048379387]
We propose a novel framework for generating training examples informed by the known styles and strategies of human-authored propaganda.
Specifically, we perform self-critical sequence training guided by natural language inference to ensure the validity of the generated articles.
Our experimental results show that fake news detectors trained on PropaNews are better at detecting human-written disinformation, improving F1 score by 3.62% to 7.69% on two public datasets.
arXiv Detail & Related papers (2022-03-10T14:24:19Z)
- No News is Good News: A Critique of the One Billion Word Benchmark [4.396860522241306]
The One Billion Word Benchmark is a dataset derived from the WMT 2011 News Crawl.
We train models solely on Common Crawl web scrapes partitioned by year, and demonstrate that they perform worse on this task over time due to distributional shift.
arXiv Detail & Related papers (2021-10-25T02:41:27Z)
- Deep Learning for Text Style Transfer: A Survey [71.8870854396927]
Text style transfer is an important task in natural language generation, which aims to control certain attributes in the generated text.
We present a systematic survey of the research on neural text style transfer, spanning over 100 representative articles since the first neural text style transfer work in 2017.
We discuss the task formulation, existing datasets and subtasks, evaluation, as well as the rich methodologies in the presence of parallel and non-parallel data.
arXiv Detail & Related papers (2020-11-01T04:04:43Z)
- Viable Threat on News Reading: Generating Biased News Using Natural Language Models [49.90665530780664]
We show that publicly available language models can reliably generate biased news content conditioned on an original news article as input.
We also show that a large number of high-quality biased news articles can be generated using controllable text generation.
arXiv Detail & Related papers (2020-10-05T16:55:39Z)
- Detecting Cross-Modal Inconsistency to Defend Against Neural Fake News [57.9843300852526]
We introduce the more realistic and challenging task of defending against machine-generated news that also includes images and captions.
To identify the possible weaknesses that adversaries can exploit, we create a NeuralNews dataset composed of 4 different types of generated articles.
In addition to the valuable insights gleaned from our user study experiments, we provide a relatively effective approach based on detecting visual-semantic inconsistencies.
arXiv Detail & Related papers (2020-09-16T14:13:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.