NewsEdits 2.0: Learning the Intentions Behind Updating News
- URL: http://arxiv.org/abs/2411.18811v1
- Date: Wed, 27 Nov 2024 23:35:23 GMT
- Title: NewsEdits 2.0: Learning the Intentions Behind Updating News
- Authors: Alexander Spangher, Kung-Hsiang Huang, Hyundong Cho, Jonathan May
- Abstract summary: As events progress, news articles often update with new information: if we are not cautious, we risk propagating outdated facts.
In this work, we hypothesize that linguistic features indicate factual fluidity, and that we can predict which facts in a news article will update using solely the text of a news article.
- Score: 74.84017890548259
- Abstract: As events progress, news articles often update with new information: if we are not cautious, we risk propagating outdated facts. In this work, we hypothesize that linguistic features indicate factual fluidity, and that we can predict which facts in a news article will update using solely the text of a news article (i.e. not external resources like search engines). We test this hypothesis, first, by isolating fact updates in large news-revision corpora. News articles may update for many reasons (e.g. factual, stylistic, narrative). We introduce the NewsEdits 2.0 taxonomy, an edit-intentions schema that separates fact updates from stylistic and narrative updates in news writing. We annotate over 9,200 pairs of sentence revisions and train high-scoring ensemble models to apply this schema. Then, taking a large dataset of silver-labeled pairs, we show that we can predict with high precision when facts will update in older article drafts. Finally, to demonstrate the usefulness of these findings, we construct a language model question-answering (LLM-QA) abstention task: we wish the LLM to abstain from answering questions when the underlying information is likely to become outdated. Using our predictions, we show that LLM abstention reaches near-oracle levels of accuracy.
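As a rough illustration of the abstention setup (a minimal sketch only; `predict_fact_fluidity` below is a hypothetical stand-in for the paper's trained ensemble models, and the cue list and threshold are invented for demonstration):

```python
from dataclasses import dataclass


@dataclass
class QAExample:
    question: str
    supporting_sentence: str  # sentence from an older draft of the article


def predict_fact_fluidity(sentence: str) -> float:
    """Hypothetical stand-in for a trained fact-update classifier.

    Returns a score in [0, 1]; higher means the stated fact is more likely
    to be revised in a later version of the article. A crude lexical
    heuristic is used here purely for illustration.
    """
    fluid_cues = ("at least", "so far", "expected to", "preliminary", "as of")
    return 1.0 if any(cue in sentence.lower() for cue in fluid_cues) else 0.1


def answer_or_abstain(example: QAExample, llm_answer: str, threshold: float = 0.5) -> str:
    """Return the LLM's answer only when the supporting fact looks stable."""
    if predict_fact_fluidity(example.supporting_sentence) >= threshold:
        return "[abstain: this fact is likely to change]"
    return llm_answer


example = QAExample(
    question="How many people were injured?",
    supporting_sentence="At least 12 people were injured, officials said.",
)
print(answer_or_abstain(example, llm_answer="Twelve people were injured."))
```

The point is simply that the fluidity predictor acts as a gate in front of the LLM; the near-oracle abstention accuracy reported in the paper comes from its learned models, not from a heuristic like this one.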
Related papers
- SCStory: Self-supervised and Continual Online Story Discovery [53.72745249384159]
SCStory helps people digest rapidly published news article streams in real-time without human annotations.
SCStory employs self-supervised and continual learning with a novel idea of story-indicative adaptive modeling of news article streams.
arXiv Detail & Related papers (2023-11-27T04:50:01Z)
- WSDMS: Debunk Fake News via Weakly Supervised Detection of Misinforming Sentences with Contextualized Social Wisdom [13.92421433941043]
We investigate a novel task in the field of fake news debunking, which involves detecting sentence-level misinformation.
Inspired by the Multiple Instance Learning (MIL) approach, we propose a model called Weakly Supervised Detection of Misinforming Sentences (WSDMS); a toy MIL-style sketch follows this entry.
We evaluate WSDMS on three real-world benchmarks and demonstrate that it outperforms existing state-of-the-art baselines in debunking fake news at both the sentence and article levels.
arXiv Detail & Related papers (2023-10-25T12:06:55Z)
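A toy sketch of the MIL framing mentioned above (illustrative only, not the WSDMS architecture): per-sentence misinformation scores form a "bag" and are pooled into an article-level prediction, so training can rely on article-level weak labels.

```python
import torch


def article_score(sentence_scores: torch.Tensor) -> torch.Tensor:
    """Toy MIL aggregation: the article is as suspicious as its most
    suspicious sentence (max pooling over the bag of sentences)."""
    return sentence_scores.max()


# Hypothetical per-sentence misinformation probabilities for one article.
sentence_scores = torch.tensor([0.05, 0.10, 0.92, 0.20])
article_label = torch.tensor(1.0)  # weak, article-level label: debunked as fake

score = article_score(sentence_scores)
loss = torch.nn.functional.binary_cross_entropy(score, article_label)
print(f"article score={score.item():.2f}, loss={loss.item():.3f}")
```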
- Multiverse: Multilingual Evidence for Fake News Detection [71.51905606492376]
Multiverse is a new feature based on multilingual evidence that can be used for fake news detection.
We confirm the hypothesis that cross-lingual evidence can serve as a feature for fake news detection.
arXiv Detail & Related papers (2022-11-25T18:24:17Z)
- NewsEdits: A News Article Revision Dataset and a Document-Level Reasoning Challenge [122.37011526554403]
NewsEdits is the first publicly available dataset of news revision histories.
It contains 1.2 million articles with 4.6 million versions from over 22 English- and French-language newspaper sources.
arXiv Detail & Related papers (2022-06-14T18:47:13Z)
- Faking Fake News for Real Fake News Detection: Propaganda-loaded Training Data Generation [105.20743048379387]
We propose a novel framework for generating training examples informed by the known styles and strategies of human-authored propaganda.
Specifically, we perform self-critical sequence training guided by natural language inference to ensure the validity of the generated articles.
Our experimental results show that fake news detectors trained on PropaNews, our generated dataset, are better at detecting human-written disinformation by 3.62-7.69% F1 score on two public datasets.
arXiv Detail & Related papers (2022-03-10T14:24:19Z)
- No News is Good News: A Critique of the One Billion Word Benchmark [4.396860522241306]
The One Billion Word Benchmark is a dataset derived from the WMT 2011 News Crawl.
We train models solely on Common Crawl web scrapes partitioned by year, and demonstrate that they perform worse on this task over time due to distributional shift.
arXiv Detail & Related papers (2021-10-25T02:41:27Z)
- Explainable Tsetlin Machine framework for fake news detection with credibility score assessment [16.457778420360537]
We propose a novel interpretable fake news detection framework based on the recently introduced Tsetlin Machine (TM).
We use the conjunctive clauses of the TM to capture lexical and semantic properties of both true and fake news text; a toy clause-evaluation sketch follows this entry.
For evaluation, we conduct experiments on two publicly available datasets, PolitiFact and GossipCop, and demonstrate that the TM framework significantly outperforms previously published baselines by at least 5% in terms of accuracy.
arXiv Detail & Related papers (2021-05-19T13:18:02Z)
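To illustrate why conjunctive clauses are easy to inspect (a toy sketch with hand-written clauses over a bag of words; the actual TM learns its clauses and adds a credibility score on top):

```python
# Toy Tsetlin-Machine-style inference: each clause is an AND over
# word-presence literals, and classification sums the clause votes.

def clause_fires(clause: dict, words: set) -> bool:
    """A conjunctive clause fires only if all its literals hold: required
    words are present and forbidden words are absent."""
    return clause["must_have"] <= words and not (clause["must_not_have"] & words)


# Hand-written (hypothetical) clauses; polarity +1 votes "fake", -1 votes "real".
clauses = [
    {"must_have": {"shocking", "cure"}, "must_not_have": {"study"}, "polarity": +1},
    {"must_have": {"peer-reviewed"}, "must_not_have": set(), "polarity": -1},
]

headline = "shocking miracle cure doctors do not want you to know"
words = set(headline.lower().split())

votes = sum(c["polarity"] for c in clauses if clause_fires(c, words))
print("fake" if votes > 0 else "real", f"(votes={votes})")
```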
- Supporting verification of news articles with automated search for semantically similar articles [0.0]
We propose an evidence retrieval approach to handle fake news; a generic retrieval sketch follows this entry.
The learning task is formulated as an unsupervised machine learning problem.
We find that our approach is agnostic to concept drift, i.e. the machine learning task is independent of the hypotheses in a text.
arXiv Detail & Related papers (2021-03-29T12:56:59Z)
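A generic way to realize such unsupervised retrieval of semantically similar articles (a minimal sketch; the corpus and the TF-IDF representation here are illustrative assumptions, not necessarily what the paper uses):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical corpus of already-verified articles to retrieve evidence from.
corpus = [
    "City council approves new flood defenses after record rainfall.",
    "Vaccine trial reports strong efficacy in phase three results.",
    "Local team wins championship after dramatic overtime finish.",
]
claim = "Phase 3 trial shows the vaccine is highly effective."

vectorizer = TfidfVectorizer(stop_words="english")
doc_vectors = vectorizer.fit_transform(corpus)   # fit on the verified corpus
claim_vector = vectorizer.transform([claim])     # vectorize the incoming claim

# Rank verified articles by similarity to the claim; no labels are needed.
similarities = cosine_similarity(claim_vector, doc_vectors).ravel()
best = int(similarities.argmax())
print(f"Most similar evidence: {corpus[best]!r} (score={similarities[best]:.2f})")
```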
- NewsBERT: Distilling Pre-trained Language Model for Intelligent News Application [56.1830016521422]
We propose NewsBERT, which can distill pre-trained language models for efficient and effective news intelligence.
In our approach, we design a teacher-student joint learning and distillation framework to collaboratively learn both teacher and student models; a generic distillation-loss sketch follows this entry.
In our experiments, NewsBERT can effectively improve the model performance in various intelligent news applications with much smaller models.
arXiv Detail & Related papers (2021-02-09T15:41:12Z)
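A minimal sketch of the standard teacher-student distillation loss (not NewsBERT's full joint-learning framework; the tiny linear layers stand in for pre-trained teacher and student language models, and all shapes and the temperature are illustrative):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Placeholders: in NewsBERT these would be large and small pre-trained LMs.
teacher = torch.nn.Linear(16, 3)
student = torch.nn.Linear(16, 3)

x = torch.randn(8, 16)               # a batch of encoded news texts
labels = torch.randint(0, 3, (8,))   # task labels, e.g. news topic classes
temperature = 2.0

with torch.no_grad():                # the teacher is not updated here
    teacher_logits = teacher(x)
student_logits = student(x)

# Hard-label task loss plus soft-label distillation loss (KL to the teacher).
task_loss = F.cross_entropy(student_logits, labels)
distill_loss = F.kl_div(
    F.log_softmax(student_logits / temperature, dim=-1),
    F.softmax(teacher_logits / temperature, dim=-1),
    reduction="batchmean",
) * temperature ** 2
loss = task_loss + distill_loss
loss.backward()                      # gradients reach only the student
print(f"task={task_loss.item():.3f}  distill={distill_loss.item():.3f}")
```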