"Don't quote me on that": Finding Mixtures of Sources in News Articles
- URL: http://arxiv.org/abs/2104.09656v1
- Date: Mon, 19 Apr 2021 21:57:11 GMT
- Title: "Don't quote me on that": Finding Mixtures of Sources in News Articles
- Authors: Alexander Spangher, Nanyun Peng, Jonathan May and Emilio Ferrara
- Abstract summary: We construct an ontological labeling system for sources based on each source's affiliation and role.
We build a probabilistic model to infer these attributes for named sources and to describe news articles as mixtures of these sources.
- Score: 85.92467549469147
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Journalists publish statements provided by people, or \textit{sources}, to
contextualize current events, help voters make informed decisions, and hold
powerful individuals accountable. In this work, we construct an ontological
labeling system for sources based on each source's \textit{affiliation} and
\textit{role}. We build a probabilistic model to infer these attributes for
named sources and to describe news articles as mixtures of these sources. Our
model outperforms existing mixture modeling and co-clustering approaches and
correctly infers source-type in 80\% of expert-evaluated trials. Such work can
facilitate research in downstream tasks like opinion and argumentation mining,
representing a first step towards machine-in-the-loop \textit{computational
journalism} systems.
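To make the mixture-modeling idea concrete, here is a rough, self-contained analogue in Python. It is not the authors' model: it simply treats each article as a bag of descriptors for its quoted sources and uses off-the-shelf LDA to recover per-article mixtures over latent source types. The data, descriptor vocabulary, and number of components are all illustrative assumptions.

```python
# Illustrative analogue of "articles as mixtures of source types": topic
# modeling over bag-of-descriptor counts for the sources quoted in each
# article. The paper's actual probabilistic model is more structured (it
# infers affiliation and role per named source); this only sketches the
# mixture-modeling idea. All data below is made up.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# One "document" per article: the descriptors attached to its quoted sources.
articles = [
    "senator spokesperson official government statement",
    "professor researcher university study expert",
    "witness resident neighbor scene account",
    "senator official government professor expert",
]

counts = CountVectorizer().fit_transform(articles)

# n_components plays the role of the number of latent source types.
lda = LatentDirichletAllocation(n_components=3, random_state=0)
mixtures = lda.fit_transform(counts)  # rows: articles, cols: type weights

for text, mix in zip(articles, mixtures):
    print(np.round(mix, 2), "<-", text)
```

The last article mixes government and academic descriptors, so its weight should be spread across two components, which is the sense in which an article is a "mixture" of source types.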
Related papers
- Citations as Queries: Source Attribution Using Language Models as Rerankers [2.3605348648054454]
We conduct experiments on two datasets, English Wikipedia and medieval Arabic historical writing.
We find that semisupervised methods can be nearly as effective as fully supervised methods.
arXiv Detail & Related papers (2023-06-29T22:13:38Z)
- Identifying Informational Sources in News Articles [109.70475599552523]
We build the largest and widest-ranging annotated dataset of informational sources used in news writing.
We introduce a novel task, source prediction, to study the compositionality of sources in news articles.
arXiv Detail & Related papers (2023-05-24T08:56:35Z)
- Towards Corpus-Scale Discovery of Selection Biases in News Coverage: Comparing What Sources Say About Entities as a Start [65.28355014154549]
This paper investigates the challenges of building scalable NLP systems for discovering patterns of media selection biases directly from news content in massive-scale news corpora.
We show the capabilities of the framework through a case study on NELA-2020, a corpus of 1.8M news articles in English from 519 news sources worldwide.
arXiv Detail & Related papers (2023-04-06T23:36:45Z)
- Multi-Source Diffusion Models for Simultaneous Music Generation and Separation [17.124189082882395]
We train our model on Slakh2100, a standard dataset for musical source separation.
Our method is the first example of a single model that can handle both generation and separation tasks.
arXiv Detail & Related papers (2023-02-04T23:18:36Z)
- Discord Questions: A Computational Approach To Diversity Analysis in News Coverage [84.55145223950427]
We propose a new framework to assist readers in identifying source differences and gaining an understanding of news coverage diversity.
The framework is based on the generation of Discord Questions: questions with a diverse answer pool.
arXiv Detail & Related papers (2022-11-09T16:37:55Z)
- SciLander: Mapping the Scientific News Landscape [8.504643390943409]
We introduce SciLander, a method for learning representations of news sources reporting on science-based topics.
We evaluate our method on a novel COVID-19 dataset containing nearly 1M news articles from 500 sources, spanning the 18 months following the start of the pandemic in 2020.
arXiv Detail & Related papers (2022-05-16T20:20:43Z)
- Hidden Biases in Unreliable News Detection Datasets [60.71991809782698]
We show that selection bias during data collection leads to undesired artifacts in the datasets.
We observe a significant drop (>10%) in accuracy for all models tested on a clean split with no train/test source overlap.
We suggest that future dataset creation include a simple model as a difficulty/bias probe, and that future model development use a clean, non-overlapping site and date split (a minimal sketch of such a split follows this list).
arXiv Detail & Related papers (2021-04-20T17:16:41Z)
- Pretrained Language Models for Dialogue Generation with Multiple Input Sources [101.17537614998805]
In this work, we study dialogue models with multiple input sources adapted from the pretrained language model GPT2.
We explore various methods of fusing the separate attention outputs that correspond to different sources.
Our experimental results show that proper fusion methods produce responses more relevant to the dialogue history than simple fusion baselines (a generic fusion sketch appears at the end of this page).
arXiv Detail & Related papers (2020-10-15T07:53:28Z)
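The clean-split suggestion from "Hidden Biases in Unreliable News Detection Datasets" can be pictured with a short sketch: hold out entire sites and a later date range, so that neither a source nor a time period leaks from train to test. The column names and toy data frame below are assumptions for illustration, not that paper's code.

```python
# Minimal sketch of a clean, non-overlapping site-and-date split.
import pandas as pd

df = pd.DataFrame({
    "source": ["siteA", "siteA", "siteB", "siteC", "siteC", "siteD"],
    "date":   pd.to_datetime(["2020-01-05", "2020-03-10", "2020-02-01",
                              "2020-04-02", "2020-05-20", "2020-06-11"]),
    "label":  [0, 1, 0, 1, 1, 0],
})

held_out_sites = {"siteC", "siteD"}   # sites never seen in training
cutoff = pd.Timestamp("2020-04-01")   # test articles come strictly later

train = df[~df["source"].isin(held_out_sites) & (df["date"] < cutoff)]
test  = df[df["source"].isin(held_out_sites) & (df["date"] >= cutoff)]

# Sanity checks: no train/test source overlap and no temporal leakage.
assert not set(train["source"]) & set(test["source"])
assert train["date"].max() < test["date"].min()
```

Splitting on both site and date closes the two leakage paths that paper identifies: source-identity artifacts and topical drift over time.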
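For the multi-source dialogue paper above, the generic idea of fusing per-source attention can be sketched as follows. This is not that paper's GPT2-based architecture: the module, the learned weighted-sum fusion, and all dimensions are illustrative assumptions.

```python
# Generic sketch: cross-attend from decoder states to each input source,
# then fuse the per-source attention outputs with learned mixture weights.
# NOT the paper's architecture; names and fusion rule are illustrative.
import torch
import torch.nn as nn

class MultiSourceFusion(nn.Module):
    def __init__(self, d_model: int, n_heads: int, n_sources: int):
        super().__init__()
        self.attns = nn.ModuleList([
            nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            for _ in range(n_sources)
        ])
        # One learned scalar per source; softmax turns them into a mixture.
        self.fusion_logits = nn.Parameter(torch.zeros(n_sources))

    def forward(self, query, sources):
        # query: (batch, tgt_len, d_model); sources: list of (batch, src_len, d_model)
        outs = [attn(query, src, src)[0]
                for attn, src in zip(self.attns, sources)]
        w = torch.softmax(self.fusion_logits, dim=0)
        return sum(wi * oi for wi, oi in zip(w, outs))

fusion = MultiSourceFusion(d_model=64, n_heads=4, n_sources=2)
q = torch.randn(2, 5, 64)                              # decoder states
srcs = [torch.randn(2, 7, 64), torch.randn(2, 9, 64)]  # e.g. history + persona
print(fusion(q, srcs).shape)  # torch.Size([2, 5, 64])
```

The learned weights let the model shift emphasis between sources (e.g., dialogue history versus persona) rather than fixing the contribution of each, which is one simple instance of the "proper fusion" the summary refers to.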