Explaining Mixtures of Sources in News Articles
- URL: http://arxiv.org/abs/2411.05192v1
- Date: Thu, 07 Nov 2024 21:34:05 GMT
- Title: Explaining Mixtures of Sources in News Articles
- Authors: Alexander Spangher, James Youn, Matt DeButts, Nanyun Peng, Emilio Ferrara, Jonathan May
- Abstract summary: We explore one kind of planning, source-selection in news, as a case-study for evaluating plans in long-form generation.
We see this as an important case-study for human planning, and it provides a framework and approach for evaluating other kinds of plans.
- Score: 102.00778025859276
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Human writers plan, then write. For large language models (LLMs) to play a role in longer-form article generation, we must understand the planning steps humans make before writing. We explore one kind of planning, source-selection in news, as a case-study for evaluating plans in long-form generation. We ask: why do specific stories call for specific kinds of sources? We imagine a generative process for story writing where a source-selection schema is first selected by a journalist, and then sources are chosen based on categories in that schema. Learning the article's plan means predicting the schema initially chosen by the journalist. Working with professional journalists, we adapt five existing schemata and introduce three new ones to describe journalistic plans for the inclusion of sources in documents. Then, inspired by Bayesian latent-variable modeling, we develop metrics to select the most likely plan, or schema, underlying a story, which we use to compare schemata. We find that two schemata, stance and social affiliation, best explain source plans in most documents. However, other schemata like textual entailment explain source plans in factually rich topics like "Science". Finally, we find we can predict the most suitable schema given just the article's headline with reasonable accuracy. We see this as an important case-study for human planning, and it provides a framework and approach for evaluating other kinds of plans. We release a corpus, NewsSources, with annotations for 4M articles.
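The abstract's schema-selection idea can be illustrated with a minimal sketch. This is not the authors' model: it assumes (hypothetically) that each schema assigns every cited source to one category, that categories are independent draws from a per-schema categorical prior, and that the best-explaining schema is the one with the highest log-likelihood of the observed source-category counts.

```python
import math

def schema_log_likelihood(category_counts, category_prior):
    """Log-likelihood of observed source-category counts under one schema,
    treating each source's category as an independent draw from the
    schema's categorical prior (a simplifying assumption)."""
    return sum(count * math.log(category_prior[cat])
               for cat, count in category_counts.items())

def most_likely_schema(counts_by_schema, priors_by_schema):
    """Pick the schema whose prior best explains the article's sources."""
    scores = {name: schema_log_likelihood(counts_by_schema[name],
                                          priors_by_schema[name])
              for name in priors_by_schema}
    return max(scores, key=scores.get)

# Hypothetical example: the same four sources categorized under two schemata.
counts = {
    "stance": {"pro": 3, "anti": 1},
    "affiliation": {"government": 2, "academic": 2},
}
priors = {
    "stance": {"pro": 0.7, "anti": 0.3},
    "affiliation": {"government": 0.5, "academic": 0.5},
}
best = most_likely_schema(counts, priors)
```

In the actual paper the comparison is between full schemata learned with journalists, but the argmax-over-schema-likelihoods structure is the same latent-variable intuition.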
Related papers
- Measuring Large Language Models Capacity to Annotate Journalistic Sourcing [11.22185665245128]
This paper lays out a scenario to evaluate Large Language Models on identifying and annotating sourcing in news stories.
Our accuracy findings indicate LLM-based approaches have more catching up to do in identifying all the sourced statements in a story, and equally, in matching the type of sources.
arXiv Detail & Related papers (2024-12-30T22:15:57Z) - EIPE-text: Evaluation-Guided Iterative Plan Extraction for Long-Form Narrative Text Generation [114.50719922069261]
We propose a new framework called Evaluation-guided Iterative Plan Extraction for long-form narrative text generation (EIPE-text)
EIPE-text has three stages: plan extraction, learning, and inference.
We evaluate the effectiveness of EIPE-text in the domains of novels and storytelling.
arXiv Detail & Related papers (2023-10-12T10:21:37Z) - Identifying Informational Sources in News Articles [109.70475599552523]
We build the largest and widest-ranging annotated dataset of informational sources used in news writing.
We introduce a novel task, source prediction, to study the compositionality of sources in news articles.
arXiv Detail & Related papers (2023-05-24T08:56:35Z) - Towards Corpus-Scale Discovery of Selection Biases in News Coverage: Comparing What Sources Say About Entities as a Start [65.28355014154549]
This paper investigates the challenges of building scalable NLP systems for discovering patterns of media selection biases directly from news content in massive-scale news corpora.
We show the capabilities of the framework through a case study on NELA-2020, a corpus of 1.8M news articles in English from 519 news sources worldwide.
arXiv Detail & Related papers (2023-04-06T23:36:45Z) - Little Red Riding Hood Goes Around the Globe: Crosslingual Story Planning and Generation with Large Language Models [69.60579227637399]
Previous work has demonstrated the effectiveness of planning for story generation exclusively in a monolingual setting focusing primarily on English.
We propose a new task of cross-lingual story generation with planning and present a new dataset for this task.
arXiv Detail & Related papers (2022-12-20T17:42:16Z) - "Don't quote me on that": Finding Mixtures of Sources in News Articles [85.92467549469147]
We construct an ontological labeling system for sources based on each source's affiliation and role.
We build a probabilistic model to infer these attributes for named sources and to describe news articles as mixtures of these sources.
arXiv Detail & Related papers (2021-04-19T21:57:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.