Measuring Large Language Models Capacity to Annotate Journalistic Sourcing
- URL: http://arxiv.org/abs/2501.00164v1
- Date: Mon, 30 Dec 2024 22:15:57 GMT
- Title: Measuring Large Language Models Capacity to Annotate Journalistic Sourcing
- Authors: Subramaniam Vincent, Phoebe Wang, Zhan Shi, Sahas Koka, Yi Fang
- Abstract summary: This paper lays out a scenario to evaluate Large Language Models on identifying and annotating sourcing in news stories.
Our accuracy findings indicate that LLM-based approaches have more catching up to do, both in identifying all the sourced statements in a story and in matching the type of sources.
- Score: 11.22185665245128
- Abstract: Since the launch of ChatGPT in late 2022, the capacities of Large Language Models and their evaluation have been under constant discussion in both academic research and industry. Scenarios and benchmarks have been developed in several areas such as law, medicine, and math (Bommasani et al., 2023), and model variants are continuously evaluated. One area that has not received sufficient scenario-development attention is journalism, in particular journalistic sourcing and ethics. Journalism is a crucial truth-determination function in democracy (Vincent, 2023), and sourcing is a crucial pillar of all original journalistic output. Evaluating the capacity of LLMs to annotate stories for the different signals of sourcing, and for how reporters justify them, is a crucial scenario that warrants a benchmark approach. It offers the potential to build automated systems that contrast more transparent and ethically rigorous forms of journalism with everyday fare. In this paper we lay out a scenario to evaluate LLM performance on identifying and annotating sourcing in news stories, using a five-category schema inspired by journalism studies (Gans, 2004). We offer the use case, our dataset, and metrics as a first step towards systematic benchmarking. Our accuracy findings indicate that LLM-based approaches have more catching up to do in identifying all the sourced statements in a story and, equally, in matching the type of sources. An even harder task is spotting source justifications.
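The annotation task the abstract describes can be sketched as a simple data structure and matching metric. This is a minimal illustration only: the category names below are placeholders, since the paper's actual five-category schema (inspired by Gans, 2004) is not enumerated in the abstract, and the metric shown is a generic exact-match accuracy, not necessarily the paper's.

```python
from dataclasses import dataclass
from typing import Optional

# Placeholder labels -- the paper's five-category schema is not given here.
SOURCE_TYPES = ["official", "expert", "ordinary_citizen", "document", "anonymous"]

@dataclass(frozen=True)
class SourcedStatement:
    """One annotated sourced statement in a news story."""
    text: str                     # the sourced statement itself
    source_type: str              # one of SOURCE_TYPES (placeholder labels)
    justification: Optional[str]  # reporter's stated justification, if any

def accuracy(predicted: list[SourcedStatement],
             gold: list[SourcedStatement]) -> float:
    """Fraction of gold statements whose text AND source type were both
    recovered by the model (a generic exact-match measure)."""
    gold_pairs = {(g.text, g.source_type) for g in gold}
    hits = sum((p.text, p.source_type) in gold_pairs for p in predicted)
    return hits / len(gold) if gold else 0.0
```

A model's output would be parsed into `SourcedStatement` records and scored against human annotations; the harder justification-spotting task would compare the `justification` field separately.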
Related papers
- Context is Key: A Benchmark for Forecasting with Essential Textual Information [87.3175915185287]
"Context is Key" (CiK) is a forecasting benchmark that pairs numerical data with diverse types of carefully crafted textual context.
We evaluate a range of approaches, including statistical models, time series foundation models, and LLM-based forecasters.
We propose a simple yet effective LLM prompting method that outperforms all other tested methods on our benchmark.
arXiv Detail & Related papers (2024-10-24T17:56:08Z)
- Enhancing Journalism with AI: A Study of Contextualized Image Captioning for News Articles using LLMs and LMMs [2.1165011830664673]
Large language models (LLMs) and large multimodal models (LMMs) have significantly impacted the AI community.
This study explores how LLMs and LMMs can assist journalistic practice by generating contextualised captions for images accompanying news articles.
arXiv Detail & Related papers (2024-08-08T09:31:24Z)
- Developing Story: Case Studies of Generative AI's Use in Journalism [18.67676679963561]
We conduct a study of journalist-AI interactions at two news agencies by browsing the WildChat dataset.
Our analysis uncovers instances where journalists provide sensitive material such as confidential correspondence with sources or articles from other agencies to the LLM as stimuli and prompt it to generate articles.
Based on our findings, we call for further research into what constitutes responsible use of AI, and the establishment of clear guidelines and best practices on using LLMs in a journalistic context.
arXiv Detail & Related papers (2024-06-19T16:58:32Z)
- NewsBench: A Systematic Evaluation Framework for Assessing Editorial Capabilities of Large Language Models in Chinese Journalism [28.443004656952343]
We present NewsBench, a novel evaluation framework to systematically assess the editorial capabilities of Large Language Models (LLMs) in Chinese journalism.
Our constructed benchmark dataset is focused on four facets of writing proficiency and six facets of safety adherence.
We propose different GPT-4 based automatic evaluation protocols to assess LLM generations for short answer questions in terms of writing proficiency and safety adherence.
arXiv Detail & Related papers (2024-02-29T21:05:14Z)
- Maintaining Journalistic Integrity in the Digital Age: A Comprehensive NLP Framework for Evaluating Online News Content [0.0]
This paper proposes a comprehensive framework to analyze online news texts using natural language processing (NLP) techniques.
The framework incorporates ten journalism standards: objectivity, balance and fairness, readability and clarity, sensationalism and clickbait, ethical considerations, public interest and value, source credibility, relevance and timeliness, factual accuracy, and attribution and transparency.
arXiv Detail & Related papers (2024-01-07T12:27:14Z)
- Generative Judge for Evaluating Alignment [84.09815387884753]
We propose a generative judge with 13B parameters, Auto-J, designed to address these challenges.
Our model is trained on user queries and LLM-generated responses under massive real-world scenarios.
Experimentally, Auto-J outperforms a series of strong competitors, including both open-source and closed-source models.
arXiv Detail & Related papers (2023-10-09T07:27:15Z)
- Identifying Informational Sources in News Articles [109.70475599552523]
We build the largest and widest-ranging annotated dataset of informational sources used in news writing.
We introduce a novel task, source prediction, to study the compositionality of sources in news articles.
arXiv Detail & Related papers (2023-05-24T08:56:35Z)
- Framing the News: From Human Perception to Large Language Model Inferences [8.666172545138272]
Identifying the frames of news is important to understand the articles' vision, intention, message to be conveyed, and which aspects of the news are emphasized.
We develop a protocol for human labeling of frames for 1786 headlines of No-Vax movement articles of European newspapers from 5 countries.
We investigate two approaches for frame inference of news headlines: first with a GPT-3.5 fine-tuning approach, and second with GPT-3.5 prompt-engineering.
arXiv Detail & Related papers (2023-04-27T18:30:18Z)
- Towards Corpus-Scale Discovery of Selection Biases in News Coverage: Comparing What Sources Say About Entities as a Start [65.28355014154549]
This paper investigates the challenges of building scalable NLP systems for discovering patterns of media selection biases directly from news content in massive-scale news corpora.
We show the capabilities of the framework through a case study on NELA-2020, a corpus of 1.8M news articles in English from 519 news sources worldwide.
arXiv Detail & Related papers (2023-04-06T23:36:45Z)
- Holistic Evaluation of Language Models [183.94891340168175]
Language models (LMs) are becoming the foundation for almost all major language technologies, but their capabilities, limitations, and risks are not well understood.
We present Holistic Evaluation of Language Models (HELM) to improve the transparency of language models.
arXiv Detail & Related papers (2022-11-16T18:51:34Z)
- "Don't quote me on that": Finding Mixtures of Sources in News Articles [85.92467549469147]
We construct an ontological labeling system for sources based on each source's affiliation and role.
We build a probabilistic model to infer these attributes for named sources and to describe news articles as mixtures of these sources.
arXiv Detail & Related papers (2021-04-19T21:57:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.