Related papers: NEWSAGENT: Benchmarking Multimodal Agents as Journalists with Real-World Newswriting Tasks

URL: http://arxiv.org/abs/2509.00446v1
Date: Sat, 30 Aug 2025 10:31:34 GMT
Title: NEWSAGENT: Benchmarking Multimodal Agents as Journalists with Real-World Newswriting Tasks
Authors: Yen-Che Chien, Kuang-Da Wang, Wei-Yao Wang, Wen-Chih Peng
Abstract summary: NEWSAGENT is a benchmark for evaluating how agents can automatically search available raw contents, select desired information, and edit and rephrase to form a news article. NEWSAGENT includes 6k human-verified examples derived from real news, with multimodal contents converted to text for broad model compatibility. We believe NEWSAGENT serves as a realistic testbed for iterating and evaluating agent capabilities in terms of multimodal web data manipulation to real-world productivity.
Score: 21.577527868033343
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent advances in autonomous digital agents from industry (e.g., Manus AI and Gemini's research mode) highlight potential for structured tasks by autonomous decision-making and task decomposition; however, it remains unclear to what extent agent-based systems can improve multimodal web data productivity. We study this in the realm of journalism, which requires iterative planning, interpretation, and contextual reasoning from multimodal raw contents to form a well-structured news article. We introduce NEWSAGENT, a benchmark for evaluating how agents can automatically search available raw contents, select desired information, and edit and rephrase to form a news article by accessing core journalistic functions. Given a writing instruction and firsthand data, mirroring how a journalist initiates a news draft, agents are tasked to identify narrative perspectives, issue keyword-based queries, retrieve historical background, and generate complete articles. Unlike typical summarization or retrieval tasks, essential context is not directly available and must be actively discovered, reflecting the information gaps faced in real-world news writing. NEWSAGENT includes 6k human-verified examples derived from real news, with multimodal contents converted to text for broad model compatibility. We evaluate open- and closed-source LLMs with commonly used agentic frameworks on NEWSAGENT, which shows that agents are capable of retrieving relevant facts but struggle with planning and narrative integration. We believe that NEWSAGENT serves as a realistic testbed for iterating and evaluating agent capabilities in terms of multimodal web data manipulation to real-world productivity.

Related papers

AgentCPM-Report: Interleaving Drafting and Deepening for Open-Ended Deep Research [85.51475655916026]
AgentCPM-Report is a lightweight yet high-performing local solution composed of a framework that mirrors the human writing process. Our framework uses a Writing As Reasoning Policy (WARP), which enables models to dynamically revise outlines. Experiments on DeepResearch Bench, DeepConsult, and DeepResearch Gym demonstrate that AgentCPM-Report outperforms leading closed-source systems.
arXiv Detail & Related papers (2026-02-06T09:45:04Z)
ZoFia: Zero-Shot Fake News Detection with Entity-Guided Retrieval and Multi-LLM Interaction [14.012874564599272]
ZoFia is a novel two-stage zero-shot fake news detection framework. First, we introduce Hierarchical Salience to quantify the importance of entities in the news content. We then propose the SC-MMR algorithm to effectively select an informative and diverse set of keywords.
arXiv Detail & Related papers (2025-11-03T03:29:42Z)
WebResearcher: Unleashing unbounded reasoning capability in Long-Horizon Agents [72.28593628378991]
WebResearcher is an iterative deep-research paradigm that reformulates deep research as a Markov Decision Process. WebResearcher achieves state-of-the-art performance, even surpassing frontier proprietary systems.
arXiv Detail & Related papers (2025-09-16T17:57:17Z)
How can AI agents support journalists' work? An experiment with designing an LLM-driven intelligent reporting system [0.0]
The integration of artificial intelligence into journalistic practices represents a transformative shift in how news is gathered, analyzed, and disseminated. Large language models (LLMs), particularly those with agentic capabilities, offer unprecedented opportunities for enhancing journalistic practices. This research explores how agentic LLMs can support journalists' filtering, based on insights from journalist interviews.
arXiv Detail & Related papers (2025-08-25T14:56:59Z)
From Web Search towards Agentic Deep Research: Incentivizing Search with Reasoning Agents [96.65646344634524]
Large Language Models (LLMs), endowed with reasoning and agentic capabilities, are ushering in a new paradigm termed Agentic Deep Research. We trace the evolution from static web search to interactive, agent-based systems that plan, explore, and learn. We demonstrate that Agentic Deep Research not only significantly outperforms existing approaches, but is also poised to become the dominant paradigm for future information seeking.
arXiv Detail & Related papers (2025-06-23T17:27:19Z)
A Python Tool for Reconstructing Full News Text from GDELT [0.0]
This paper presents a novel approach to obtaining full-text newspaper articles at near-zero cost. We focus on the GDELT Web News NGrams 3.0 dataset, which provides high-frequency updates of n-grams extracted from global online news sources. We provide Python code to reconstruct full-text articles from these n-grams by identifying overlapping textual fragments and intelligently merging them.
arXiv Detail & Related papers (2025-04-22T17:40:42Z)
Measuring Large Language Models Capacity to Annotate Journalistic Sourcing [11.22185665245128]
This paper lays out a scenario to evaluate Large Language Models on identifying and annotating sourcing in news stories. Our accuracy findings indicate LLM-based approaches have more catching up to do in identifying all the sourced statements in a story, and equally, in matching the type of sources.
arXiv Detail & Related papers (2024-12-30T22:15:57Z)
Online Digital Investigative Journalism using SociaLens [0.0]
We introduce a versatile and autonomous investigative journalism tool, called SociaLens, for identifying and extracting query-specific data from online sources. We envision its use in investigative journalism, law enforcement and social policy planning. We illustrate the functionality of SociaLens using a focused case study on rape incidents in a developing country.
arXiv Detail & Related papers (2024-10-13T07:20:47Z)
SciNews: From Scholarly Complexities to Public Narratives -- A Dataset for Scientific News Report Generation [16.61347730523143]
We present a new corpus to facilitate the automated generation of scientific news reports. Our dataset comprises academic publications and their corresponding scientific news reports across nine disciplines. We benchmark our dataset employing state-of-the-art text generation models.
arXiv Detail & Related papers (2024-03-26T14:54:48Z)
Embrace Divergence for Richer Insights: A Multi-document Summarization Benchmark and a Case Study on Summarizing Diverse Information from News Articles [136.84278943588652]
We propose a new task of summarizing diverse information encountered in multiple news articles encompassing the same event. To facilitate this task, we outlined a data collection schema for identifying diverse information and curated a dataset named DiverseSumm. The dataset includes 245 news stories, with each story comprising 10 news articles and paired with a human-validated reference.
arXiv Detail & Related papers (2023-09-17T20:28:17Z)
Identifying Informational Sources in News Articles [109.70475599552523]
We build the largest and widest-ranging annotated dataset of informational sources used in news writing. We introduce a novel task, source prediction, to study the compositionality of sources in news articles.
arXiv Detail & Related papers (2023-05-24T08:56:35Z)
ManiTweet: A New Benchmark for Identifying Manipulation of News on Social Media [74.93847489218008]
We present a novel task, identifying manipulation of news on social media, which aims to detect manipulation in social media posts and identify manipulated or inserted information. To study this task, we have proposed a data collection schema and curated a dataset called ManiTweet, consisting of 3.6K pairs of tweets and corresponding articles. Our analysis demonstrates that this task is highly challenging, with large language models (LLMs) yielding unsatisfactory performance.
arXiv Detail & Related papers (2023-05-23T16:40:07Z)

This list is automatically generated from the titles and abstracts of the papers on this site.