Related papers: A Multilingual Similarity Dataset for News Article Frame

URL: http://arxiv.org/abs/2405.13272v1
Date: Wed, 22 May 2024 01:01:04 GMT
Title: A Multilingual Similarity Dataset for News Article Frame
Authors: Xi Chen, Mattia Samory, Scott Hale, David Jurgens, Przemyslaw A. Grabowicz
Abstract summary: We introduce an extended version of a large labeled news article dataset with 16,687 new labeled pairs. Our method frees the work of manual identification of frame classes in traditional news frame analysis studies. Overall we introduce the most extensive cross-lingual news article similarity dataset available to date with 26,555 labeled news article pairs across 10 languages.
Score: 14.977682986280998
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Understanding the writing frame of news articles is vital for addressing social issues, and thus has attracted notable attention in the fields of communication studies. Yet, assessing such news article frames remains a challenge due to the absence of a concrete and unified standard dataset that considers the comprehensive nuances within news content. To address this gap, we introduce an extended version of a large labeled news article dataset with 16,687 new labeled pairs. Leveraging the pairwise comparison of news articles, our method frees the work of manual identification of frame classes in traditional news frame analysis studies. Overall we introduce the most extensive cross-lingual news article similarity dataset available to date with 26,555 labeled news article pairs across 10 languages. Each data point has been meticulously annotated according to a codebook detailing eight critical aspects of news content, under a human-in-the-loop framework. Application examples demonstrate its potential in unearthing country communities within global news coverage, exposing media bias among news outlets, and quantifying the factors related to news creation. We envision that this news similarity dataset will broaden our understanding of the media ecosystem in terms of news coverage of events and perspectives across countries, locations, languages, and other social constructs. By doing so, it can catalyze advancements in social science research and applied methodologies, thereby exerting a profound impact on our society.

Related papers

CrossNews-UA: A Cross-lingual News Semantic Similarity Benchmark for Ukrainian, Polish, Russian, and English [53.32175252285023]
Cross-lingual news comparison offers a promising approach to verify information. Existing datasets for cross-lingual news analysis were manually curated by journalists and experts. We introduce a scalable, explainable crowdsourcing pipeline for cross-lingual news similarity assessment.
arXiv Detail & Related papers (2025-10-22T14:23:50Z)
The Media Bias Detector: A Framework for Annotating and Analyzing the News at Scale [24.955234806377643]
We introduce a large, ongoing, near real-time dataset and computational framework to study selection and framing bias in news coverage. Our pipeline integrates large language models with scalable, near-real-time news scraping to extract structured annotations. We quantify these dimensions of coverage at multiple levels -- the sentence level, the article level, and the publisher level.
arXiv Detail & Related papers (2025-09-30T01:41:49Z)
DiscoSum: Discourse-aware News Summarization [79.4884227574627]
We introduce a novel approach to integrating discourse structure into summarization processes. We present a novel summarization dataset where news articles are summarized multiple times in different ways across different social media platforms. We develop a novel news discourse schema to describe summarization structures and a novel algorithm, DiscoSum, which employs beam search technique for structure-aware summarization.
arXiv Detail & Related papers (2025-06-07T22:00:30Z)
Understanding News Creation Intents: Frame, Dataset, and Method [21.22991499250969]
News intent refers to the purpose or intention behind the creation of a news article. We propose News INTent, the first component-aware formalism for understanding the news creation intent based on research in philosophy, psychology, and cognitive science.
arXiv Detail & Related papers (2023-12-27T09:35:23Z)
Tracking the Newsworthiness of Public Documents [107.12303391111014]
This work focuses on news coverage of local public policy in the San Francisco Bay Area by the San Francisco Chronicle. First, we gather news articles, public policy documents and meeting recordings and link them using probabilistic relational modeling. Second, we define a new task: newsworthiness prediction, to predict if a policy item will get covered.
arXiv Detail & Related papers (2023-11-16T10:05:26Z)
An Interactive Framework for Profiling News Media Sources [26.386860411085053]
We propose an interactive framework for news media profiling. It combines the strengths of graph based news media profiling models, Pre-trained Large Language Models, and human insight. With as little as 5 human interactions, our framework can rapidly detect fake and biased news media.
arXiv Detail & Related papers (2023-09-14T02:03:45Z)
Classification of news spreading barriers [3.0036519884678894]
We propose an approach to barrier classification where we infer the semantics of news articles through Wikipedia concepts. We collect news articles and annotate them for different kinds of barriers using the metadata of news publishers. We utilize the Wikipedia concepts along with the body text of news articles as features to infer the news-spreading barriers.
arXiv Detail & Related papers (2023-04-10T20:13:54Z)
Towards Corpus-Scale Discovery of Selection Biases in News Coverage: Comparing What Sources Say About Entities as a Start [65.28355014154549]
This paper investigates the challenges of building scalable NLP systems for discovering patterns of media selection biases directly from news content in massive-scale news corpora. We show the capabilities of the framework through a case study on NELA-2020, a corpus of 1.8M news articles in English from 519 news sources worldwide.
arXiv Detail & Related papers (2023-04-06T23:36:45Z)
Designing and Evaluating Interfaces that Highlight News Coverage Diversity Using Discord Questions [84.55145223950427]
This paper shows that navigating large source collections for a news story can be challenging without further guidance. We design three interfaces -- the Annotated Article, the Recomposed Article, and the Question Grid -- aimed at accompanying news readers in discovering coverage diversity while they read.
arXiv Detail & Related papers (2023-02-17T16:59:31Z)
Unveiling the Hidden Agenda: Biases in News Reporting and Consumption [59.55900146668931]
We build a six-year dataset on the Italian vaccine debate and adopt a Bayesian latent space model to identify narrative and selection biases. We found a nonlinear relationship between biases and engagement, with higher engagement for extreme positions. Analysis of news consumption on Twitter reveals common audiences among news outlets with similar ideological positions.
arXiv Detail & Related papers (2023-01-14T18:58:42Z)
Nothing Stands Alone: Relational Fake News Detection with Hypergraph Neural Networks [49.29141811578359]
We propose to leverage a hypergraph to represent group-wise interaction among news, while focusing on important news relations with its dual-level attention mechanism. Our approach yields remarkable performance and maintains the high performance even with a small subset of labeled news data.
arXiv Detail & Related papers (2022-12-24T00:19:32Z)
Beyond Discrete Genres: Mapping News Items onto a Multidimensional Framework of Genre Cues [0.0]
We propose a non-discrete framework for mapping news items in terms of genre cues. To automatically analyze a large amount of news items, we deliver two computational models for predicting news sentences. This proposed approach helps in deepening our insight into the evolving nature of news genres.
arXiv Detail & Related papers (2022-12-08T10:54:31Z)
Supervised Contrastive Learning for Multimodal Unreliable News Detection in COVID-19 Pandemic [16.43888233012092]
We propose a BERT-based multimodal unreliable news detection framework. It captures both textual and visual information from unreliable articles. We show that our model outperforms a number of competitive baselines in unreliable news detection.
arXiv Detail & Related papers (2021-09-04T11:53:37Z)
BaitWatcher: A lightweight web interface for the detection of incongruent news headlines [27.29585619643952]
BaitWatcher is a lightweight web interface that guides readers in estimating the likelihood of incongruence in news articles before clicking on the headlines. BaitWatcher utilizes a hierarchical recurrent encoder that efficiently learns complex textual representations of a news headline and its associated body text.
arXiv Detail & Related papers (2020-03-23T23:43:02Z)
365 Dots in 2019: Quantifying Attention of News Sources [69.50862982117125]
We measure the overlap of topics of online news articles from a variety of sources. We score news stories according to the degree of attention in near-real time. This can enable multiple studies, including identifying topics that receive the most attention.
arXiv Detail & Related papers (2020-03-22T20:32:47Z)

This list is automatically generated from the titles and abstracts of the papers on this site.