A Longitudinal, Multinational, and Multilingual Corpus of News Coverage of the Russo-Ukrainian War
- URL: http://arxiv.org/abs/2601.16309v1
- Date: Thu, 22 Jan 2026 20:37:42 GMT
- Title: A Longitudinal, Multinational, and Multilingual Corpus of News Coverage of the Russo-Ukrainian War
- Authors: Dikshya Mohanty, Taisiia Sabadyn, Jelwin Rodrigues, Chenlu Wang, Abhishek Kalugade, Ritwik Banerjee,
- Abstract summary: DNIPRO is a novel longitudinal corpus of 246K news articles documenting the Russo-Ukrainian war from Feb 2022 to Aug 2024. It spans eleven media outlets across five nation states (Russia, Ukraine, U.S., U.K., and China) and three languages (English, Russian, and Mandarin Chinese).
- Score: 4.802758600019422
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce DNIPRO, a novel longitudinal corpus of 246K news articles documenting the Russo-Ukrainian war from Feb 2022 to Aug 2024, spanning eleven media outlets across five nation states (Russia, Ukraine, U.S., U.K., and China) and three languages (English, Russian, and Mandarin Chinese). This multilingual resource features consistent and comprehensive metadata, and multiple types of annotation with rigorous human evaluations for downstream tasks relevant to systematic transnational analyses of contentious wartime discourse. DNIPRO's distinctive value lies in its inclusion of competing geopolitical perspectives, making it uniquely suited for studying narrative divergence, media framing, and information warfare. To demonstrate its utility, we include use case experiments using stance detection, sentiment analysis, topical framing, and contradiction analysis of major conflict events within the larger war. Our explorations reveal how outlets construct competing realities, with coverage exhibiting polarized interpretations that reflect geopolitical interests. Beyond supporting computational journalism research, DNIPRO provides a foundational resource for understanding how conflicting narratives emerge and evolve across global information ecosystems.
Related papers
- PartisanLens: A Multilingual Dataset of Hyperpartisan and Conspiratorial Immigration Narratives in European Media [5.72412714580848]
We introduce PartisanLens, the first multilingual dataset of 1,617 hyperpartisan news headlines in Spanish, Italian, and Portuguese.
We evaluate the classification performance of widely used Large Language Models (LLMs) on this dataset, establishing robust baselines for the classification of hyperpartisan and PRCT narratives.
Finally, we provide our resources and evaluation; PartisanLens supports future research on detecting partisan and conspiratorial narratives in European contexts.
arXiv: 2026-01-07T12:18:14Z
- CrossNews-UA: A Cross-lingual News Semantic Similarity Benchmark for Ukrainian, Polish, Russian, and English [53.32175252285023]
Cross-lingual news comparison offers a promising approach to verify information.
Existing datasets for cross-lingual news analysis were manually curated by journalists and experts.
We introduce a scalable, explainable crowdsourcing pipeline for cross-lingual news similarity assessment.
arXiv: 2025-10-22T14:23:50Z
- OSINT or BULLSHINT? Exploring Open-Source Intelligence tweets about the Russo-Ukrainian War [1.0988135174326101]
This paper examines the role of Open Source Intelligence (OSINT) on Twitter regarding the Russo-Ukrainian war.
We analyze nearly 2 million tweets from approximately 1,040 users involved in discussing real-time military engagements.
We uncover communicative patterns and dissemination strategies within the OSINT community.
arXiv: 2025-08-05T16:06:36Z
- Geopolitical biases in LLMs: what are the "good" and the "bad" countries according to contemporary language models [52.00270888041742]
We introduce a novel dataset with neutral event descriptions and contrasting viewpoints from different countries.
Our findings show significant geopolitical biases, with models favoring specific national narratives.
Simple debiasing prompts had a limited effect on reducing these biases.
arXiv: 2025-06-07T10:45:17Z
- Propaganda and Information Dissemination in the Russo-Ukrainian War: Natural Language Processing of Russian and Western Twitter Narratives [0.0]
This article provides an analysis of tweets from propaganda accounts and trusted accounts collected from the onset of the war.
We utilise natural language processing and machine learning algorithms to assess the sentiment and identify key themes.
Our findings indicate distinct strategies in how information is created, spread, and targeted at different audiences by both sides.
arXiv: 2025-06-02T15:52:04Z
- Talking Point based Ideological Discourse Analysis in News Events [62.18747509565779]
We propose a framework motivated by the theory of ideological discourse analysis to analyze news articles related to real-world events.
Our framework represents news articles using a relational structure called talking points, which captures the interaction between entities, their roles, and media frames, along with a topic of discussion.
We evaluate our framework's ability to generate these perspectives through automated tasks (ideology and partisan classification), supplemented by human validation.
arXiv: 2025-04-10T02:52:34Z
- Locating Information Gaps and Narrative Inconsistencies Across Languages: A Case Study of LGBT People Portrayals on Wikipedia [49.80565462746646]
We introduce the InfoGap method, an efficient and reliable approach to locating information gaps and inconsistencies in articles at the fact level.
We evaluate InfoGap by analyzing portrayals of LGBT people across 2.7K biography pages in the English, Russian, and French Wikipedias.
arXiv: 2024-10-05T20:40:49Z
- This Land is {Your, My} Land: Evaluating Geopolitical Biases in Language Models [40.61046400448044]
We show that large language models (LLMs) recall certain geographical knowledge inconsistently when queried in different languages.
As a targeted case study, we consider territorial disputes, an inherently controversial and multilingual task.
We propose a suite of evaluation metrics to precisely quantify bias and consistency in responses across different languages.
arXiv: 2023-05-24T01:16:17Z
- Towards Corpus-Scale Discovery of Selection Biases in News Coverage: Comparing What Sources Say About Entities as a Start [65.28355014154549]
This paper investigates the challenges of building scalable NLP systems for discovering patterns of media selection biases directly from news content in massive-scale news corpora.
We show the capabilities of the framework through a case study on NELA-2020, a corpus of 1.8M news articles in English from 519 news sources worldwide.
arXiv: 2023-04-06T23:36:45Z
- Bias or Diversity? Unraveling Fine-Grained Thematic Discrepancy in U.S. News Headlines [63.52264764099532]
We use a large dataset of 1.8 million news headlines from major U.S. media outlets spanning from 2014 to 2022.
We quantify the fine-grained thematic discrepancy related to four prominent topics: domestic politics, economic issues, social issues, and foreign affairs.
Our findings indicate that on domestic politics and social issues, the discrepancy can be attributed to a certain degree of media bias.
arXiv: 2023-03-28T03:31:37Z
- Automated multilingual detection of Pro-Kremlin propaganda in newspapers and Telegram posts [5.886782001771578]
The full-scale conflict between the Russian Federation and Ukraine generated an unprecedented amount of news articles and social media data.
This study analyses how the media affected and mirrored public opinion during the first month of the war using news articles and Telegram news channels in Ukrainian, Russian, Romanian and English.
We propose and compare two methods of multilingual automated pro-Kremlin propaganda identification, based on Transformers and linguistic features.
arXiv: 2023-01-25T14:25:37Z
- "A Special Operation": A Quantitative Approach to Dissecting and Comparing Different Media Ecosystems' Coverage of the Russo-Ukrainian War [5.567674129101803]
The coverage of the Russian invasion of Ukraine has varied widely between Western, Russian, and Chinese media ecosystems.
We find that while Western press outlets have focused on the military and humanitarian aspects of the war, Russian media have focused on the purported justifications for the "special military operation".
We measure the degree to which Russian media have influenced Chinese coverage across Chinese outlets' news articles, Weibo accounts, and Twitter accounts.
arXiv: 2022-10-06T16:04:19Z