Fugu-MT paper translation (abstract): Rewrite the News: Tracing Editorial Reuse Across News Agencies

Paper summary: Rewrite the News: Tracing Editorial Reuse Across News Agencies

arxiv url: http://arxiv.org/abs/2603.29937v1
Date: Tue, 31 Mar 2026 16:10:26 GMT
Status: translation complete
Last updated in system: 2026-04-01 15:25:03.841741
Title: Rewrite the News: Tracing Editorial Reuse Across News Agencies
Title (reference translation): Rewriting the News: Tracking Editorial Reuse Across News Agencies
Authors: Soveatin Kuntur, Nina Smirnova, Anna Wroblewska, Philipp Mayr, Sebastijan Razboršek Maček,
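Abstract summary: We propose a weakly supervised method that detects sentence-level cross-lingual reuse without requiring full translation. The study compares English-language articles from the Slovenian Press Agency with reports from 15 foreign agencies. Reused content tends to appear in the middle and end of the English articles, while leads are often original.
Reference score (independently computed attention score): 2.108916445920616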
License: http://creativecommons.org/licenses/by/4.0/
Abstract: This paper investigates sentence-level text reuse in multilingual journalism, analyzing where reused content occurs within articles. We present a weakly supervised method for detecting sentence-level cross-lingual reuse without requiring full translations, designed to support automated pre-selection to reduce information overload for journalists (Holyst et al., 2024). The study compares English-language articles from the Slovenian Press Agency (STA) with reports from 15 foreign agencies (FA) in seven languages, using publication timestamps to retain the earliest likely foreign source for each reused sentence. We analyze 1,037 STA and 237,551 FA articles from two time windows (October 7-November 2, 2023; February 1-28, 2025) and identify 1,087 aligned sentence pairs after filtering to the earliest sources. Reuse occurs in 52% of STA articles and 1.6% of FA articles and is predominantly non-literal, involving paraphrase and compositional reuse from multiple sources. Reused content tends to appear in the middle and end of English articles, while leads are more often original, indicating that simple lexical matching overlooks substantial editorial reuse. Compared with prior work focused on monolingual overlap, we (i) detect reuse across languages without requiring full translation, (ii) use publication timing to identify likely sources, and (iii) analyze where reused material is situated within articles. Dataset and code: https://github.com/kunturs/lrec2026-rewrite-news.
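Abstract (reference translation): This paper examines sentence-level text reuse in multilingual journalism and analyzes where reused content occurs within articles. To support automated pre-selection that reduces information overload for journalists, we propose a weakly supervised method that detects sentence-level cross-lingual reuse without requiring full translation (Holyst et al., 2024). The study compares English-language articles from the Slovenian Press Agency (STA) with reports from 15 foreign agencies (FA) in seven languages. We analyze 1,037 STA and 237,551 FA articles from two time windows (October 7 to November 2, 2023; February 1 to 28, 2025) and identify 1,087 aligned sentence pairs after filtering to the earliest sources. Reuse occurs in 52% of STA articles and 1.6% of FA articles and is predominantly non-literal, involving paraphrase and compositional reuse from multiple sources. Reused content tends to appear in the middle and end of English articles, while leads are more often original, indicating that simple lexical matching overlooks substantial editorial reuse. Compared with prior work focused on monolingual overlap, we (i) detect reuse across languages without requiring full translation, (ii) use publication timing to identify likely sources, and (iii) analyze where reused material is situated within articles. Dataset and code: https://github.com/kunturs/lrec2026-rewrite-news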
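The abstract mentions two steps that a short sketch can make concrete: keeping the earliest likely foreign source for each reused sentence, and locating reused sentences within an article. The Python below is only an illustration under stated assumptions, not the authors' implementation. The `Candidate` record, the function names, the rule that a foreign article published after the STA article cannot be its source, and the split of an article into equal thirds are all hypothetical choices made for this example.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Candidate:
    """One candidate alignment (all field names are hypothetical)."""
    sta_sentence: str        # ID of a sentence in an STA article
    fa_article: str          # ID of the matching foreign-agency article
    fa_published: datetime   # publication time of the foreign article
    sta_published: datetime  # publication time of the STA article


def earliest_sources(candidates):
    """Return one candidate per STA sentence: the earliest-published foreign match."""
    best = {}
    for c in candidates:
        # Assumption: a foreign article published after the STA article
        # cannot have been its source, so it is skipped.
        if c.fa_published > c.sta_published:
            continue
        current = best.get(c.sta_sentence)
        if current is None or c.fa_published < current.fa_published:
            best[c.sta_sentence] = c
    return best


def position_bucket(index, n_sentences):
    """Map a 0-based sentence index to 'lead', 'middle', or 'end'.

    Assumption: the article is split into equal thirds by sentence count.
    """
    if n_sentences <= 0 or not 0 <= index < n_sentences:
        raise ValueError("index out of range")
    return ("lead", "middle", "end")[index * 3 // n_sentences]
```

Given candidate alignments from any sentence matcher, `earliest_sources` would reduce them to one likely source per STA sentence, and `position_bucket` could then be tallied over the retained sentences to see whether reuse concentrates away from the lead.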

Related papers