Fugu-MT Paper Translation (Abstract): Alignment Whack-a-Mole : Finetuning Activates Verbatim Recall of Copyrighted Books in Large Language Models

Paper abstract: Alignment Whack-a-Mole : Finetuning Activates Verbatim Recall of Copyrighted Books in Large Language Models

arxiv url: http://arxiv.org/abs/2603.20957v3
Date: Sat, 28 Mar 2026 19:27:47 GMT
Status: Translation complete
Last updated in system: 2026-03-31 13:48:18.769627
Title: Alignment Whack-a-Mole : Finetuning Activates Verbatim Recall of Copyrighted Books in Large Language Models
Title (reference translation): Alignment Whack-a-Mole : Finetuning Activates Verbatim Recall of Copyrighted Books in Large Language Models
Authors: Xinyue Liu, Niloofar Mireshghallah, Jane C. Ginsburg, Tuhin Chakrabarty
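Abstract summary: Finetuning models to expand plot summaries into full text causes them to reproduce up to 85-90% of held-out copyrighted books. Finetuning exclusively on Haruki Murakami's novels unlocks verbatim recall of copyrighted books from over 30 unrelated authors. The findings offer compelling evidence that model weights store copies of copyrighted works, and that the security failures that manifest after finetuning on individual authors' works undermine a key premise of recent fair use rulings.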
Reference score (independently computed attention score): 15.308143290363246
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Frontier LLM companies have repeatedly assured courts and regulators that their models do not store copies of training data. They further rely on safety alignment strategies via RLHF, system prompts, and output filters to block verbatim regurgitation of copyrighted works, and have cited the efficacy of these measures in their legal defenses against copyright infringement claims. We show that finetuning bypasses these protections: by training models to expand plot summaries into full text, a task naturally suited for commercial writing assistants, we cause GPT-4o, Gemini-2.5-Pro, and DeepSeek-V3.1 to reproduce up to 85-90% of held-out copyrighted books, with single verbatim spans exceeding 460 words, using only semantic descriptions as prompts and no actual book text. This extraction generalizes across authors: finetuning exclusively on Haruki Murakami's novels unlocks verbatim recall of copyrighted books from over 30 unrelated authors. The effect is not specific to any training author or corpus: random author pairs and public-domain finetuning data produce comparable extraction, while finetuning on synthetic text yields near-zero extraction, indicating that finetuning on individual authors' works reactivates latent memorization from pretraining. Three models from different providers memorize the same books in the same regions ($r \ge 0.90$), pointing to an industry-wide vulnerability. Our findings offer compelling evidence that model weights store copies of copyrighted works and that the security failures that manifest after finetuning on individual authors' works undermine a key premise of recent fair use rulings, where courts have conditioned favorable outcomes on the adequacy of measures preventing reproduction of protected expression.
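Abstract (reference translation): Frontier LLM companies have repeatedly assured courts and regulators that their models do not store copies of training data. They further rely on safety alignment strategies via RLHF, system prompts, and output filters to block verbatim regurgitation of copyrighted works, and have cited the efficacy of these measures in their legal defenses against copyright infringement claims. When finetuned to expand plot summaries into full text, GPT-4o, Gemini-2.5-Pro, and DeepSeek-V3.1 reproduce up to 85-90% of held-out copyrighted books, using only semantic descriptions as prompts and no actual book text. This extraction generalizes across authors: finetuning exclusively on Haruki Murakami's novels unlocks verbatim recall of copyrighted books from over 30 unrelated authors. Random author pairs and public-domain finetuning data produce comparable extraction, while finetuning on synthetic text yields near-zero extraction. Three models from different providers memorize the same books in the same regions ($r \ge 0.90$), pointing to an industry-wide vulnerability. The findings offer compelling evidence that model weights store copies of copyrighted works, and that the security failures that manifest after finetuning on individual authors' works undermine a key premise of recent fair use rulings, where courts have conditioned favorable outcomes on the adequacy of measures preventing reproduction of protected expression.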
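
The abstract reports two kinds of quantities, the share of a book that is reproduced and the length of the longest verbatim span, but does not say how either is computed. The following is a minimal sketch of one way such numbers could be measured, assuming lowercase whitespace tokenization and exact word matching. The function names, the `min_span` threshold, and the coverage definition are illustrative assumptions, not the authors' metric.

```python
from difflib import SequenceMatcher


def tokenize(text):
    # Assumption: lowercase whitespace tokens; the paper may tokenize differently.
    return text.lower().split()


def longest_verbatim_span(generated, reference):
    """Length in words of the longest contiguous run shared by both texts."""
    gen, ref = tokenize(generated), tokenize(reference)
    matcher = SequenceMatcher(None, gen, ref, autojunk=False)
    match = matcher.find_longest_match(0, len(gen), 0, len(ref))
    return match.size


def verbatim_coverage(generated, reference, min_span=4):
    """Fraction of reference words lying inside some run of at least
    `min_span` consecutive words that also appears in the generated text.
    (Hypothetical definition, chosen only for illustration.)"""
    gen, ref = tokenize(generated), tokenize(reference)
    if not ref:
        return 0.0
    # All word n-grams of length min_span found in the generated text.
    gen_ngrams = {
        tuple(gen[i:i + min_span]) for i in range(len(gen) - min_span + 1)
    }
    covered = [False] * len(ref)
    for i in range(len(ref) - min_span + 1):
        if tuple(ref[i:i + min_span]) in gen_ngrams:
            for j in range(i, i + min_span):
                covered[j] = True
    return sum(covered) / len(ref)


if __name__ == "__main__":
    # Toy strings, not taken from any book or model output.
    reference = "the quick brown fox jumps over the lazy dog near the river bank"
    generated = "he said the quick brown fox jumps over a fence near the river bank"
    print(longest_verbatim_span(generated, reference))        # 6
    print(round(verbatim_coverage(generated, reference), 3))  # 0.769
```

In the toy example, the longest shared run is "the quick brown fox jumps over" (6 words), and 10 of the 13 reference words fall inside a shared 4-word run. A real evaluation would need to decide how to treat punctuation, casing, and near-verbatim paraphrase, which this sketch ignores.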


Related papers