Fugu-MT Paper Translation (Abstract): From Injection to Defense: Constructing Edit-Based Fingerprints for Large Language Models

Paper overview: From Injection to Defense: Constructing Edit-Based Fingerprints for Large Language Models

arxiv url: http://arxiv.org/abs/2509.03122v2
Date: Wed, 08 Oct 2025 16:23:32 GMT
Status: Translation complete
System update date: 2025-10-09 14:21:18.116416
Title: From Injection to Defense: Constructing Edit-Based Fingerprints for Large Language Models
Authors: Yue Li, Xin Yi, Dongsheng Shi, Yongyi Cui, Gerard de Melo, Linlin Wang
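Abstract summary: This paper proposes RFEdit, a knowledge-editing framework that embeds a rule-based multilingual natural language fingerprint (MNLF). RFEdit is safeguarded by Fingerprint Subspace-aware Fine-Tuning (FSFT).
Reference score (independently computed attention measure): 28.393476667026523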
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Fingerprinting is critical for maintaining traceability and protecting the intellectual property (IP) of developers, as LLMs deployed in web applications are susceptible to unauthorized redistribution and misuse via fine-tuning or black-box deployment. However, current backdoor-based fingerprinting methods face a fundamental trade-off: fingerprints embedded as garbled text are easily detected and filtered, whereas those crafted as coherent natural language are prone to being triggered unintentionally. To overcome these limitations, we propose RFEdit, a knowledge-editing framework that embeds a rule-based multilingual natural language fingerprint (MNLF) by modifying a sparse subset of model weights. This approach enables efficient and robust fingerprint injection with minimal impact on unrelated knowledge in LLMs. Our RFEdit framework is further safeguarded by Fingerprint Subspace-aware Fine-Tuning (FSFT), which mitigates fingerprint degradation during legitimate fine-tuning by restricting parameter updates to the fingerprint subspace. This approach preserves fingerprint integrity while enhancing downstream task performance of LLMs. These advances establish a comprehensive pipeline from fingerprint injection to defense, achieving high detection effectiveness, robustness against adversarial manipulations, harmlessness to model utility, and persistence under fine-tuning. Extensive experiments demonstrate that RFEdit maintains robustness under quantization and pruning. Additionally, fingerprint effectiveness is generally improved by more than 10% when combined with FSFT for math and Alpaca downstream tasks.

Related papers