Fugu-MT Paper Translation (Summary): Large Language Models Are Effective Code Watermarkers

Paper summary: Large Language Models Are Effective Code Watermarkers

arxiv url: http://arxiv.org/abs/2510.11251v1
Date: Mon, 13 Oct 2025 10:40:24 GMT
Status: Translation complete
Last updated in system: 2025-10-14 18:06:30.323652
Title: Large Language Models Are Effective Code Watermarkers
Authors: Rui Xu, Jiawei Chen, Zhaoxia Yin, Cong Kong, Xinpeng Zhang
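Abstract summary: Watermarking has emerged as a promising solution for source attribution. CodeMark-LLM embeds a watermark into source code without compromising its semantics or readability.
Reference score (attention score computed by the site): 23.085224961348015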
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The widespread use of large language models (LLMs) and open-source code has raised ethical and security concerns regarding the distribution and attribution of source code, including unauthorized redistribution, license violations, and misuse of code for malicious purposes. Watermarking has emerged as a promising solution for source attribution, but existing techniques rely heavily on hand-crafted transformation rules, abstract syntax tree (AST) manipulation, or task-specific training, limiting their scalability and generality across languages. Moreover, their robustness against attacks remains limited. To address these limitations, we propose CodeMark-LLM, an LLM-driven watermarking framework that embeds a watermark into source code without compromising its semantics or readability. CodeMark-LLM consists of two core components: (i) a Semantically Consistent Embedding module, which applies functionality-preserving transformations to encode watermark bits, and (ii) a Differential Comparison Extraction module, which identifies the applied transformations by comparing the original and watermarked code. Leveraging the cross-lingual generalization ability of LLMs, CodeMark-LLM avoids language-specific engineering and training pipelines. Extensive experiments across diverse programming languages and attack scenarios demonstrate its robustness, effectiveness, and scalability.
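The two modules named in the abstract can be illustrated with a toy sketch of the underlying bit logic. This is not the paper's implementation: it assumes a fixed, hand-written list of string rewrites (the hypothetical `Rewrite`, `RULES`, `embed`, `extract`, and `run` names are illustrative only) in place of the LLM that CodeMark-LLM uses to choose and apply transformations, and it assumes one watermark bit per rule, where 1 means "rule applied" and 0 means "rule left out". Extraction compares the original and watermarked code rule by rule, in the spirit of differential comparison.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Rewrite:
    """Hypothetical semantics-preserving rewrite: replace `before` with `after`."""
    before: str
    after: str


# Hypothetical rule set standing in for LLM-generated transformations.
# Each rule carries one watermark bit. All three preserve behavior for the
# integer-only snippet below; they are NOT safe for arbitrary Python code.
RULES: List[Rewrite] = [
    Rewrite("range(0, n)", "range(n)"),
    Rewrite("if i % 2 == 0:", "if not i % 2:"),
    Rewrite("x = x + 1", "x += 1"),
]

ORIGINAL = (
    "def count_even(n):\n"
    "    x = 0\n"
    "    for i in range(0, n):\n"
    "        if i % 2 == 0:\n"
    "            x = x + 1\n"
    "    return x\n"
)


def embed(code: str, bits: List[int], rules: List[Rewrite]) -> str:
    """Embedding: apply rule i if and only if bits[i] == 1."""
    if len(bits) != len(rules):
        raise ValueError("need exactly one bit per rule")
    out = code
    for bit, rule in zip(bits, rules):
        if rule.before not in code:
            raise ValueError(f"no carrier in the code for rule {rule}")
        if bit:
            out = out.replace(rule.before, rule.after)
    return out


def extract(original: str, watermarked: str, rules: List[Rewrite]) -> List[int]:
    """Differential extraction: compare original and watermarked code per rule."""
    bits = []
    for rule in rules:
        applied = (
            rule.before in original
            and rule.before not in watermarked
            and rule.after in watermarked
        )
        bits.append(1 if applied else 0)
    return bits


def run(source: str, n: int) -> int:
    """Execute a snippet and call its count_even(n), to compare behavior."""
    namespace: Dict[str, object] = {}
    exec(source, namespace)
    return namespace["count_even"](n)  # type: ignore[operator]


if __name__ == "__main__":
    bits = [1, 0, 1]
    marked = embed(ORIGINAL, bits, RULES)
    print(marked)
    print("recovered bits:", extract(ORIGINAL, marked, RULES))
    same = all(run(ORIGINAL, n) == run(marked, n) for n in range(20))
    print("same behavior:", same)
```

Hard-coded string rewrites like these are exactly the kind of hand-crafted, language-specific rules the abstract argues against; the sketch only shows how watermark bits map to applied transformations and how comparing against the original recovers them, while the executed check confirms the watermarked snippet behaves the same.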