Fugu-MT Paper Translation (Abstract): ReTokSync: Self-Synchronizing Tokenization Disambiguation for Generative Linguistic Steganography

Paper overview: ReTokSync: Self-Synchronizing Tokenization Disambiguation for Generative Linguistic Steganography

arxiv url: http://arxiv.org/abs/2604.25486v1
Date: Tue, 28 Apr 2026 10:42:43 GMT
Status: Translation complete
Last updated in system: 2026-04-29 16:49:17.823639
Title: ReTokSync: Self-Synchronizing Tokenization Disambiguation for Generative Linguistic Steganography
Authors: Yaofei Wang, Rui Wang, Weilong Pang, JiaLiang Han, Yuan Qi, Donghui Hu, Kejiang Chen
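Abstract summary: Generative linguistic steganography enables covert communication by embedding secret messages into the natural language generation process. The same surface text may be re-tokenized into a different token sequence at the receiver, breaking the shared decoding state. We propose ReTokSync, a framework that monitors the receiver-view tokenization during generation and triggers a corrective reset only when ambiguity occurs.
Reference score (independently computed attention metric): 28.687186883782033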
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Generative linguistic steganography (GLS) enables covert communication by embedding secret messages into the natural language generation process. In practical deployment, however, GLS is vulnerable to tokenization ambiguity: the same surface text may be re-tokenized into a different token sequence at the receiver, breaking the shared decoding state between the communicating parties so that a single local mismatch can propagate into complete extraction failure. Existing solutions either remove ambiguous tokens -- distorting the generation distribution and compromising security -- or preserve the distribution at the cost of substantially reduced embedding capacity or prohibitive runtime overhead. To address this issue, we propose ReTokSync (Re-Tokenization Synchronization), a self-synchronizing disambiguation framework that monitors the receiver-view tokenization during generation and triggers a corrective reset only when ambiguity actually occurs. By confining the effect of tokenization ambiguity to sparse residual bit errors rather than global desynchronization, ReTokSync leaves ambiguity-free positions entirely untouched and remains compatible with the underlying steganographic algorithm. Experiments on both English and Chinese settings show that ReTokSync stays closest to the steganographic baseline in distributional security (zero KL divergence), text quality, embedding capacity, and runtime, while achieving extraction accuracy above 99.7%. Building on this property, we further develop a two-channel covert communication mechanism in which ReTokSync serves as the primary channel and a reliable auxiliary channel corrects the remaining errors, achieving 100% end-to-end recovery across all evaluated configurations.
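
Illustrative sketch (not from the paper): the tokenization ambiguity described in the abstract can be reproduced with a toy example. The vocabulary `VOCAB`, the greedy longest-match tokenizer `greedy_tokenize`, and the helper `first_mismatch` are all hypothetical stand-ins chosen for illustration. They are not ReTokSync's algorithm or any real tokenizer's API. The sketch only shows how a sender could compare the tokens it generated against the receiver's view of the same surface text.

```python
from typing import List, Sequence

# Hypothetical toy vocabulary (assumption); real subword vocabularies are far larger.
VOCAB = {"un", "do", "undo", "ing", "i", "ng"}


def greedy_tokenize(text: str) -> List[str]:
    """Longest-match-first tokenization, standing in for the receiver's tokenizer."""
    tokens: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate substring first.
        for j in range(len(text), i, -1):
            if text[i:j] in VOCAB:
                tokens.append(text[i:j])
                i = j
                break
        else:
            raise ValueError(f"cannot tokenize at position {i}")
    return tokens


def first_mismatch(sent_tokens: Sequence[str]) -> int:
    """Index of the first token where the receiver's re-tokenization of the
    surface text diverges from what the sender generated, or -1 if none."""
    received = greedy_tokenize("".join(sent_tokens))
    for k, (sent, recv) in enumerate(zip(sent_tokens, received)):
        if sent != recv:
            return k
    return -1


if __name__ == "__main__":
    # The sender sampled "un" + "do" + "ing", but the receiver sees "undo" + "ing".
    print(greedy_tokenize("undoing"))
    print(first_mismatch(["un", "do", "ing"]))
    print(first_mismatch(["undo", "ing"]))
```

In this toy setting, a mismatch index other than -1 marks a position where sender and receiver would disagree about the token sequence. According to the abstract, ReTokSync triggers a corrective reset only at positions where such ambiguity occurs. How that reset is performed is not specified here and is not modeled by this sketch.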


Related papers