Fugu-MT Paper Translation (Abstract): A Character-based Diffusion Embedding Algorithm for Enhancing the Generation Quality of Generative Linguistic Steganographic Texts

Paper overview: A Character-based Diffusion Embedding Algorithm for Enhancing the Generation Quality of Generative Linguistic Steganographic Texts

arxiv url: http://arxiv.org/abs/2505.00977v2
Date: Wed, 07 May 2025 17:00:28 GMT
Status: translation complete
Last updated in system: 2025-05-08 12:54:13.624864
Title: A Character-based Diffusion Embedding Algorithm for Enhancing the Generation Quality of Generative Linguistic Steganographic Texts
Authors: Yingquan Chen, Qianmu Li, Xiaocong Wu, Huifeng Li, Qing Chang
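Abstract summary: This paper proposes a character-based diffusion embedding algorithm (CDEA). Based on general statistical properties at the character level, it raises the selection frequency of high-probability candidate words and lowers the selection frequency of low-probability candidate words in the candidate pool. Experimental results show that combining CDEA with XLNet significantly improves the quality of the generated steganographic text.
Reference score (independently computed attention score): 6.571673823245785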
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Generating high-quality steganographic text is a fundamental challenge in the field of generative linguistic steganography. This challenge arises primarily from two aspects: firstly, the capabilities of existing models in text generation are limited; secondly, embedding algorithms fail to effectively mitigate the negative impacts of sensitive information's properties, such as semantic content or randomness. Specifically, to ensure that the recipient can accurately extract hidden information, embedding algorithms often have to consider selecting candidate words with relatively low probabilities. This phenomenon leads to a decrease in the number of high-probability candidate words and an increase in low-probability candidate words, thereby compromising the semantic coherence and logical fluency of the steganographic text and diminishing the overall quality of the generated steganographic material. To address this issue, this paper proposes a novel embedding algorithm, character-based diffusion embedding algorithm (CDEA). Unlike existing embedding algorithms that strive to eliminate the impact of sensitive information's properties on the generation process, CDEA leverages sensitive information's properties. It enhances the selection frequency of high-probability candidate words in the candidate pool based on general statistical properties at the character level and grouping methods based on power-law distributions, while reducing the selection frequency of low-probability candidate words in the candidate pool. Furthermore, to ensure the effective transformation of sensitive information in long sequences, we also introduce the XLNet model. Experimental results demonstrate that the combination of CDEA and XLNet significantly improves the quality of generated steganographic text, particularly in terms of perceptual-imperceptibility.
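The abstract does not spell out how CDEA maps characters to candidate words, so the following is only a toy sketch of the general idea it describes: if the secret's characters follow a skewed frequency distribution, pairing the most frequent characters with the most probable candidates makes high-probability words get selected more often. The function names (`rank_characters`, `build_mapping`, `embed`, `extract`), the one-character-per-word mapping, and the fixed candidate pool are all assumptions made for illustration. The paper's power-law grouping and its XLNet component are not modeled here.

```python
from collections import Counter


def rank_characters(reference_text):
    """Return characters ordered from most to least frequent (hypothetical helper)."""
    counts = Counter(reference_text)
    # Sort by descending count, then by character, so the order is deterministic.
    return [ch for ch, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]


def build_mapping(char_ranking, candidates):
    """Pair the i-th most frequent character with the i-th most probable candidate.

    `candidates` is a list of (word, probability) pairs. In a real system the
    pool would come from a language model at each generation step; a fixed
    pool is assumed here to keep the sketch short.
    """
    ordered = sorted(candidates, key=lambda wp: -wp[1])
    if len(ordered) < len(char_ranking):
        raise ValueError("candidate pool is smaller than the character set")
    return {ch: word for ch, (word, _) in zip(char_ranking, ordered)}


def embed(secret, mapping):
    """Emit one candidate word per secret character."""
    return [mapping[ch] for ch in secret]


def extract(words, mapping):
    """Invert the mapping to recover the secret characters."""
    inverse = {word: ch for ch, word in mapping.items()}
    return "".join(inverse[w] for w in words)


if __name__ == "__main__":
    secret = "aaabac"
    pool = [("the", 0.6), ("of", 0.3), ("zebra", 0.1)]
    mapping = build_mapping(rank_characters(secret), pool)
    stego = embed(secret, mapping)
    print(stego)
    print(extract(stego, mapping) == secret)
```

In this toy run the most probable word is selected for four of the six secret characters, because `a` dominates the secret. The receiver can invert the mapping only if it shares the same character ranking and candidate pool.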

Related papers

A high-capacity linguistic steganography based on entropy-driven rank-token mapping [81.29800498695899]
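Linguistic steganography enables covert communication by embedding secret messages in innocuous text. Conventional modification-based methods introduce detectable anomalies, while retrieval-based strategies suffer from low embedding capacity. This paper proposes RTMStega, an entropy-driven framework that integrates rank-based adaptive coding and context-aware compression with normalized entropy.
Paper reference translation (metadata) (2025-10-27T06:02:47Z)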
Human Texts Are Outliers: Detecting LLM-generated Texts via Out-of-distribution Detection [71.59834293521074]
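We develop a framework for distinguishing human-written text from machine-generated text. The proposed method achieves 98.3% AUROC and AUPR with an FPR95 of 8.9% on the DeepFake dataset. Code, pretrained weights, and a demo will be released.
Paper reference translation (metadata) (2025-10-07T08:14:45Z)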
DiffSampling: Enhancing Diversity and Accuracy in Neural Text Generation [2.4555276449137042]
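We propose DiffSampling, a new decoding method that exploits a mathematical analysis of the token probability distribution. Experiments covering four different text generation tasks show that our approach consistently performs at least as well as existing methods.
Paper reference translation (metadata) (2025-02-19T19:00:02Z)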
TextSleuth: Towards Explainable Tampered Text Detection [49.88698441048043]
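This paper uses large multimodal models to explain the basis of tampered text detection in natural language. To fill the data gap for this task, we propose ETTD, a large-scale comprehensive dataset. Joint queries are introduced to generate high-quality anomaly descriptions with GPT4o. To filter out low-quality annotations automatically, we also propose prompting GPT4o to recognize the tampered text.
Paper reference translation (metadata) (2024-12-19T13:10:03Z)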
Using Instruction-Tuned Large Language Models to Identify Indicators of Vulnerability in Police Incident Narratives [0.0]
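We analyzed publicly available text narratives of police-public interactions recorded by the Boston Police Department. We compared the qualitative coding produced by instruction-tuned large language models (IT-LLMs) with that of human coders.
Paper reference translation (metadata) (2024-12-16T15:27:37Z)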
ADLM -- stega: A Universal Adaptive Token Selection Algorithm for Improving Steganographic Text Quality via Information Entropy [1.413488665073795]
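Steganographic systems enhance information security by embedding sensitive information in public carriers. Existing generative text steganography methods struggle to handle the long-tail distribution of the candidate word pool. This paper proposes a quality control theory for steganographic text generation based on information entropy constraints.
Paper reference translation (metadata) (2024-10-28T08:25:31Z)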
Provably Secure Disambiguating Neural Linguistic Steganography [66.30965740387047]
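The segmentation ambiguity problem that arises when using subword-based language models sometimes causes decoding failures. We therefore propose SyncPool, a secure disambiguation method that effectively addresses the segmentation ambiguity problem. Because SyncPool changes neither the size of the candidate pool nor the distribution of tokens, it can be applied to provably secure linguistic steganography methods.
Paper reference translation (metadata) (2024-03-26T09:25:57Z)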
ToBlend: Token-Level Blending With an Ensemble of LLMs to Attack AI-Generated Text Detection [6.27025292177391]
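ToBlend is a token-level ensemble text generation method that challenges the robustness of current AI content detection approaches. ToBlend significantly degrades the performance of major AI content detection methods.
Paper reference translation (metadata) (2024-02-17T02:25:57Z)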
Watermarking Conditional Text Generation for AI Detection: Unveiling Challenges and a Semantic-Aware Watermark Remedy [52.765898203824975]
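We propose a semantic-aware watermarking algorithm that takes into account the characteristics of conditional text generation and the input context. Experimental results show that the proposed method yields significant improvements across a variety of text generation models.
Paper reference translation (metadata) (2023-07-25T20:24:22Z)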
An Invariant Learning Characterization of Controlled Text Generation [25.033675230270212]
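Controlled generation is the problem of producing text that contains stylistic or semantic attributes of interest. We show that the performance of controlled generation can degrade when the distribution of text responding to user prompts differs from the distribution the predictor was trained on.
Paper reference translation (metadata) (2023-05-31T21:35:08Z)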
CoCo: Coherence-Enhanced Machine-Generated Text Detection Under Data Limitation With Contrastive Learning [14.637303913878435]
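To detect machine-generated text (MGT) in low-resource scenarios, we propose CoCo, a coherence-based contrastive learning model. To exploit linguistic features, we encode coherence information into the text representation in graph form. Experimental results on two public datasets and two self-constructed datasets show that our approach substantially outperforms state-of-the-art methods.
Paper reference translation (metadata) (2022-12-20T15:26:19Z)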
An Analysis of the Effects of Decoding Algorithms on Fairness in Open-Ended Language Generation [77.44921096644698]
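This paper systematically analyzes the impact of decoding algorithms on LM fairness. We analyze the trade-offs among fairness, diversity, and quality.
Paper reference translation (metadata) (2022-10-07T21:33:34Z)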
On the probability-quality paradox in language generation [76.69397802617064]
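We analyze language generation through an information-theoretic lens. We hypothesize that human language should contain an amount of information close to the entropy of the distribution over natural strings.
Paper reference translation (metadata) (2022-03-31T17:43:53Z)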
Speaker Embedding-aware Neural Diarization for Flexible Number of Speakers with Textual Information [55.75018546938499]
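This paper proposes a speaker embedding-aware neural diarization (SEND) method. The method achieves a lower diarization error rate than target-speaker voice activity detection.
Paper reference translation (metadata) (2021-11-28T12:51:04Z)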
Be More with Less: Hypergraph Attention Networks for Inductive Text Classification [56.98218530073927]
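Graph neural networks (GNNs) have attracted attention in the research community and have demonstrated promising results on this canonical task. Despite this success, their performance can be largely compromised in practice because they cannot capture high-order interactions between words. This paper proposes hypergraph attention networks (HyperGAT), which obtain more expressive power with less computation for text representation learning.
Paper reference translation (metadata) (2020-11-01T00:21:59Z)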

The related papers list is generated automatically from the titles and abstracts of papers on this site.
