Fugu-MT Paper Translation (Abstract): Semantic Reconstruction of Adversarial Plagiarism: A Context-Aware Framework for Detecting and Restoring "Tortured Phrases" in Scientific Literature

Paper overview: Semantic Reconstruction of Adversarial Plagiarism: A Context-Aware Framework for Detecting and Restoring "Tortured Phrases" in Scientific Literature

arxiv url: http://arxiv.org/abs/2512.10435v1
Date: Thu, 11 Dec 2025 08:53:25 GMT
Status: Translation complete
Last updated in system: 2025-12-12 16:15:42.290895
Title: Semantic Reconstruction of Adversarial Plagiarism: A Context-Aware Framework for Detecting and Restoring "Tortured Phrases" in Scientific Literature
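Title (reference translation): Semantic Reconstruction of Adversarial Plagiarism: A Context-Aware Framework for Detecting and Restoring "Tortured Phrases" in Scientific Literature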
Authors: Agniva Maiti, Prajwal Panth, Suresh Chandra Satapathy
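Abstract summary: We propose Semantic Reconstruction of Adversarial Plagiarism (SRAP). SRAP is a framework designed not only to detect these anomalies ("tortured phrases") but also to mathematically recover the original terminology. We use a two-stage architecture: (1) statistical anomaly detection with a domain-specific masked language model (SciBERT) using token-level pseudo-perplexity, and (2) source-based semantic reconstruction using dense vector retrieval (FAISS) and sentence-level alignment (SBERT).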
Reference score (independently computed attention score): 4.905540561146363
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The integrity and reliability of scientific literature are facing a serious threat from adversarial text generation techniques, specifically the use of automated paraphrasing tools to mask plagiarism. These tools generate "tortured phrases", statistically improbable synonyms (e.g. "counterfeit consciousness" for "artificial intelligence"), that preserve the local grammar while obscuring the original source. Most existing detection methods depend heavily on static blocklists or general-domain language models, which suffer from high false-negative rates for novel obfuscations and cannot determine the source of the plagiarized content. In this paper, we propose Semantic Reconstruction of Adversarial Plagiarism (SRAP), a framework designed not only to detect these anomalies but also to mathematically recover the original terminology. We use a two-stage architecture: (1) statistical anomaly detection with a domain-specific masked language model (SciBERT) using token-level pseudo-perplexity, and (2) source-based semantic reconstruction using dense vector retrieval (FAISS) and sentence-level alignment (SBERT). Experiments on a parallel corpus of adversarial scientific text show that while zero-shot baselines fail completely (0.00 percent restoration accuracy), our retrieval-augmented approach achieves 23.67 percent restoration accuracy, significantly outperforming baseline methods. We also show that static decision boundaries are necessary for robust detection in jargon-heavy scientific text, since dynamic thresholding fails under high variance. SRAP enables forensic analysis by linking obfuscated expressions back to their most probable source documents.
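The detection stage described in the abstract (token-level pseudo-perplexity with a static decision boundary) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the toy corpus, the smoothed bigram scorer `masked_prob` (a stand-in for SciBERT's masked-token probabilities), the constants `ALPHA`, `STATIC_THRESHOLD` and `K`, and the "mean plus K standard deviations" rule used as one plausible reading of "dynamic thresholding". The abstract does not specify any of these.

```python
import math
import statistics
from collections import Counter

# Toy in-domain corpus: a HYPOTHETICAL stand-in for what a masked LM has learned.
CORPUS = [
    "artificial intelligence improves image classification",
    "deep neural network for image classification",
    "artificial intelligence and deep neural network models",
]
ALPHA = 0.1             # add-alpha smoothing constant (assumption)
STATIC_THRESHOLD = 2.0  # fixed surprisal cut-off in nats (assumption)
K = 1.5                 # multiplier for the dynamic rule (assumption)

def train(corpus):
    """Count adjacent word pairs; this replaces a real masked language model."""
    bigrams, left_total, right_total, vocab = Counter(), Counter(), Counter(), set()
    for sentence in corpus:
        toks = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(toks)
        for a, b in zip(toks, toks[1:]):
            bigrams[(a, b)] += 1
            left_total[a] += 1
            right_total[b] += 1
    return bigrams, left_total, right_total, len(vocab) + 1  # +1 for unseen words

def masked_prob(tokens, i, model):
    """Toy P(token_i | neighbours): average of left- and right-conditioned estimates."""
    bigrams, left_total, right_total, v = model
    padded = ["<s>"] + tokens + ["</s>"]
    left, word, right = padded[i], padded[i + 1], padded[i + 2]
    p_left = (bigrams[(left, word)] + ALPHA) / (left_total[left] + ALPHA * v)
    p_right = (bigrams[(word, right)] + ALPHA) / (right_total[right] + ALPHA * v)
    return 0.5 * (p_left + p_right)

def token_surprisals(tokens, model):
    """Mask each position in turn and record -log P(token | context)."""
    return [-math.log(masked_prob(tokens, i, model)) for i in range(len(tokens))]

def pseudo_perplexity(tokens, model):
    """exp of the mean token surprisal over the sentence."""
    s = token_surprisals(tokens, model)
    return math.exp(sum(s) / len(s))

def flag_static(tokens, model, threshold=STATIC_THRESHOLD):
    """Flag tokens whose surprisal exceeds a fixed boundary."""
    return [t for t, s in zip(tokens, token_surprisals(tokens, model)) if s > threshold]

def flag_dynamic(tokens, model, k=K):
    """Flag tokens above mean + k * std of this sentence's own surprisals."""
    s = token_surprisals(tokens, model)
    threshold = statistics.mean(s) + k * statistics.pstdev(s)
    return [t for t, x in zip(tokens, s) if x > threshold]

model = train(CORPUS)
tortured = "counterfeit consciousness improves image classification".split()
clean = "artificial intelligence improves image classification".split()
print(flag_static(tortured, model), flag_dynamic(tortured, model))
print(pseudo_perplexity(tortured, model), pseudo_perplexity(clean, model))
```

In this toy setting the fixed boundary flags both tokens of the tortured phrase, while the per-sentence rule flags nothing: the two anomalous tokens inflate the sentence's own variance and push the adaptive cut-off above them. This only illustrates the kind of effect the abstract describes and does not reproduce the paper's experiments.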
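Abstract (reference translation): The integrity and reliability of scientific literature face a serious threat from adversarial text generation techniques, in particular automated paraphrasing tools used to mask plagiarism. These tools generate "tortured phrases", statistically improbable synonyms (for example, "counterfeit consciousness" for "artificial intelligence"), which preserve local grammar while hiding the original source. Most existing detection methods rely heavily on static blocklists or general-domain language models, which have high false-negative rates for novel obfuscations and cannot identify the source of the plagiarized content. This paper proposes SRAP (Semantic Reconstruction of Adversarial Plagiarism), a framework that not only detects these anomalies but also mathematically recovers the original terminology. It performs (1) statistical anomaly detection with a domain-specific masked language model (SciBERT) using token-level pseudo-perplexity, and (2) source-based semantic reconstruction using dense vector retrieval (FAISS) and sentence-level alignment (SBERT). In experiments on a parallel corpus of adversarial scientific text, zero-shot baselines fail completely (0.00 percent restoration accuracy), whereas the retrieval-augmented approach achieves 23.67 percent restoration accuracy, significantly outperforming the baseline methods. The paper also shows that static decision boundaries are necessary for robust detection in jargon-heavy scientific text, because dynamic thresholding fails under high variance. SRAP enables forensic analysis by linking obfuscated expressions back to their most probable source documents.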
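The reconstruction stage (retrieve the most probable source, then align it with the suspect sentence to recover the original terms) can be sketched in the same spirit. This is not the paper's pipeline: bag-of-words cosine similarity is swapped in for SBERT embeddings with a FAISS index, and token alignment via Python's `difflib.SequenceMatcher` is swapped in for sentence-level alignment. The `SOURCES` list and the function names `embed`, `cosine`, `retrieve` and `restore` are hypothetical.

```python
import math
from collections import Counter
from difflib import SequenceMatcher

# HYPOTHETICAL candidate source sentences; a real system would index a large corpus.
SOURCES = [
    "artificial intelligence improves image classification accuracy",
    "a deep neural network was trained on big data",
    "signal to noise ratio limits the detection range",
]

def embed(sentence):
    """Bag-of-words vector; a stand-in for a dense sentence embedding."""
    return Counter(sentence.lower().split())

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def retrieve(query, sources):
    """Return (score, sentence) for the most similar candidate (brute-force search)."""
    q = embed(query)
    return max(((cosine(q, embed(s)), s) for s in sources), key=lambda pair: pair[0])

def restore(suspect, source):
    """Align tokens and replace substituted spans with the source's wording."""
    a, b = suspect.split(), source.split()
    restored, mapping = [], {}
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace":
            # Span differs between suspect and source: treat it as a tortured phrase.
            mapping[" ".join(a[i1:i2])] = " ".join(b[j1:j2])
            restored.extend(b[j1:j2])
        elif tag in ("equal", "delete"):
            # Keep the suspect's own tokens; source-only tokens ("insert") are ignored.
            restored.extend(a[i1:i2])
    return " ".join(restored), mapping

suspect = "counterfeit consciousness improves image classification accuracy"
score, best_source = retrieve(suspect, SOURCES)
restored_text, phrase_map = restore(suspect, best_source)
print(score, best_source)
print(restored_text, phrase_map)
```

Lexical overlap works here only because the unchanged words of the suspect sentence still match the source exactly. The dense retrieval and sentence-level alignment named in the abstract are meant to cope with heavier rewording, and the reported 23.67 percent restoration accuracy suggests that the real task is much harder than this idealized example.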


Related papers