Fugu-MT Paper Translation (Summary): Unsupervised Style Representation Learning for AI-Text Detection via Paraphrase Inversion

Paper summary: Unsupervised Style Representation Learning for AI-Text Detection via Paraphrase Inversion

arxiv url: http://arxiv.org/abs/2606.10099v1
Date: Mon, 08 Jun 2026 19:28:52 GMT
Status: Translation complete
Last updated (in system): 2026-06-10 15:40:58.155894
Title: Unsupervised Style Representation Learning for AI-Text Detection via Paraphrase Inversion
Authors: Rafael Rivera Soto, Barry Chen, Nicholas Andrews
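Abstract summary: We train a style encoder to reconstruct human-authored text from its machine-generated paraphrase. We evaluate the learned representations via two detection strategies: a few-shot detector and a zero-shot DeepSVDD-based detector.
Reference score (independently calculated attention score): 5.789169343514737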
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The rapid development of large language models (LLMs) has raised concerns about misuse such as plagiarism, misinformation, and automated influence operations, motivating the need for robust detectors. Recent work has shown that neural representations of writing style are effective for detection and, crucially, robust to adversarial attacks that defeat most existing detectors. However, current style-based detectors rely on authorship labels for training, and are limited to few-shot inference for detection, requiring in-distribution samples that may not always be available. We learn discriminative style features without authorship labels by training a style encoder to reconstruct human-authored text from its machine-generated paraphrase; freezing a semantic encoder during training biases the style encoder to capture only the non-semantic features needed for reconstruction. We evaluate the learned representations via two detection strategies: a few-shot detector and a zero-shot DeepSVDD-based detector. Across benchmarks, our method matches or outperforms all baselines in the few-shot setting and, in the zero-shot regime, is competitive with fully supervised classifiers on in-distribution test data while generalizing better to unseen LLMs. Beyond detection, the learned representations generalize to unseen tasks, achieving competitive performance on authorship verification and fine-grained style discrimination despite never being trained on either objective.
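The abstract names two detection strategies applied on top of the learned style embeddings. The sketch below illustrates both, but only under stated assumptions, and it is not the authors' implementation. Embeddings are plain lists of floats standing in for style-encoder outputs (the encoder and its paraphrase-inversion training are not shown). The few-shot score is assumed to be cosine similarity to the centroid of a few machine-generated support embeddings. The zero-shot detector is a simplified Deep SVDD: a bias-free, identity-initialized linear map (a hypothetical stand-in for the neural network) is trained to pull human-text embeddings toward a fixed center, and the radius is set to the largest training distance. The abstract does not say which class the one-class model is fit on or how its threshold is chosen; fitting on human text is an assumption, and the names `few_shot_score` and `LinearSVDD` are illustrative only.

```python
import math

# Embeddings are plain lists of floats (assumed style-encoder outputs).

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def cosine(a, b):
    dot = sum(ai * bi for ai, bi in zip(a, b))
    return dot / (math.sqrt(sum(ai * ai for ai in a)) * math.sqrt(sum(bi * bi for bi in b)))

def centroid(vectors):
    n = len(vectors)
    return [sum(v[k] for v in vectors) / n for k in range(len(vectors[0]))]

# Few-shot (assumed form): compare a query to a handful of known machine-text embeddings.
def few_shot_score(query, machine_support):
    """Cosine similarity to the support-set centroid (higher = more machine-like)."""
    return cosine(query, centroid(machine_support))

# Zero-shot: one-class hypersphere scoring in the spirit of Deep SVDD.
class LinearSVDD:
    """Bias-free linear map W trained to pull (assumed human) embeddings toward a fixed center c.

    Deep SVDD uses a neural network; this identity-initialized linear map is a
    hypothetical stand-in that keeps the example dependency-free.
    """

    def __init__(self, dim):
        self.W = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
        self.c = None
        self.radius = None

    def fit(self, X, lr=0.05, epochs=200, weight_decay=1e-3):
        n = len(X)
        # As in Deep SVDD, c is the mean of the initial outputs and then kept fixed.
        self.c = centroid([matvec(self.W, x) for x in X])
        for _ in range(epochs):
            # Gradient of mean ||W x - c||^2 + weight_decay * ||W||_F^2.
            grad = [[2 * weight_decay * w for w in row] for row in self.W]
            for x in X:
                diff = [o - ck for o, ck in zip(matvec(self.W, x), self.c)]
                for k, dk in enumerate(diff):
                    for j, xj in enumerate(x):
                        grad[k][j] += 2 * dk * xj / n
            self.W = [[w - lr * g for w, g in zip(row, grow)]
                      for row, grow in zip(self.W, grad)]
        # Radius (assumed rule): largest squared distance seen on the training data.
        self.radius = max(self.score(x) for x in X)
        return self

    def score(self, x):
        """Squared distance of the mapped embedding from the center."""
        return sq_dist(matvec(self.W, x), self.c)

    def is_machine(self, x):
        """Flag embeddings that fall outside the hypersphere fit on training data."""
        return self.score(x) > self.radius
```

A toy usage, with made-up 2-D vectors standing in for real style embeddings:

```python
human = [[1.0, 0.1], [1.0, -0.1], [1.1, 0.0], [0.9, 0.0]]
svdd = LinearSVDD(dim=2).fit(human)
print(svdd.is_machine([1.0, 0.05]))  # False
print(svdd.is_machine([0.0, 1.0]))   # True
```

Keeping the center fixed and omitting bias terms follows the Deep SVDD recipe for avoiding a trivial solution that maps every input onto the center.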


Related papers