Fugu-MT Paper Translation (Abstract): Algorithmic Fragility and Persona Bias in LLM-Generated Autistic Communication

Paper summary: Algorithmic Fragility and Persona Bias in LLM-Generated Autistic Communication

arxiv url: http://arxiv.org/abs/2605.26397v2
Date: Mon, 01 Jun 2026 17:07:11 GMT
Status: Translation complete
Last updated in system: 2026-06-02 18:24:16.519483
Title: Algorithmic Fragility and Persona Bias in LLM-Generated Autistic Communication
Title (reference translation): Algorithmic Fragility and Persona Bias in LLM-Generated Autistic Communication
Authors: Naba Rizvi, Mohammed Rizvi, Harper Strickland, Saleha Ahmedi, Nedjma Ousidhoum
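Abstract summary: Safety alignment reduces explicitly harmful outputs but inadvertently encodes a sanitized, neuronormative representation of marginalized communication. This study examines that encoding with a dual-persona rewrite paradigm, prompting ten large language models to rewrite naturally occurring autistic discourse from either an autistic or a neurotypical persona. The findings indicate that current alignment training causes a persona-specific generative breakdown that is visible only through qualitative analysis.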
Reference score (independently computed attention score): 4.032192350354742
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Safety alignment reduces explicitly harmful outputs but inadvertently encodes a sanitized, neuronormative representation of marginalized communication. We investigate this encoding using a dual-persona rewrite paradigm, prompting ten large language models (LLMs) to rewrite naturally occurring autistic discourse from either an autistic or neurotypical persona. We uncover autistic-persona rewrites diverge significantly more in lexical form and affective register than neurotypical rewrites, despite equivalent semantic similarity. Furthermore, most models collapse cross-persona generations into near-identical outputs. To uncover the mechanisms behind this generative breakdown, we introduce a multi-agent qualitative analysis framework. Our results reveal systemic output erasure, stereotyped hallucination, and task-evasive meta-commentary are pervasive failure modes for this task that cluster by alignment strategy rather than parameter scale. Finally, our targeted comparison with autistic human annotators demonstrates that community-insider knowledge produces systematic label reversals relative to LLM classifications. Our findings indicate that current alignment training causes persona-specific generative breakdown visible only through qualitative analysis, confirming a deep representational gap that prompt engineering cannot resolve.
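Abstract (reference translation): Safety alignment reduces explicitly harmful outputs but inadvertently encodes a sanitized, neuronormative representation of marginalized communication. Using a dual-persona rewrite paradigm, we examine this encoding by prompting ten large language models (LLMs) to rewrite naturally occurring autistic discourse from either an autistic or a neurotypical persona. We find that autistic-persona rewrites diverge significantly more in lexical form and affective register than neurotypical rewrites, despite equivalent semantic similarity. Moreover, most models collapse cross-persona generations into near-identical outputs. To clarify the mechanisms behind this generative breakdown, we introduce a multi-agent qualitative analysis framework. The results show that systemic output erasure, stereotyped hallucination, and task-evasive meta-commentary are pervasive failure modes for this task, and that they cluster by alignment strategy rather than parameter scale. Finally, a targeted comparison with autistic human annotators shows that community-insider knowledge produces systematic label reversals relative to LLM classifications. Current alignment training thus produces a persona-specific generative breakdown visible only through qualitative analysis, confirming a deep representational gap that prompt engineering cannot resolve.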

Related papers