Fugu-MT Paper Translation (Abstract): LLM Agents Predict Social Media Reactions but Do Not Outperform Text Classifiers: Benchmarking Simulation Accuracy Using 120K+ Personas of 1511 Humans

Paper overview: LLM Agents Predict Social Media Reactions but Do Not Outperform Text Classifiers: Benchmarking Simulation Accuracy Using 120K+ Personas of 1511 Humans

arxiv url: http://arxiv.org/abs/2604.19787v1
Date: Tue, 31 Mar 2026 19:27:59 GMT
Status: Translation complete
Last updated in system: 2026-05-04 02:32:14.079238
Title: LLM Agents Predict Social Media Reactions but Do Not Outperform Text Classifiers: Benchmarking Simulation Accuracy Using 120K+ Personas of 1511 Humans
Title (reference translation): LLM agents predict social media reactions but do not outperform text classifiers: benchmarking simulation accuracy using 120K+ personas of 1,511 humans
Authors: Ljubisa Bojic, Alexander Felfernig, Bojana Dinic, Velibor Ilic, Achim Rettinger, Vera Mevorah, Damian Trilling,
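Abstract summary: This study benchmarks how accurately LLM-based agents predict human social media reactions (like, dislike, comment, share, no reaction) across 120,000+ agent-persona combinations. Agents achieved 70.7% accuracy, and the choice of LLM produced a 13 percentage-point performance spread. The genuine predictive validity of zero-shot persona-prompted agents warns of potential manipulation through the easy deployment of swarms of behaviorally distinct AI agents on social media.
Reference score (independently computed attention score): 36.156330303795016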
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Social media platforms mediate how billions form opinions and engage with public discourse. As autonomous AI agents increasingly participate in these spaces, understanding their behavioral fidelity becomes critical for platform governance and democratic resilience. Previous work demonstrates that LLM-powered agents can replicate aggregate survey responses, yet few studies test whether agents can predict specific individuals' reactions to specific content. This study benchmarks LLM-based agents' accuracy in predicting human social media reactions (like, dislike, comment, share, no reaction) across 120,000+ unique agent-persona combinations derived from 1,511 Serbian participants and 27 large language models. In Study 1, agents achieved 70.7% overall accuracy, with LLM choice producing a 13 percentage-point performance spread. Study 2 employed binary forced-choice (like/dislike) evaluation with chance-corrected metrics. Agents achieved Matthews Correlation Coefficient (MCC) of 0.29, indicating genuine predictive signal beyond chance. However, conventional text-based supervised classifiers using TF-IDF representations outperformed LLM agents (MCC of 0.36), suggesting predictive gains reflect semantic access rather than uniquely agentic reasoning. The genuine predictive validity of zero-shot persona-prompted agents warns against potential manipulation through easily deploying swarms of behaviorally distinct AI agents on social media, while simultaneously offering opportunities to use such agents in simulations for predicting polarization dynamics and informing AI policy. The advantage of using zero-shot agents is that they require no task-specific training, making their large-scale deployment easy across diverse contexts. Limitations include single-country sampling. Future research should explore multilingual testing and fine-tuning approaches.
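Abstract (reference translation): Social media platforms mediate how billions of people form opinions and take part in public discourse. As autonomous AI agents increasingly participate in these spaces, understanding their behavioral fidelity becomes important for platform governance and democratic resilience. Earlier work has shown that LLM-powered agents can reproduce aggregate survey responses, but few studies test whether agents can predict a specific individual's reaction to a specific piece of content. This study benchmarks the accuracy of LLM-based agents in predicting human social media reactions (like, dislike, comment, share, no reaction) across more than 120,000 agent-persona combinations derived from 1,511 Serbian participants and 27 large language models. In Study 1, agents achieved 70.7% overall accuracy, and the choice of LLM produced a 13 percentage-point performance spread. Study 2 used a binary forced-choice (like/dislike) evaluation with chance-corrected metrics. Agents achieved a Matthews Correlation Coefficient (MCC) of 0.29, indicating a genuine predictive signal beyond chance. However, conventional text-based supervised classifiers using TF-IDF representations outperformed the LLM agents (MCC of 0.36), suggesting that the predictive gains reflect semantic access rather than uniquely agentic reasoning. The genuine predictive validity of zero-shot persona-prompted agents warns of potential manipulation through the easy deployment of swarms of behaviorally distinct AI agents on social media, while also offering opportunities to use such agents in simulations for predicting polarization dynamics and informing AI policy. The advantage of zero-shot agents is that they require no task-specific training, which makes large-scale deployment across diverse contexts easy. Limitations include single-country sampling. Future research should explore multilingual testing and fine-tuning approaches.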
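Note on the chance-corrected metric: the abstract reports raw accuracy for Study 1 and MCC for Study 2. The sketch below illustrates why the two can diverge, using the standard MCC definition over binary confusion-matrix counts. The function names and all counts are hypothetical and chosen only for illustration; they are not data from the paper.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of predictions that match the true label."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews Correlation Coefficient from binary confusion-matrix counts.

    Ranges from -1 (total disagreement) through 0 (no better than chance)
    to +1 (perfect prediction).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention: return 0 when any row or column of the
        # confusion matrix is empty and the ratio is undefined.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical case: 80 of 100 true labels are "like" and the predictor
# always answers "like". Accuracy looks strong, MCC shows no signal.
always_like = dict(tp=80, tn=0, fp=20, fn=0)
print("accuracy:", accuracy(**always_like))
print("mcc:", mcc(**always_like))
```

Under these assumed counts the always-"like" predictor scores 0.8 accuracy but an MCC of 0, which is the kind of inflation a chance-corrected metric is meant to expose.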

List of related papers