Fugu-MT Paper Translation (Abstract): Auditing Preferences for Brands and Cultures in LLMs

Paper abstract: Auditing Preferences for Brands and Cultures in LLMs

arxiv url: http://arxiv.org/abs/2603.18300v1
Date: Wed, 18 Mar 2026 21:38:39 GMT
Status: Translation complete
Last updated in system: 2026-03-20 17:19:05.858749
Title: Auditing Preferences for Brands and Cultures in LLMs
Title (reference translation): Evaluating Brands and Cultures in LLMs
Authors: Jasmine Rienecker, Katarina Mpofu, Naman Goel, Siddhartha Datta, Jun Zhao, Oscar Danielsson, Fredrik Thorsen
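Abstract summary: This paper introduces ChoiceEval, a reproducible framework for auditing preferences for brands and cultures in large language models (LLMs). ChoiceEval addresses two technical challenges: generating realistic, persona-diverse evaluation queries, and converting free-form outputs into comparable choice sets and quantitative preference metrics. It is applied to Gemini, GPT, and DeepSeek across 10 topics spanning commerce and culture and more than 2,000 questions.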
Reference score (independently computed attention score): 9.677509409150549
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: AI systems based on large language models (LLMs) increasingly mediate what billions of people see, choose, and buy. This creates an urgent need to quantify the systemic risks of LLM-driven market intermediation, including its implications for market fairness, competition, and the diversity of information exposure. This paper introduces ChoiceEval, a reproducible framework for auditing preferences for brands and cultures in LLMs under realistic usage conditions. ChoiceEval addresses two core technical challenges: (i) generating realistic, persona-diverse evaluation queries and (ii) converting free-form outputs into comparable choice sets and quantitative preference metrics. For a given topic (e.g., running shoes, hotel chains, travel destinations), the framework segments users into psychographic profiles (e.g., budget-conscious, wellness-focused, convenience), and then derives diverse prompts that reflect real-world advice-seeking and decision-making behaviour. LLM responses are converted into normalised top-k choice sets. Preference and geographic bias are then quantified using comparable metrics across topics and personas. Thus, ChoiceEval provides a scalable audit pipeline for researchers, platforms, and regulators, linking model behaviour to real-world economic outcomes. Applied to Gemini, GPT, and DeepSeek across 10 topics spanning commerce and culture and more than 2,000 questions, ChoiceEval reveals consistent preferences: U.S.-developed models Gemini and GPT show marked favouritism toward American entities, while China-developed DeepSeek exhibits more balanced yet still detectable geographic preferences. These patterns persist across user personas, suggesting systematic rather than incidental effects.
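Abstract (reference translation): AI systems based on large language models (LLMs) increasingly mediate what billions of people see, choose, and buy. This creates an urgent need to quantify the systemic risks of LLM-driven market intermediation, including its implications for market fairness, competition, and the diversity of information exposure. This paper introduces ChoiceEval, a reproducible framework for auditing preferences for brands and cultures in LLMs under realistic usage conditions. ChoiceEval addresses two technical challenges: (i) generating realistic, persona-diverse evaluation queries and (ii) converting free-form outputs into comparable choice sets and quantitative preference metrics. For a given topic (for example, running shoes, hotel chains, or travel destinations), the framework segments users into psychographic profiles (for example, budget-conscious, wellness-focused, convenience) and derives varied prompts that reflect real-world advice-seeking and decision-making behaviour. LLM responses are converted into normalised top-k choice sets. Preference and geographic bias are quantified using comparable metrics across topics and personas. ChoiceEval thus provides a scalable audit pipeline for researchers, platforms, and regulators, linking model behaviour to real-world economic outcomes. Applied to Gemini, GPT, and DeepSeek across 10 topics spanning commerce and culture and more than 2,000 questions, ChoiceEval reveals consistent preferences. These patterns persist across user personas, suggesting systematic rather than incidental effects.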
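
The abstract describes converting responses into normalised top-k choice sets and then quantifying geographic preference. The following is a minimal Python sketch of that idea, not the paper's implementation: the alias table, the brand-to-country map, the function names, and the share-of-mentions metric are all illustrative assumptions, since the abstract does not specify how ChoiceEval normalises entities or defines its metrics.

```python
from collections import Counter

# Hypothetical alias table and brand-to-country map (illustrative data only,
# not taken from the paper).
ALIASES = {
    "nike": "Nike",
    "nike inc.": "Nike",
    "adidas": "Adidas",
    "adidas ag": "Adidas",
    "asics": "ASICS",
    "li-ning": "Li-Ning",
}
ORIGIN = {"Nike": "US", "Adidas": "DE", "ASICS": "JP", "Li-Ning": "CN"}


def normalise_top_k(raw_choices, k=3):
    """Map raw entity strings to canonical names, drop unknown names and
    duplicates, and keep the first k entries in their original order."""
    seen = set()
    result = []
    for raw in raw_choices:
        name = ALIASES.get(raw.strip().lower())
        if name is None or name in seen:
            continue
        seen.add(name)
        result.append(name)
        if len(result) == k:
            break
    return result


def origin_shares(choice_sets):
    """Return each country's share of all recommended entities.
    This is an assumed metric, chosen only for illustration."""
    counts = Counter(ORIGIN[name] for cs in choice_sets for name in cs)
    total = sum(counts.values())
    return {country: n / total for country, n in counts.items()} if total else {}


# Entities extracted from two hypothetical model responses.
responses = [
    ["Nike", "nike inc.", "Adidas AG", "ASICS"],
    ["Li-Ning", "Nike"],
]
choice_sets = [normalise_top_k(r, k=3) for r in responses]
shares = origin_shares(choice_sets)
```

Computing the same shares separately per persona and per model would give numbers that can be compared across topics, which is the kind of comparison the abstract refers to.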
