Fugu-MT Paper Translation (Summary): Polyglots or Multitudes? Multilingual LLM Answers to Value-laden Multiple-Choice Questions

Paper summary: Polyglots or Multitudes? Multilingual LLM Answers to Value-laden Multiple-Choice Questions

arxiv url: http://arxiv.org/abs/2602.05932v1
Date: Thu, 05 Feb 2026 17:44:06 GMT
Status: Translation complete
Last updated in system: 2026-02-06 18:49:09.094069
Title: Polyglots or Multitudes? Multilingual LLM Answers to Value-laden Multiple-Choice Questions
Authors: Léo Labat, Etienne Ollion, François Yvon
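Abstract summary: Multiple-Choice Questions (MCQs) are often used to assess knowledge, reasoning abilities, and even values encoded in large language models (LLMs).
Reference score (independently computed attention score): 16.64653069179642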
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Multiple-Choice Questions (MCQs) are often used to assess knowledge, reasoning abilities, and even values encoded in large language models (LLMs). While the effect of multilingualism has been studied on LLM factual recall, this paper seeks to investigate the less explored question of language-induced variation in value-laden MCQ responses. Are multilingual LLMs consistent in their responses across languages, i.e. behave like theoretical polyglots, or do they answer value-laden MCQs depending on the language of the question, like a multitude of monolingual models expressing different values through a single model? We release a new corpus, the Multilingual European Value Survey (MEVS), which, unlike prior work relying on machine translation or ad hoc prompts, solely comprises human-translated survey questions aligned in 8 European languages. We administer a subset of those questions to over thirty multilingual LLMs of various sizes, manufacturers and alignment-fine-tuning status under comprehensive, controlled prompt variations including answer order, symbol type, and tail character. Our results show that while larger, instruction-tuned models display higher overall consistency, the robustness of their responses varies greatly across questions, with certain MCQs eliciting total agreement within and across models while others leave LLM answers split. Language-specific behavior seems to arise in all consistent, instruction-fine-tuned models, but only on certain questions, warranting a further study of the selective effect of preference fine-tuning.
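The abstract describes administering each survey question under controlled prompt variations (answer order, symbol type, tail character) and comparing answers within and across languages. The sketch below is a hypothetical illustration of that kind of setup, not the authors' code: the names prompt_variants, consistency, cross_language_report, SYMBOL_SETS and TAIL_CHARS, the prompt template, and the majority-agreement consistency score are all assumptions, since the abstract does not specify the paper's actual templates or metric.

```python
from collections import Counter
from itertools import permutations

# Hypothetical values for the three variation axes named in the abstract
# (answer order, symbol type, tail character); not taken from the paper.
SYMBOL_SETS = {"letters": "ABCD", "digits": "1234"}
TAIL_CHARS = ["", ":", " "]


def prompt_variants(question, options):
    """Yield (prompt, symbol_to_option) for every order/symbol/tail combination."""
    for order in permutations(options):
        for symbols in SYMBOL_SETS.values():
            for tail in TAIL_CHARS:
                # Map each displayed symbol to the option text it labels here,
                # so a symbol answer can be scored by content, not by position.
                mapping = dict(zip(symbols, order))
                lines = [question] + [f"{s}. {o}" for s, o in mapping.items()]
                yield "\n".join(lines) + "\nAnswer" + tail, mapping


def consistency(answers):
    """Fraction of answers equal to the most frequent one (1.0 = fully consistent)."""
    if not answers:
        raise ValueError("need at least one answer")
    return Counter(answers).most_common(1)[0][1] / len(answers)


def cross_language_report(answers_by_lang):
    """Per-language consistency, plus agreement among the languages' majority answers."""
    per_lang = {lang: consistency(ans) for lang, ans in answers_by_lang.items()}
    majorities = [Counter(ans).most_common(1)[0][0] for ans in answers_by_lang.values()]
    return per_lang, consistency(majorities)


if __name__ == "__main__":
    variants = list(prompt_variants("Q?", ["agree", "neutral", "disagree", "refuse"]))
    print(len(variants))  # 4! orders x 2 symbol sets x 3 tails
    per_lang, cross = cross_language_report({
        "en": ["agree", "agree", "disagree", "agree"],
        "fr": ["disagree", "disagree"],
    })
    print(per_lang, cross)
```

Mapping each symbol back to its option text before scoring is the key design choice here: a model that always picks the first listed symbol regardless of order would score 0.25 on four options rather than 1.0, so positional bias shows up as inconsistency instead of masquerading as a stable value.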

Related papers