Fugu-MT Paper Translation (Abstract): CultureSynth: A Hierarchical Taxonomy-Guided and Retrieval-Augmented Framework for Cultural Question-Answer Synthesis

Paper abstract: CultureSynth: A Hierarchical Taxonomy-Guided and Retrieval-Augmented Framework for Cultural Question-Answer Synthesis

arxiv url: http://arxiv.org/abs/2509.10886v1
Date: Sat, 13 Sep 2025 16:33:56 GMT
Status: Translation complete
Last updated in system: 2025-09-16 17:26:22.831793
Title: CultureSynth: A Hierarchical Taxonomy-Guided and Retrieval-Augmented Framework for Cultural Question-Answer Synthesis
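Title (reference translation): CultureSynth: A Hierarchical Taxonomy-Guided, Retrieval-Based Framework for Cultural Questions and Answers
Authors: Xinyu Zhang, Pei Zhang, Shuang Luo, Jialong Tang, Yu Wan, Baosong Yang, Fei Huang
Abstract summary: This paper introduces CultureSynth, a new framework for evaluating the cultural competence of large language models. The CultureSynth-7 benchmark contains 19,360 entries and 4,149 manually verified entries across 7 languages.
Reference score (independently computed attention measure): 41.483432890962824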
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Cultural competence, defined as the ability to understand and adapt to multicultural contexts, is increasingly vital for large language models (LLMs) in global environments. While several cultural benchmarks exist to assess LLMs' cultural competence, current evaluations suffer from fragmented taxonomies, domain specificity, and heavy reliance on manual data annotation. To address these limitations, we introduce CultureSynth, a novel framework comprising (1) a comprehensive hierarchical multilingual cultural taxonomy covering 12 primary and 130 secondary topics, and (2) a Retrieval-Augmented Generation (RAG)-based methodology leveraging factual knowledge to synthesize culturally relevant question-answer pairs. The CultureSynth-7 synthetic benchmark contains 19,360 entries and 4,149 manually verified entries across 7 languages. Evaluation of 14 prevalent LLMs of different sizes reveals clear performance stratification led by ChatGPT-4o-Latest and Qwen2.5-72B-Instruct. The results demonstrate that a 3B-parameter threshold is necessary for achieving basic cultural competence, models display varying architectural biases in knowledge processing, and significant geographic disparities exist across models. We believe that CultureSynth offers a scalable framework for developing culturally aware AI systems while reducing reliance on manual annotation. The benchmark is available at https://github.com/Eyr3/CultureSynth.
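Abstract (reference translation): Cultural competence, defined as the ability to understand and adapt to multicultural contexts, is increasingly essential for large language models (LLMs) in global environments. Several cultural benchmarks exist for assessing the cultural competence of LLMs, but current evaluations suffer from fragmented taxonomies, domain specificity, and heavy dependence on manual data annotation. To address these limitations, we introduce CultureSynth, a framework comprising (1) a comprehensive hierarchical multilingual cultural taxonomy covering 12 primary and 130 secondary topics, and (2) a Retrieval-Augmented Generation (RAG)-based method that uses factual knowledge to synthesize culturally relevant question-answer pairs. The CultureSynth-7 synthetic benchmark contains 19,360 entries and 4,149 manually verified entries across 7 languages. An evaluation of 14 LLMs of different sizes shows clear performance stratification, led by ChatGPT-4o-Latest and Qwen2.5-72B-Instruct. The results indicate that a 3B-parameter threshold is needed to achieve basic cultural competence, that models show varying architectural biases in knowledge processing, and that significant geographic disparities exist across models. CultureSynth offers a scalable framework for developing culturally aware AI systems while reducing dependence on manual annotation.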


Related papers