Fugu-MT Paper Translation (Summary): Information Asymmetry across Language Varieties: A Case Study on Cantonese-Mandarin and Bavarian-German QA

Paper summary: Information Asymmetry across Language Varieties: A Case Study on Cantonese-Mandarin and Bavarian-German QA

arxiv url: http://arxiv.org/abs/2603.14782v1
Date: Mon, 16 Mar 2026 03:28:40 GMT
Status: Translation complete
System update date: 2026-03-17 16:19:36.030791
Title: Information Asymmetry across Language Varieties: A Case Study on Cantonese-Mandarin and Bavarian-German QA
Title (reference translation): Information Asymmetry across Language Varieties: A Case Study on Cantonese-Mandarin and Bavarian-German QA
Authors: Renhao Pei, Siyao Peng, Verena Blaschke, Robert Litschko, Barbara Plank
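Abstract summary: Large language models (LLMs) are becoming a common way for humans to seek knowledge, yet their coverage and reliability vary. The authors manually construct a novel challenge question-answering dataset that captures knowledge conveyed on local Wikipedia pages. Their experiments show that LLMs fail to answer questions about information found only in local editions of Wikipedia.
Reference score (independently computed attention score): 37.126690247869426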
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large Language Models (LLMs) are becoming a common way for humans to seek knowledge, yet their coverage and reliability vary widely. Especially for local language varieties, there are large asymmetries, e.g., information in a local Wikipedia that is absent from the standard variant. However, little is known about how well LLMs perform under such information asymmetry, especially on closely related languages. We manually construct a novel challenge question-answering (QA) dataset that captures knowledge conveyed on a local Wikipedia page that is absent from its higher-resource counterpart, covering Mandarin Chinese vs. Cantonese and German vs. Bavarian. Our experiments show that LLMs fail to answer questions about information found only in local editions of Wikipedia. Providing context from lead sections substantially improves performance, with further gains possible via translation. Our topical and geographic annotations and stratified evaluations reveal the usefulness of local Wikipedia editions as sources of both regional and global information. These findings raise critical questions about inclusivity and cultural coverage of LLMs.
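Abstract (reference translation): Large language models (LLMs) are becoming a common way for humans to seek knowledge, yet their coverage and reliability vary. Local language varieties in particular show large asymmetries, for example information in a local Wikipedia that is absent from the standard variant. However, little is known about how well LLMs perform under such information asymmetry, especially on closely related languages. We manually construct a novel challenge question-answering dataset that captures knowledge conveyed on local Wikipedia pages. Our experiments show that LLMs fail to answer questions about information found only in local editions of Wikipedia. Providing context from lead sections substantially improves performance, with further gains possible via translation. We show that local Wikipedia editions are useful as sources of both regional and global information. These findings raise critical questions about the inclusivity and cultural coverage of LLMs.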
