Fugu-MT Paper Translation (Abstract): SommBench: Assessing Sommelier Expertise of Language Models

Paper overview: SommBench: Assessing Sommelier Expertise of Language Models

arxiv url: http://arxiv.org/abs/2603.12117v1
Date: Thu, 12 Mar 2026 16:19:04 GMT
Status: Translation complete
Last updated in system: 2026-03-13 14:46:26.209266
Title: SommBench: Assessing Sommelier Expertise of Language Models
Title (reference translation): SommBench: Evaluating the Sommelier Proficiency of Language Models
Authors: William Brach, Tomas Bedej, Jacob Nielsen, Jacob Pichna, Juraj Bedej, Eemeli Saarensilta, Julie Dupouy, Gianluca Barmina, Andrea Blasi Núñez, Peter Schneider-Kamp, Kristian Košťál, Michal Ries, Lukas Galke Poech
Abstract summary: SommBench is a benchmark for assessing sommelier expertise. It comprises three tasks: Wine Theory Question Answering (WTQA), Wine Feature Completion (WFC), and Food-Wine Pairing (FWP). SommBench is available in multiple languages: English, Slovak, Swedish, Finnish, German, Danish, Italian, and Spanish.
Reference score (independently computed attention score): 2.914512709972252
License: http://creativecommons.org/licenses/by/4.0/
Abstract: With the rapid advances of large language models, it becomes increasingly important to systematically evaluate their multilingual and multicultural capabilities. Previous cultural evaluation benchmarks focus mainly on basic cultural knowledge that can be encoded in linguistic form. Here, we propose SommBench, a multilingual benchmark to assess sommelier expertise, a domain deeply grounded in the senses of smell and taste. While language models learn about sensory properties exclusively through textual descriptions, SommBench tests whether this textual grounding is sufficient to emulate expert-level sensory judgment. SommBench comprises three main tasks: Wine Theory Question Answering (WTQA), Wine Feature Completion (WFC), and Food-Wine Pairing (FWP). SommBench is available in multiple languages: English, Slovak, Swedish, Finnish, German, Danish, Italian, and Spanish. This helps separate a language model's wine expertise from its language skills. The benchmark datasets were developed in close collaboration with a professional sommelier and native speakers of the respective languages, resulting in 1,024 wine theory question-answering questions, 1,000 wine feature-completion examples, and 1,000 food-wine pairing examples. We provide results for the most popular language models, including closed-weights models such as Gemini 2.5, and open-weights models, such as GPT-OSS and Qwen 3. Our results show that the most capable models perform well on wine theory question answering (up to 97% correct with a closed-weights model), yet feature completion (peaking at 65%) and food-wine pairing (MCC ranging between 0 and 0.39) turn out to be more challenging. These results position SommBench as an interesting and challenging benchmark for evaluating the sommelier expertise of language models. The benchmark is publicly available at https://github.com/sommify/sommbench.
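Abstract (reference translation): With the rapid progress of large language models, systematically evaluating their multilingual and multicultural capabilities is becoming increasingly important. Existing cultural evaluation benchmarks focus mainly on basic cultural knowledge that can be encoded in linguistic form. This paper proposes SommBench, a multilingual benchmark for assessing sommelier expertise, a domain deeply rooted in the senses of smell and taste. Language models learn about sensory properties only through textual descriptions, and SommBench tests whether this textual grounding is sufficient to emulate expert-level sensory judgment. SommBench consists of three main tasks: Wine Theory Question Answering (WTQA), Wine Feature Completion (WFC), and Food-Wine Pairing (FWP). SommBench is available in multiple languages: English, Slovak, Swedish, Finnish, German, Danish, Italian, and Spanish. This helps distinguish a language model's wine expertise from its language skills. The benchmark datasets were developed in close collaboration with a professional sommelier and native speakers of each language, yielding 1,024 wine theory questions, 1,000 wine feature-completion examples, and 1,000 food-wine pairing examples. Results are provided for the most popular language models, including closed-weights models such as Gemini 2.5 and open-weights models such as GPT-OSS and Qwen 3. The results show that the most capable models perform well on wine theory question answering (up to 97% correct with a closed-weights model), while feature completion (peaking at 65%) and food-wine pairing (MCC between 0 and 0.39) are more difficult. These results position SommBench as an interesting and challenging benchmark for evaluating the sommelier expertise of language models. The benchmark is publicly available at https://github.com/sommify/sommbench.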
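
Note on the metric: the abstract reports food-wine pairing results as MCC, which is read here as the Matthews correlation coefficient. The sketch below shows how that score could be computed, assuming (this is not stated in the abstract) that each pairing example carries a binary label, 1 for a suitable pairing and 0 for an unsuitable one. The function name and the toy labels are hypothetical and are not taken from the SommBench repository.

```python
import math

def matthews_corrcoef(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (assumed: 1 = good pairing, 0 = poor)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: score 0.0 when a row or column of the confusion matrix is empty.
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Hypothetical toy data, for illustration only.
gold = [1, 1, 1, 0, 0, 0, 1, 0]
pred = [1, 1, 0, 0, 0, 1, 1, 0]

print(matthews_corrcoef(gold, pred))     # 0.5
print(matthews_corrcoef(gold, [1] * 8))  # 0.0: always answering "good pairing" earns no credit
```

Under this reading, a score of 0 corresponds to predictions no better than a constant or random guess and 1 to perfect agreement, which is why a range of 0 to 0.39 indicates a difficult task even where raw accuracy might look moderate.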


Related papers