Fugu-MT Paper Translation (Abstract): Büyük Dil Modelleri için TR-MMLU Benchmarkı: Performans Değerlendirmesi, Zorluklar ve İyileştirme Fırsatları

Paper overview: Büyük Dil Modelleri için TR-MMLU Benchmarkı: Performans Değerlendirmesi, Zorluklar ve İyileştirme Fırsatları

arxiv url: http://arxiv.org/abs/2508.13044v1
Date: Mon, 18 Aug 2025 16:00:43 GMT
Status: Translation complete
Last updated in system: 2025-08-19 14:49:11.47082
Title: Büyük Dil Modelleri için TR-MMLU Benchmarkı: Performans Değerlendirmesi, Zorluklar ve İyileştirme Fırsatları
Title (reference translation): TR-MMLU Benchmark for Large Language Models: Performance Evaluation, Challenges, and Opportunities for Improvement
Authors: M. Ali Bayram, Ali Arda Fincan, Ahmet Semih Gümüş, Banu Diri, Savaş Yıldırım, Öner Aytaş
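Abstract summary: TR-MMLU is a framework for evaluating the linguistic and conceptual capabilities of large language models (LLMs) in Turkish. It is based on a dataset of 6,200 multiple-choice questions spanning 62 sections within the Turkish education system. TR-MMLU sets a new standard for advancing Turkish NLP research and inspiring future innovations.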
Reference score (independently computed attention score): 0.29687381456163997
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Language models have made significant advancements in understanding and generating human language, achieving remarkable success in various applications. However, evaluating these models remains a challenge, particularly for resource-limited languages like Turkish. To address this issue, we introduce the Turkish MMLU (TR-MMLU) benchmark, a comprehensive evaluation framework designed to assess the linguistic and conceptual capabilities of large language models (LLMs) in Turkish. TR-MMLU is based on a meticulously curated dataset comprising 6,200 multiple-choice questions across 62 sections within the Turkish education system. This benchmark provides a standard framework for Turkish NLP research, enabling detailed analyses of LLMs' capabilities in processing Turkish text. In this study, we evaluated state-of-the-art LLMs on TR-MMLU, highlighting areas for improvement in model design. TR-MMLU sets a new standard for advancing Turkish NLP research and inspiring future innovations.


Related papers