Fugu-MT Paper Translation (Abstract): MATH-PT: A Math Reasoning Benchmark for European and Brazilian Portuguese

Paper overview: MATH-PT: A Math Reasoning Benchmark for European and Brazilian Portuguese

arxiv url: http://arxiv.org/abs/2604.25926v1
Date: Wed, 01 Apr 2026 12:12:54 GMT
Status: Translation complete
Last updated in system: 2026-05-04 02:32:14.225058
Title: MATH-PT: A Math Reasoning Benchmark for European and Brazilian Portuguese
Title (reference translation): MATH-PT: A Mathematical Reasoning Benchmark for European and Brazilian Portuguese
Authors: Tiago Teixeira, Ana Carolina Erthal, Juan Belieni, Beatriz Canaverde, Diego Mesquita, Miguel Faria, Eliezer de Souza da Silva, André F. T. Martins
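Abstract summary: Math-PT is a new dataset of 1,729 mathematical problems written in European and Brazilian Portuguese. It is curated from a variety of high-quality native sources, including mathematical Olympiads, competitions, and exams from Portugal and Brazil. The authors present a comprehensive benchmark of current state-of-the-art LLMs on Math-PT and show that frontier reasoning models achieve strong performance on multiple-choice questions.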
Reference score (independently computed attention score): 22.329498961271195
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The use of large language models (LLMs) for complex mathematical reasoning is an emerging area of research, with fast progress in methods, models, and benchmark datasets. However, most mathematical reasoning evaluations exhibit a significant linguistic bias, with the vast majority of benchmark datasets being exclusively in English or (at best) translated from English. We address this limitation by introducing Math-PT, a novel dataset comprising 1,729 mathematical problems written in European and Brazilian Portuguese. Math-PT is curated from a variety of high-quality native sources, including mathematical Olympiads, competitions, and exams from Portugal and Brazil. We present a comprehensive benchmark of current state-of-the-art LLMs on Math-PT, revealing that frontier reasoning models achieve strong performance on multiple-choice questions compared to open-weight models, but that their performance decreases on questions with figures and on open-ended questions. To facilitate future research, we release the benchmark dataset and model outputs.
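Abstract (reference translation): Using large language models (LLMs) for complex mathematical reasoning is an emerging research area, with rapid progress in methods, models, and benchmark datasets. However, most mathematical reasoning evaluations show a significant linguistic bias: the vast majority of benchmark datasets are exclusively in English or, at best, translated from English. To address this limitation, the authors introduce Math-PT, a new dataset of 1,729 mathematical problems written in European and Brazilian Portuguese. Math-PT is curated from a variety of high-quality native sources, including mathematical Olympiads, competitions, and exams from Portugal and Brazil. A comprehensive benchmark of current state-of-the-art LLMs on Math-PT shows that frontier reasoning models achieve strong performance on multiple-choice questions compared to open-weight models, but that their performance decreases on questions with figures and on open-ended questions. To facilitate future research, the benchmark dataset and model outputs are released.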

Related papers